<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aalok05.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aalok05.github.io/" rel="alternate" type="text/html" /><updated>2026-03-12T10:32:58+00:00</updated><id>https://aalok05.github.io/feed.xml</id><title type="html">Aalok’s Blog</title><subtitle>My Blog</subtitle><author><name>Aalok Singh</name></author><entry><title type="html">Building a session history manager for LLM agents</title><link href="https://aalok05.github.io/ai/2026/03/12/agentic-session-management.html" rel="alternate" type="text/html" title="Building a session history manager for LLM agents" /><published>2026-03-12T00:00:00+00:00</published><updated>2026-03-12T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2026/03/12/agentic-session-management</id><content type="html" xml:base="https://aalok05.github.io/ai/2026/03/12/agentic-session-management.html"><![CDATA[<p>How to build a production-grade session history manager for LLM agents</p>

<hr />

<h2 id="where-this-comes-from">Where This Comes From</h2>

<p>I’ve been building an agentic solution that involves multiple sub-agents orchestrated together - think a main planning agent that delegates to specialised sub-agents for RAG retrieval, tool execution, and code generation. Each sub-agent adds its own system prompt, tool schemas, retrieved documents, and generated code blocks to the conversation. A single user request can fan out into dozens of internal messages before a final answer surfaces.</p>

<p>It worked beautifully in demos. Then real users started having <em>actual</em> conversations.</p>

<p>Within 15–20 turns, the context window was full. Users started seeing this in production:</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">context limit exceeded</code></p>
</blockquote>

<p>It was because every tool call, every chunk of RAG context, every block of generated code, and every sub-agent handoff was silently piling up in the session history.</p>

<p>I took a step back and built a dedicated context management engine to solve this for good. This post captures the key learnings, common pitfalls, and practical patterns I discovered.</p>

<hr />

<h2 id="the-problem-and-why-400k-context-is-misleading">The Problem (and Why “400K Context” Is Misleading)</h2>

<p>Let’s take the model we’re actually using: <strong>GPT-5.3-codex</strong>. It advertises a <strong>400K total context length</strong>. Sounds massive but here’s the breakdown:</p>

<table>
  <thead>
    <tr>
      <th>Spec</th>
      <th>Tokens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Input</strong> context limit</td>
      <td>272,000</td>
    </tr>
    <tr>
      <td><strong>Output</strong> (max completion)</td>
      <td>128,000</td>
    </tr>
    <tr>
      <td><strong>Total</strong> (input + output)</td>
      <td>400,000</td>
    </tr>
  </tbody>
</table>

<p>That 400K headline number is <strong>not</strong> what you get to play with. The output budget is reserved for the model’s response. <strong>Your session history, system prompt, tool schemas, RAG context - all of it must fit inside the 272K <em>input</em> limit.</strong> That’s the number that matters for session management.</p>

<p>And 272K still sounds like a lot - until you’re running a multi-agent pipeline. Here’s what a typical turn looks like under the hood:</p>

<table>
  <thead>
    <tr>
      <th>Message</th>
      <th>Role</th>
      <th>Approx. tokens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>User asks a question</td>
      <td><code class="language-plaintext highlighter-rouge">user</code></td>
      <td>~50</td>
    </tr>
    <tr>
      <td>Planning agent reasons about which sub-agent to call</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~200</td>
    </tr>
    <tr>
      <td>RAG sub-agent retrieves 3 document chunks</td>
      <td><code class="language-plaintext highlighter-rouge">tool</code></td>
      <td>~2,000</td>
    </tr>
    <tr>
      <td>Code-gen sub-agent produces a solution</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~1,500</td>
    </tr>
    <tr>
      <td>Tool call to execute/validate the code</td>
      <td><code class="language-plaintext highlighter-rouge">tool</code></td>
      <td>~500</td>
    </tr>
    <tr>
      <td>Final synthesised answer</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~300</td>
    </tr>
  </tbody>
</table>

<p>That’s <strong>~4,500 tokens for a single turn</strong>. Multiply by 40-50 turns in an extended working session and you’ve blown past 200K tokens without the user typing more than a few sentences. The context doesn’t grow linearly with conversation length; it <strong>balloons</strong> because of the compound effect of sub-agents, RAG payloads, and generated code.</p>
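<p>To make the ballooning concrete, here is a rough back-of-the-envelope estimator. A chars/4 heuristic stands in for a real tokenizer like tiktoken, and the message shapes below are hypothetical stand-ins for the turn in the table above:</p>

```python
# Rough per-turn token estimator. A real implementation would use
# tiktoken; the chars/4 heuristic is a common English-text approximation.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_turn_tokens(messages: list[dict]) -> int:
    """Sum the approximate cost of every message produced in one turn."""
    return sum(estimate_tokens(m["content"]) for m in messages)

# A hypothetical multi-agent turn: user question, planner reasoning,
# RAG payload, generated code, tool result, final answer.
turn = [
    {"role": "user", "content": "q" * 200},        # ~50 tokens
    {"role": "assistant", "content": "p" * 800},   # ~200 tokens
    {"role": "tool", "content": "d" * 8000},       # ~2,000 tokens
    {"role": "assistant", "content": "c" * 6000},  # ~1,500 tokens
    {"role": "tool", "content": "r" * 2000},       # ~500 tokens
    {"role": "assistant", "content": "a" * 1200},  # ~300 tokens
]

per_turn = estimate_turn_tokens(turn)
print(per_turn)        # 4550 tokens for one turn
print(per_turn * 45)   # 204750 - past 200K after 45 turns
```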

<hr />

<h2 id="pitfall-1-ignoring-the-hidden-token-consumers">Pitfall #1: Ignoring the Hidden Token Consumers</h2>

<p>Most developers look at the spec sheet - “400K context!” - and assume they have that much room for chat. <strong>You don’t.</strong> The input limit is 272K, and even that is eroded by things you never see in the chat:</p>

<table>
  <thead>
    <tr>
      <th>Hidden consumer</th>
      <th>Typical cost</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>System prompt</td>
      <td>1,000 - 1,500 tokens</td>
    </tr>
    <tr>
      <td>Tool/function schemas (multiple agents)</td>
      <td>1,000 - 5,000 tokens</td>
    </tr>
    <tr>
      <td>Reserved output tokens</td>
      <td>up to 128,000 tokens</td>
    </tr>
  </tbody>
</table>

<p>With GPT-5.3-codex, if you reserve even a modest 16K for output, burn 3K on tool schemas across your sub-agents, and spend ~1K on the system prompt, <strong>your real budget is ~252K</strong>, not 272K.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The ACTUAL budget formula
</span><span class="n">input_context_limit</span> <span class="o">=</span> <span class="mi">272_000</span>           
<span class="n">system_tokens</span>       <span class="o">=</span> <span class="n">count_tokens</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">)</span>
<span class="n">reserved</span>            <span class="o">=</span> <span class="n">max_output_tokens</span> <span class="o">+</span> <span class="n">tool_schema_overhead</span>
<span class="n">remaining</span>           <span class="o">=</span> <span class="n">input_context_limit</span> <span class="o">-</span> <span class="n">system_tokens</span> <span class="o">-</span> <span class="n">reserved</span>
<span class="n">budget</span>              <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span> <span class="o">*</span> <span class="n">safety_fraction</span><span class="p">)</span>  <span class="c1"># 0.80 recommended
</span></code></pre></div></div>

<p><strong>Lesson:</strong> Always compute your budget against the <strong>input</strong> context limit, not the total. And never assume the full input window is yours either - subtract system prompts, tool schemas, and output reservations first.</p>

<hr />

<h2 id="pitfall-2-trimming-at-the-wrong-boundary">Pitfall #2: Trimming at the Wrong Boundary</h2>

<p>The naive approach is to just drop the oldest messages until you’re under budget. But chat histories aren’t a flat list of independent messages - they contain <strong>logical groups</strong> that must stay together:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>user: "What's the weather in Paris?"
assistant: [tool_call: get_weather("Paris")]    ← function call
tool: "22°C and sunny"                         ← function result
assistant: "It's 22°C and sunny in Paris!"     ← final reply
</code></pre></div></div>

<p>If you trim between the tool call and the tool result, the model sees an orphaned function call with no response - and it hallucinates or errors out.</p>

<p><strong>The fix: align your trim boundary to the next <code class="language-plaintext highlighter-rouge">role == "user"</code> message.</strong> This guarantees you never split a tool-call sequence:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Walk forward to the next "user" message
</span><span class="n">aligned</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">keep_from</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">items</span><span class="p">)):</span>
    <span class="k">if</span> <span class="n">items</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">get</span><span class="p">(</span><span class="s">"role"</span><span class="p">)</span> <span class="o">==</span> <span class="s">"user"</span><span class="p">:</span>
        <span class="n">aligned</span> <span class="o">=</span> <span class="n">j</span>
        <span class="k">break</span>

<span class="k">if</span> <span class="n">aligned</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
    <span class="c1"># No safe boundary exists - nuke the session
</span>    <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span>
    <span class="k">return</span>

<span class="n">keep_from</span> <span class="o">=</span> <span class="n">aligned</span>
</code></pre></div></div>

<p><strong>Lesson:</strong> Never trim mid-turn. Always snap to a role boundary.</p>
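<p>Putting the budget check and the boundary alignment together, a minimal trim routine might look like this. It is a synchronous, framework-free sketch; <code class="language-plaintext highlighter-rouge">approx_tokens</code> is a hypothetical stand-in for your real token counter:</p>

```python
def approx_tokens(msg: dict) -> int:
    """Stand-in token counter: ~4 chars per token."""
    return max(1, len(msg.get("content", "")) // 4)

def trim_to_budget(items: list[dict], budget: int) -> tuple[list[dict], list[dict]]:
    """Return (trimmed, kept). Keep the newest messages that fit the
    budget, then snap the cut forward to the next user message so a
    tool call is never separated from its result."""
    # Walk backwards, keeping the newest messages that fit.
    total, keep_from = 0, len(items)
    for i in range(len(items) - 1, -1, -1):
        total += approx_tokens(items[i])
        if total > budget:
            break
        keep_from = i
    # Snap the boundary forward to the next user message.
    aligned = next(
        (j for j in range(keep_from, len(items)) if items[j].get("role") == "user"),
        None,
    )
    if aligned is None:
        return items, []  # no safe boundary exists: caller clears the session
    return items[:aligned], items[aligned:]

history = [
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "tool_call get_weather"},
    {"role": "tool", "content": "22C and sunny"},
    {"role": "assistant", "content": "It's sunny!"},
    {"role": "user", "content": "new question"},
    {"role": "assistant", "content": "answer"},
]
trimmed, kept = trim_to_budget(history, budget=40)
print(kept[0]["role"])   # "user" - the cut lands on a user boundary
```

<p>Even though the budget cut initially lands inside the old tool-call sequence, the boundary snap discards that whole group, so the model never sees an orphaned function call.</p>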

<hr />

<h2 id="pitfall-3-losing-context-silently">Pitfall #3: Losing Context Silently</h2>

<p>Trimming old messages solves the token problem - but destroys context. If the user mentioned their name, a project requirement, or a design decision 40 messages ago, that information is just <em>gone</em>.</p>

<p>The fix is to <strong>summarize before you trim</strong>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conversation_text</span> <span class="o">=</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">.</span><span class="n">join</span><span class="p">(</span>
    <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="s">'role'</span><span class="p">]</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="s">'content'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trimmed_items</span>
<span class="p">)</span>

<span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="n">chat</span><span class="p">.</span><span class="n">completions</span><span class="p">.</span><span class="n">create</span><span class="p">(</span>
    <span class="n">model</span><span class="o">=</span><span class="n">summary_model</span><span class="p">,</span>
    <span class="n">messages</span><span class="o">=</span><span class="p">[{</span>
        <span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span>
        <span class="s">"content"</span><span class="p">:</span> <span class="p">(</span>
            <span class="s">"Summarise this conversation excerpt in 2-4 sentences, "</span>
            <span class="s">"preserving all key facts, decisions, and information "</span>
            <span class="s">"the user may refer back to.</span><span class="se">\n\n</span><span class="s">"</span>
            <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">conversation_text</span><span class="si">}</span><span class="s">"</span>
        <span class="p">),</span>
    <span class="p">}],</span>
    <span class="n">max_tokens</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span>
    <span class="n">temperature</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">summary</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">message</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span>
</code></pre></div></div>

<p>Then inject the summary back into the session as the very first message:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span>

<span class="n">new_items</span> <span class="o">=</span> <span class="p">[{</span>
    <span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span>
    <span class="s">"content"</span><span class="p">:</span> <span class="sa">f</span><span class="s">"[Previous conversation summary]: </span><span class="si">{</span><span class="n">summary</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
<span class="p">}]</span>
<span class="n">new_items</span><span class="p">.</span><span class="n">extend</span><span class="p">(</span><span class="n">kept_items</span><span class="p">)</span>

<span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">add_items</span><span class="p">(</span><span class="n">new_items</span><span class="p">)</span>
</code></pre></div></div>

<p>Your system prompt should explicitly tell the model to trust these summaries:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>When you see a '[Previous conversation summary]' message,
treat it as a reliable recap of the earlier conversation.
Use it to maintain continuity.
</code></pre></div></div>

<p><strong>Lesson:</strong> Trim the tokens, but compress the knowledge. A 3-sentence summary is worth 10,000 trimmed tokens.</p>

<hr />

<h2 id="pitfall-4-letting-summarization-failures-kill-the-session">Pitfall #4: Letting Summarization Failures Kill the Session</h2>

<p>Summarization uses an LLM call, and LLM calls can fail - network issues, rate limits, content filters. If your summarization crashes and you haven’t trimmed, the <em>next</em> agent call will hit the context limit anyway.</p>

<p><strong>Always trim first, summarize second, and treat summarization as best-effort:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">summary_text</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">summary_text</span> <span class="o">=</span> <span class="k">await</span> <span class="n">summarize</span><span class="p">(</span><span class="n">trimmed_items</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="n">exception</span><span class="p">(</span><span class="s">"Summarization failed - trimming without summary"</span><span class="p">)</span>

<span class="c1"># Proceed with or without summary
</span><span class="k">await</span> <span class="n">rewrite_session</span><span class="p">(</span><span class="n">summary_text</span><span class="p">,</span> <span class="n">kept_items</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Lesson:</strong> Graceful degradation &gt; fragile correctness.</p>

<hr />

<h2 id="good-practice-make-the-budget-visible">Good Practice: Make the Budget Visible</h2>

<p>Once I added a React-based session visualizer connected to the backend via WebSocket, debugging became 10x easier. I could literally <em>watch</em> the token budget shrink in real time:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># In your context manager, emit events via a callback:
</span><span class="k">if</span> <span class="n">event_callback</span><span class="p">:</span>
    <span class="k">await</span> <span class="n">event_callback</span><span class="p">({</span>
        <span class="s">"type"</span><span class="p">:</span> <span class="s">"budget_info"</span><span class="p">,</span>
        <span class="s">"context_window"</span><span class="p">:</span> <span class="n">context_window</span><span class="p">,</span>
        <span class="s">"history_tokens"</span><span class="p">:</span> <span class="n">history_tokens</span><span class="p">,</span>
        <span class="s">"budget"</span><span class="p">:</span> <span class="n">budget</span><span class="p">,</span>
    <span class="p">})</span>
</code></pre></div></div>

<p>I’d strongly recommend building a lightweight dashboard when developing agents. Watching the numbers change as you chat makes budget problems obvious <em>before</em> they become runtime errors.</p>
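<p>The visualizer itself is just a consumer. On the backend, a small fan-out keeps the context manager decoupled from however many dashboards are attached. The WebSocket layer is omitted here - subscribers are plain async callables, which is an assumption about how you would wire it up:</p>

```python
import asyncio

class BudgetEventBus:
    """Fan out budget events to every subscriber (e.g. WebSocket senders)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    async def emit(self, event: dict) -> None:
        # A dead dashboard connection shouldn't break the agent loop.
        for cb in list(self._subscribers):
            try:
                await cb(event)
            except Exception:
                self._subscribers.remove(cb)

bus = BudgetEventBus()
received = []

async def fake_dashboard(event):
    received.append(event)

bus.subscribe(fake_dashboard)
asyncio.run(bus.emit({
    "type": "budget_info",
    "budget": 252_000,
    "history_tokens": 48_000,
}))
print(received[0]["budget"])   # 252000
```

<p>The context manager only ever calls <code class="language-plaintext highlighter-rouge">emit</code>; failed subscribers are dropped so a flaky dashboard can never take down a production session.</p>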

<hr />

<h2 id="good-practice-apply-a-safety-fraction">Good Practice: Apply a Safety Fraction</h2>

<p>Even with precise token counting, there’s always a margin of error - tiktoken counts don’t perfectly match the model’s internal tokenizer, and API overhead adds a few tokens per message.</p>

<p>A simple <strong>0.80× safety fraction</strong> on the remaining budget gives you a comfortable 20% buffer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">budget</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span> <span class="o">*</span> <span class="mf">0.80</span><span class="p">)</span>
</code></pre></div></div>

<p>This single line has prevented more crashes than any other piece of code in the project.</p>

<hr />

<h2 id="the-complete-algorithm">The Complete Algorithm</h2>

<p>Here’s the full strategy distilled into 9 steps:</p>

<ol>
  <li><strong>Count tokens</strong> for system prompt + full session history</li>
  <li><strong>Reserve</strong> output tokens + tool-schema overhead</li>
  <li><strong>Apply safety fraction</strong> (0.80) to the remaining budget</li>
  <li><strong>Keep the newest messages</strong> that fit within budget</li>
  <li><strong>Align trim boundary</strong> forward to the next <code class="language-plaintext highlighter-rouge">role == "user"</code> message</li>
  <li><strong>Summarize</strong> the trimmed messages using an LLM (best-effort)</li>
  <li><strong>Inject the summary</strong> as the first message in the rewritten session</li>
  <li><strong>Clear session entirely</strong> if budget ≤ 0 or no safe boundary exists</li>
  <li><strong>Continue trimming</strong> even if summarization fails</li>
</ol>

<p>Run this routine <strong>before every agent turn</strong>, and your agent will handle conversations of arbitrary length without ever hitting the context ceiling.</p>

<hr />

<h2 id="tldr">TL;DR</h2>

<table>
  <thead>
    <tr>
      <th>Pitfall</th>
      <th>Fix</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Assuming the full context window is available</td>
      <td>Subtract system prompt, tool schemas, and output reservation</td>
    </tr>
    <tr>
      <td>Trimming mid-turn (splitting tool calls)</td>
      <td>Snap trim boundary to <code class="language-plaintext highlighter-rouge">role == "user"</code></td>
    </tr>
    <tr>
      <td>Losing important context when trimming</td>
      <td>Summarize trimmed messages before discarding</td>
    </tr>
    <tr>
      <td>Summarization failure crashing the session</td>
      <td>Treat summarization as best-effort; always trim</td>
    </tr>
    <tr>
      <td>Debugging budget issues blindly</td>
      <td>Build a real-time visualizer dashboard</td>
    </tr>
  </tbody>
</table>

<hr />

<p>If you’re building with the OpenAI Agents SDK (or any LLM framework), these patterns will save you from the most common - and most frustrating - production failures.</p>

<p>The full POC code (Python backend + React visualizer) is available on my <a href="https://github.com/aalok05/AgentSessionManager">GitHub</a>.</p>

<hr />]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[How to build a production-grade session history manager for LLM agents]]></summary></entry><entry><title type="html">Building an Agentic AI Chatbot that Speaks DSL</title><link href="https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution.html" rel="alternate" type="text/html" title="Building an Agentic AI Chatbot that Speaks DSL" /><published>2025-10-03T00:00:00+00:00</published><updated>2025-10-03T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution</id><content type="html" xml:base="https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution.html"><![CDATA[<p>Over the last few months, I’ve been deep in the trenches of building agentic AI systems using the  OpenAI Agents SDK. One of the most exciting projects I’ve worked on was creating a chatbot that takes a user’s natural language prompt and outputs  code in a Domain-Specific Language (DSL). Along the way, we combined  retrieval-augmented generation (RAG),  semantic search with Azure AI Search, and a team of specialized agents working under a single orchestrator.</p>

<p>This post shares what I learned from the journey - both the technical architecture and the design decisions that made the system robust.</p>

<hr />

<h3 id="what-is-a-dsl">What is a DSL?</h3>

<p>A  Domain-Specific Language (DSL)  is a mini programming language purpose-built for a narrow domain. Unlike general-purpose languages like Python or Java, which can implement anything from system software to web apps, DSLs are laser-focused.</p>

<p>Think of SQL for databases, regex for pattern matching, or Terraform’s HCL for infrastructure automation. In our case, the DSL defined how certain domain rules and workflows needed to be translated into executable logic.</p>

<p>The key challenge:  users don’t like writing DSL directly. They want to describe rules in English and let the AI generate clean, valid DSL code.</p>

<p>The DSL I am working with is a proprietary language developed for a specific purpose within this organization. No LLM has been trained on it, because nothing about it is available on the open web. This is where a robust RAG system becomes important, as we will see.</p>

<hr />

<h3 id="the-core-problem">The Core Problem</h3>

<p>We needed a chatbot that:</p>

<ol>
  <li>Accepts user prompts in natural language.</li>
  <li>Understands the intent and domain context.</li>
  <li>Generates  valid, executable DSL code, not just free-form text.</li>
  <li>Ensures correctness by consulting external knowledge sources.</li>
  <li>Decomposes complex user tasks into smaller, domain-specific steps.</li>
</ol>

<hr />

<h3 id="garbage-in-garbage-out-data-preparation-was-critical">Garbage In, Garbage Out: Data Preparation Was Critical</h3>

<p>Before any agents could work their magic, we had to solve a foundational problem:  the quality of our DSL documentation directly determined the quality of our outputs.</p>

<p>The original DSL documentation was messy - inconsistent formatting, outdated examples, duplicate sections, and mixed terminology. We learned the hard way that  feeding raw, unsanitized data into a vector database produces unreliable retrieval results.</p>

<p>Here’s how we prepared the data:</p>

<ul>
  <li>Standardization:  We normalized all DSL syntax examples, ensured consistent naming conventions, and removed deprecated constructs.</li>
  <li>Chunking Strategy:  Large documentation pages were split into logical chunks that represented complete concepts, along with some overlap.</li>
  <li>Metadata Enrichment:  Each chunk was tagged with metadata like DSL version, concept category, and complexity level to enable filtered retrieval.</li>
  <li>Deduplication:  We removed redundant examples and consolidated overlapping explanations.</li>
</ul>

<p>Once sanitized, we generated embeddings using OpenAI’s text-embedding-3-large model and ingested them into Azure AI Search as our vector database. The lesson: spend half your effort on data quality - it pays off many times over in retrieval accuracy.</p>
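<p>The chunking step in particular is easy to get wrong. A minimal word-based chunker with overlap and metadata tagging might look like this - real pipelines usually split on logical sections and count tokens rather than words, so treat it as a simplified sketch:</p>

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 40) -> list[dict]:
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of its predecessor so no concept is cut in half."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(piece),
            # Metadata enrichment: tag each chunk for filtered retrieval.
            "metadata": {"chunk_index": len(chunks), "word_count": len(piece)},
        })
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_with_overlap(doc, chunk_size=200, overlap=40)
print(len(chunks))   # 3 chunks
# Consecutive chunks share their boundary words:
print(chunks[0]["text"].split()[-40:] == chunks[1]["text"].split()[:40])  # True
```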

<hr />

<h3 id="semantic-reranking-the-secret-weapon">Semantic Reranking: The Secret Weapon</h3>

<p>Azure AI Search provided semantic search out of the box, but we went a step further by implementing  semantic reranking.</p>

<p>Here’s how it works: when a user query comes in, the system performs two stages of retrieval:</p>

<ol>
  <li>Initial Retrieval (Vector Search):  The query embedding is compared against all document embeddings in the index using cosine similarity. This returns the top 50 candidate chunks.</li>
  <li>Semantic Reranking:  Instead of just returning the top results from vector similarity, Azure AI Search’s semantic ranker uses a  cross-encoder model  that deeply understands the relationship between the query and each candidate document. It re-ranks the top 50 results based on semantic relevance, not just vector proximity.</li>
</ol>

<p>Why does this matter? Vector similarity can sometimes retrieve chunks that  <em>mention</em>  similar terms but don’t actually answer the user’s intent. Semantic reranking ensures that the final top 5-10 results are  contextually the most relevant  to what the user is asking.</p>

<p>For example, if a user asks “How do I handle manager approvals in transaction rules?”, vector search might return chunks about “manager roles” or “transaction limits” separately. But semantic reranking understands the  <em>combined intent</em>  and prioritizes chunks that show the  intersection  of both concepts.</p>

<p>This dramatically reduced hallucinations and improved DSL generation accuracy.</p>
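<p>The two-stage shape is easy to prototype outside Azure: stage one is plain cosine similarity over embeddings, and stage two re-scores a small candidate pool with a stronger (and slower) scorer. In production that scorer is Azure AI Search’s cross-encoder; here it is stood in for by a hypothetical scoring function, and the toy documents and embeddings are invented for illustration:</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, rerank_score, pool_size=50, top_k=5):
    """Stage 1: vector similarity narrows the index to `pool_size`
    candidates. Stage 2: a cross-encoder-style scorer re-ranks them."""
    pool = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    pool = pool[:pool_size]
    return sorted(pool, key=rerank_score, reverse=True)[:top_k]

# Toy corpus for the "manager approvals in transaction rules" example.
docs = [
    {"id": "manager-roles", "embedding": [0.9, 0.1], "covers_both": False},
    {"id": "txn-limits", "embedding": [0.1, 0.9], "covers_both": False},
    {"id": "approvals-in-txn-rules", "embedding": [0.6, 0.6], "covers_both": True},
]

# Hypothetical reranker: prefers chunks covering the *combined* intent.
top = retrieve([0.7, 0.7], docs, rerank_score=lambda d: d["covers_both"], top_k=1)
print(top[0]["id"])   # "approvals-in-txn-rules"
```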

<hr />

<h3 id="why-agents-instead-of-just-one-model">Why Agents Instead of Just One Model?</h3>

<p>A single LLM prompt could, theoretically, output DSL. But real-world usage taught us that:</p>

<ul>
  <li>Some user prompts required  retrieving specific domain rules  (from internal documentation).</li>
  <li>The DSL had  different subdomains, each requiring its own expert logic.</li>
  <li>Errors in DSL syntax needed specialized correction strategies.</li>
</ul>

<p>This is where the  agentic approach  shines. Instead of one monolithic LLM call, we built a  <a href="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html">multi-agent system coordinated by an Orchestrator Agent.</a></p>

<hr />

<h3 id="the-architecture">The Architecture</h3>

<p>Here’s how the system came together:</p>

<ul>
  <li>Orchestrator Agent: Uses the OpenAI Agents SDK to route the user’s request. It decides whether the query requires retrieval, DSL transformation, or validation, and is given ample context and instructions as its system prompt.</li>
  <li>DSL Experts (Helper Agents): Each agent is an expert in a <em>subset</em> of the DSL - for example, syntax, rules, or validation.</li>
  <li>Knowledge Retrieval (RAG + Azure AI Search): Users often referred to legacy rules, policies, and code snippets. To handle this, I integrated semantic search in Azure AI Search. The orchestrator first queries the knowledge base for relevant examples with semantic reranking, injects them into context, and passes them to the helper agents.</li>
  <li>Feedback &amp; Correction Loop: If the generated DSL didn’t validate, the Orchestrator Agent would trigger a handoff to the <em>Validator Agent</em>, which corrected and finalized the output.</li>
</ul>

<p><img src="/assets/images/modular_agents_diag.png" alt="Birds' eye view of plug and play agentic architecture" /></p>

<h3 id="learnings-along-the-way">Learnings Along the Way</h3>

<ul>
  <li>Data quality is non-negotiable.  Garbage in, garbage out isn’t just a saying - it’s reality. Clean, validated, well-chunked documentation made all the difference.</li>
  <li>Semantic reranking beats pure vector search.  The cross-encoder reranking step eliminated false positives and surfaced truly relevant context.</li>
  <li>Agent collaboration beats single-shot prompting.  Letting agents specialize made the system  <em>modular, scalable, and less error-prone.</em></li>
  <li>RAG is not optional in enterprise-grade DSL systems.  When dealing with evolving rules, retrieval ensures correctness over hallucination.</li>
  <li>Pausing and handoff are powerful.  Using the Agents SDK’s ability to pause execution and let another agent take over kept us from “forcing” correctness in a single prompt.</li>
</ul>]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[Over the last few months, I’ve been deep in the trenches of building agentic AI systems using the OpenAI Agents SDK. One of the most exciting projects I’ve worked on was creating a chatbot that takes a user’s natural language prompt and outputs code in a Domain-Specific Language (DSL). Along the way, we combined retrieval-augmented generation (RAG), semantic search with Azure AI Search, and a team of specialized agents working under a single orchestrator.]]></summary></entry><entry><title type="html">Building Modular AI Agents: A Plugin Architecture</title><link href="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html" rel="alternate" type="text/html" title="Building Modular AI Agents: A Plugin Architecture" /><published>2025-09-13T00:00:00+00:00</published><updated>2025-09-13T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents</id><content type="html" xml:base="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html"><![CDATA[<p>Most AI solutions start simple but quickly become unwieldy monoliths. Whether you’re building chatbots, RAG systems, or AI assistants, you’ve probably experienced this evolution (although this is a problem with non AI projects as well):</p>

<p><strong>Day 1:</strong> “Let’s build a simple AI assistant that can answer questions about our docs”</p>

<p><strong>Day 30:</strong> “Now it needs to search our database, generate reports, and handle customer queries”</p>

<p><strong>Day 90:</strong> “The system is slow, hard to test, and any change breaks something else”</p>

<h3 id="why-this-happens">Why This Happens</h3>

<ul>
  <li>Knowledge Base Coupling:  Everything gets thrown into one giant vector database</li>
  <li>Pipeline Rigidity:  One retrieval system tries to handle all content types</li>
  <li>Prompt Bloat:  A single “universal” prompt grows to handle every edge case</li>
  <li>Integration Mess:  One API endpoint doing everything becomes impossible to maintain</li>
</ul>

<h3 id="the-real-cost">The Real Cost</h3>

<ul>
  <li><strong>Development slowdown:</strong> Teams can’t work independently</li>
  <li><strong>Testing nightmare:</strong> Need to spin up everything to test one feature</li>
  <li><strong>Deployment risk:</strong> Any change affects the entire system</li>
  <li><strong>Scaling inefficiency:</strong> Resource usage driven by your heaviest component</li>
</ul>

<h2 id="the-solution-plugin-architecture">The Solution: Plugin Architecture</h2>

<p>Instead of one massive RAG solution, build modular agents that have unique capabilities:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>

<span class="k">class</span> <span class="nc">BaseModularAgent</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="s">"""Base class for all Agents"""</span>
    
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">agent</span> <span class="o">=</span> <span class="bp">None</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">get_metadata</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
        <span class="s">"""Return metadata this agent"""</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s">"name"</span><span class="p">:</span> <span class="s">"agent_name"</span><span class="p">,</span>
            <span class="s">"description"</span><span class="p">:</span> <span class="s">"What this agent does"</span>
        <span class="p">}</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">get_prompt</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="s">"""System prompt for the agent"""</span>
        <span class="k">pass</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="s">"""Initialize the agent"""</span>
        <span class="c1"># Create agent with the prompt
</span>        <span class="k">pass</span>

</code></pre></div></div>

<h3 id="core-principles">Core Principles</h3>

<ul>
  <li><strong>Single Responsibility:</strong> Each plugin does one thing well</li>
  <li><strong>Self-Contained:</strong> Plugins manage their own resources and dependencies</li>
  <li><strong>Discoverable:</strong> The system automatically finds and loads available plugin agents</li>
  <li><strong>Composable:</strong> Plugins can work together or independently</li>
</ul>

<h2 id="dynamic-discovery-finding-your-plugin-agents">Dynamic Discovery: Finding Your Plugin Agents</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">find_and_initialize_agents</span><span class="p">(</span><span class="n">directory</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="s">""" Discover and initialize plugin agents"""</span>
    <span class="n">agents</span> <span class="o">=</span> <span class="p">[]</span>
    
    <span class="k">for</span> <span class="nb">file</span> <span class="ow">in</span> <span class="n">os</span><span class="p">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">directory</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">file</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'_pluginagent.py'</span><span class="p">):</span>
            <span class="c1"># Load the module
</span>            <span class="n">module</span> <span class="o">=</span> <span class="n">importlib</span><span class="p">.</span><span class="n">import_module</span><span class="p">(</span><span class="nb">file</span><span class="p">[:</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span>
            
            <span class="c1"># Find agent classes
</span>            <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">module</span><span class="p">):</span>
                <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
                <span class="k">if</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">BaseModularAgent</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="n">obj</span> <span class="o">!=</span> <span class="n">BaseModularAgent</span><span class="p">):</span>
                    
                    <span class="c1"># Create instance and initialize the agent
</span>                    <span class="n">agents</span> <span class="o">=</span> <span class="n">obj</span><span class="p">()</span>
                    <span class="k">await</span> <span class="n">agents</span><span class="p">.</span><span class="n">initialize</span><span class="p">()</span>
                    <span class="n">agents</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">agents</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">agents</span>
</code></pre></div></div>
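<p>One subtlety in the discovery loop above: <code>importlib.import_module</code> only resolves modules that are already on <code>sys.path</code>. A minimal standard-library sketch (the helper name <code>load_module_from_path</code> is my own, not part of any SDK) for loading a plugin file directly from a directory:</p>

```python
import importlib.util
import os

def load_module_from_path(path: str):
    """Load a Python module directly from a file path, bypassing sys.path."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

<p>This keeps plugin directories out of the import path, so plugin file names cannot shadow real packages.</p>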

<h2 id="modular-agent-example">Modular Agent Example</h2>

<h3 id="company-policy-agent">Company Policy Agent</h3>

<p>The example below omits the ‘retrieval’ part of RAG for simplicity. In the real world, this agent would most likely perform semantic search on a vector database containing embeddings of company policy documents, and it would define a function tool responsible for performing that search.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">agents</span> <span class="kn">import</span> <span class="n">Agent</span>

<span class="k">class</span> <span class="nc">CompanyPolicyAgent</span><span class="p">(</span><span class="n">BaseModularAgent</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_metadata</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s">"name"</span><span class="p">:</span> <span class="s">"document_search"</span><span class="p">,</span>
            <span class="s">"description"</span><span class="p">:</span> <span class="s">"Search through company documents and policies"</span>
        <span class="p">}</span>
    
    <span class="k">def</span> <span class="nf">get_prompt</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="k">return</span> <span class="s">"""You are a document search specialist. Help users find relevant 
        documents, policies, and information from the company knowledge base. 
        Always provide accurate, cited results."""</span>
    
    <span class="k">async</span> <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">metadata</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">agent</span> <span class="o">=</span> <span class="n">Agent</span><span class="p">(</span>
                <span class="n">name</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">],</span>
                <span class="n">instructions</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">get_prompt</span><span class="p">(),</span>
                <span class="n">model</span><span class="o">=</span><span class="s">"o3"</span>
        <span class="p">)</span>

</code></pre></div></div>
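<p>For local testing, the omitted retrieval step can be replaced with a stub. The sketch below is a hypothetical stand-in: a simple keyword-overlap ranking over an in-memory document list, filling in for a real embedding search against a vector store:</p>

```python
def search_policies(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they contain (toy scoring)."""
    words = set(query.lower().split())
    # Score each document by word overlap with the query.
    scored = [(len(words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the top results that matched at least one word.
    return [doc for score, doc in scored[:top_k] if score > 0]
```

<p>In production this function would query the vector store instead, and would typically be registered as a function tool on the agent so the model can invoke it.</p>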

<h2 id="putting-it-all-together-with-agents-sdk">Putting It All Together with Agents SDK</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">agents</span> <span class="kn">import</span> <span class="n">Agent</span><span class="p">,</span> <span class="n">Runner</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">importlib</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1"># Discover and initialize modular agents and use them as tools for orchestrator agent
</span>    <span class="n">agentModules</span> <span class="o">=</span> <span class="p">{}</span>
    
    <span class="k">for</span> <span class="nb">file</span> <span class="ow">in</span> <span class="n">os</span><span class="p">.</span><span class="n">listdir</span><span class="p">(</span><span class="s">'./plugin_agents'</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">file</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'_pluginagent.py'</span><span class="p">):</span>
            <span class="c1"># Load the module
</span>            <span class="n">module</span> <span class="o">=</span> <span class="n">importlib</span><span class="p">.</span><span class="n">import_module</span><span class="p">(</span><span class="nb">file</span><span class="p">[:</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span>
            
            <span class="c1"># Find agent classes
</span>            <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">module</span><span class="p">):</span>
                <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
                <span class="k">if</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">BaseModularAgent</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="n">obj</span> <span class="o">!=</span> <span class="n">BaseModularAgent</span><span class="p">):</span>
                    <span class="c1"># Create instance and initialize the agent
</span>                    <span class="n">pluginAgent</span> <span class="o">=</span> <span class="n">obj</span><span class="p">()</span>
                    <span class="k">await</span> <span class="n">pluginAgent</span><span class="p">.</span><span class="n">initialize</span><span class="p">()</span>
                    
                    <span class="n">metadata</span> <span class="o">=</span> <span class="n">pluginAgent</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
                    <span class="n">agentModules</span><span class="p">[</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">]]</span> <span class="o">=</span> <span class="n">pluginAgent</span>
                    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Loaded agent: </span><span class="si">{</span><span class="n">metadata</span><span class="p">[</span><span class="s">'description'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    
    <span class="c1"># Convert each plugin agent into a tool
</span>    <span class="n">tools</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">agent</span> <span class="ow">in</span> <span class="n">agentModules</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">metadata</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
        <span class="n">tool</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="n">agent</span><span class="p">.</span><span class="n">as_tool</span><span class="p">(</span>
            <span class="n">name</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">],</span>
            <span class="n">description</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"description"</span><span class="p">]</span>
        <span class="p">)</span>
        <span class="n">tools</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">tool</span><span class="p">)</span>
    
    <span class="c1"># Create orchestrator agent with all plugin agents as tools
</span>    <span class="n">orchestrator</span> <span class="o">=</span> <span class="n">Agent</span><span class="p">(</span>
        <span class="n">name</span><span class="o">=</span><span class="s">"AI Assistant"</span><span class="p">,</span>
        <span class="n">instructions</span><span class="o">=</span><span class="s">"""You are an intelligent assistant that can help with various tasks. 
        You have access to specialized agents through tools. Use the appropriate tool 
        based on what the user needs."""</span><span class="p">,</span>
        <span class="n">model</span><span class="o">=</span><span class="s">"o3"</span><span class="p">,</span>
        <span class="n">tools</span><span class="o">=</span><span class="n">tools</span>
    <span class="p">)</span>
    
    <span class="c1"># Run the orchestrator agent with user prompt
</span>    <span class="n">user_input</span> <span class="o">=</span> <span class="s">"Find our company policy on remote work"</span>
    <span class="k">await</span> <span class="n">Runner</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">orchestrator</span><span class="p">,</span> <span class="n">user_input</span><span class="p">)</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span> 
</code></pre></div></div>

<h2 id="why-this-approach-works">Why This Approach Works</h2>

<p><strong>For Developers:</strong></p>
<ul>
  <li>Build and test individual agents independently</li>
  <li>Add new features without touching existing code</li>
  <li>Debug issues in isolation</li>
</ul>

<p><strong>For Operations:</strong></p>
<ul>
  <li>Deploy updates to specific agents only</li>
  <li>Scale resources based on actual usage patterns</li>
  <li>Monitor and troubleshoot individual components</li>
</ul>

<p><strong>For Business:</strong></p>
<ul>
  <li>Faster feature development with parallel teams</li>
  <li>Reduced risk of system-wide failures</li>
  <li>Easy integration of new AI models and providers</li>
</ul>

<h2 id="getting-started">Getting Started</h2>

<ol>
  <li><strong>Identify your AI capabilities</strong> - What different things does your system need to do?</li>
  <li><strong>Create separate plugins</strong> - Build one plugin agent per capability</li>
  <li><strong>Use the discovery pattern</strong> - Let your system automatically find agents</li>
  <li><strong>Start simple</strong> - Begin with basic implementations and evolve</li>
</ol>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Plugin architecture isn’t just about code organization - it’s about building AI solutions that can evolve with your business needs. Instead of rewriting everything when requirements change, you simply add, remove, or update individual capabilities.</p>

<p>Start small, think modular, and watch your AI solution become more maintainable, scalable, and powerful over time.</p>]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[Most AI solutions start simple but quickly become unwieldy monoliths. Whether you’re building chatbots, RAG systems, or AI assistants, you’ve probably experienced this evolution (although this is a problem with non AI projects as well):]]></summary></entry><entry><title type="html">My conversations with LLM - How does TLS work?</title><link href="https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work.html" rel="alternate" type="text/html" title="My conversations with LLM - How does TLS work?" /><published>2025-05-30T00:00:00+00:00</published><updated>2025-05-30T00:00:00+00:00</updated><id>https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work</id><content type="html" xml:base="https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work.html"><![CDATA[<p>Some days you just wake up curious. Today when  <a href="https://www.youtube.com/watch?v=WxxHH_4vKh4">this</a>  video showed up in my feed, I had some questions about TLS protocol.</p>

<p>This took me down a rabbit hole of how symmetric key derivation algorithms work. I’m sharing a succinct, paraphrased version of that conversation with gpt-4o:</p>

<hr />

<h3 id="q-you-are-an-expert-in-tls-protocol-which-keys-are-used-to-encrypt-my-https-traffic">Q: You are an expert in TLS protocol. Which keys are used to encrypt my HTTPS traffic?</h3>

<ul>
  <li>Neither the public key nor the private key (asymmetric keys used to sign and validate certs) directly encrypts the actual traffic.</li>
  <li>They are used during the  handshake  to  establish a symmetric key.</li>
  <li>The  symmetric session key  is then used to encrypt the actual data.</li>
</ul>

<hr />

<h3 id="q-can-you-dive-deeper-into-how-symmetric-session-keys-are-used-to-encrypt-traffic">Q: Can you dive deeper into how symmetric session keys are used to encrypt traffic?</h3>

<p>… Yes,  the actual traffic  <em>is</em>  encrypted, but  not with public/private keys. It’s encrypted using a  symmetric encryption algorithm  (like AES) after the TLS handshake is complete.</p>

<p>Let’s dive deeper into how this works:</p>

<h3 id="1-purpose-of-publicprivate-keys-asymmetric-encryption">1. Purpose of Public/Private Keys (Asymmetric Encryption)</h3>

<ul>
  <li>Used  only during the handshake  phase of a TLS connection.</li>
  <li>Enables server authentication and the secure establishment of a shared symmetric secret.</li>
</ul>

<blockquote>
  <p>Why not use public/private keys for actual data encryption?</p>
</blockquote>

<ul>
  <li>Performance: Asymmetric encryption is computationally expensive and slower.</li>
  <li>Security Best Practices: Symmetric encryption is faster and more suitable for large data volumes like website content.</li>
</ul>

<h3 id="2-actual-traffic-encryption-symmetric-encryption">2. Actual Traffic Encryption (Symmetric Encryption)</h3>

<p>Once the handshake is complete:</p>

<ul>
  <li>A  shared symmetric key  (or a set of keys) is agreed upon.</li>
  <li>From then on,  all HTTP traffic is encrypted with this symmetric key, using algorithms like AES-GCM or ChaCha20-Poly1305.</li>
</ul>

<h3 id="-how-is-the-symmetric-key-established">🔑 How Is the Symmetric Key Established?</h3>

<p>It depends on the  key exchange algorithm:</p>

<h3 id="modern-tls-using-ecdhe">Modern TLS (using ECDHE):</h3>

<ul>
  <li>Both sides generate ephemeral key pairs (private + public).</li>
  <li>They exchange  public  parts.</li>
  <li>Each side uses their  private key  + the other’s  public key  to derive the  same shared secret  (via elliptic curve math).</li>
  <li>This shared secret is passed through a  Key Derivation Function (KDF)  to generate the actual session keys (and IVs) used to protect traffic.</li>
</ul>
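<p>To make the KDF step concrete, here is a simplified HKDF-style sketch (after RFC 5869) using only the Python standard library. TLS 1.3 builds a more elaborate key schedule on this same extract-then-expand primitive, so treat this as an illustration rather than the real TLS derivation:</p>

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Concentrate the input keying material into a pseudorandom key."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand the pseudorandom key into as many output bytes as needed."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# The ECDHE shared secret feeds the KDF; distinct "info" labels yield
# independent keys for each direction of the connection.
shared_secret = b"example ECDHE shared secret"
prk = hkdf_extract(b"handshake salt", shared_secret)
client_write_key = hkdf_expand(prk, b"client write", 32)
server_write_key = hkdf_expand(prk, b"server write", 32)
```

<p>Note that the derivation is deterministic: both endpoints, holding the same shared secret and labels, compute identical keys without ever sending them.</p>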

<hr />

<h3 id="q-what-are-the-characteristics-of-symmetric-keys">Q: What are the characteristics of symmetric keys?</h3>

<h3 id="️summary">🛡️Summary</h3>

<ul>
  <li>🔁 Same key is used for both encryption and decryption</li>
  <li>⚡ Very fast and efficient</li>
  <li>🔢 Typically 128–256 bits (longer = more secure)</li>
  <li>🔒 Must be securely exchanged or derived</li>
  <li>🌐 Used in TLS to encrypt actual traffic after handshake</li>
  <li>🔧 Common algorithms include AES and ChaCha20</li>
</ul>

<hr />

<h3 id="q-how-difficult-is-it-to-get-hold-of-session-keys-and-why-is-it-difficult">Q: How difficult is it to get hold of session keys and why is it difficult?</h3>

<h3 id="️summary-1">🛡️Summary</h3>

<ul>
  <li>☠️ Session key compromise is fatal — all traffic can be decrypted</li>
  <li>🛡️ Extremely difficult to obtain the session key</li>
  <li>🧠 Session key is never sent, is ephemeral, exists only in RAM, and is derived via secure ECDHE math</li>
  <li>🚫 Attacker’s best chances are malware, memory scraping, or exploiting bugs (which are rare and detectable)</li>
</ul>

<hr />

<h3 id="q-you-mentioned-earlier-that-both-parties-use-the-same-session-key-however-you-also-say-that-keys-are-never-actually-transmitted-how-is-that-possible">Q: You mentioned earlier that both parties use the same session key. However you also say that keys are never actually transmitted. How is that possible?</h3>

<h3 id="-short-answer">✅ Short Answer:</h3>

<ul>
  <li>Yes,  both client and server use the  <em>same session key(s)</em>.</li>
  <li>But  this session key is  never transmitted over the network.</li>
  <li>Instead, both parties  independently derive the same session key, using secure mathematical operations during the  key exchange phase.</li>
</ul>

<h3 id="-summary-same-key-but-not-shared-explicitly">🔑 Summary: Same Key, But Not Shared Explicitly</h3>

<ul>
  <li>🔑 Symmetric session key is the same on both client and server</li>
  <li>📵 It is never transmitted over the network</li>
  <li>🧮 Each party derives it using ECDHE math on exchanged public keys and their own private key</li>
  <li>🔐 Security benefit: even if someone intercepts all traffic, they can’t compute the key without a private key</li>
</ul>

<hr />

<h3 id="qhow-is-it-possible-that-same-exact-secret-is-generated-when-a-public-key-and-a-different-private-key-which-could-be-anything-goes-into-a-mathematical-function">Q: How is it possible that same exact secret is generated when a public key and a different private key (which could be anything) goes into a mathematical function?</h3>

<p>That’s an excellent and deep question — and it’s the heart of  Elliptic Curve Diffie-Hellman (ECDH)  key exchange.</p>

<p>It  does  seem strange at first:</p>

<blockquote>
  <p>How can  two parties, each using a  different private key, and the  other’s public key, end up computing  the exact same secret?</p>
</blockquote>

<p>Let’s walk through it  mathematically but intuitively.</p>

<hr />

<h3 id="-tldr">🔐 TL;DR:</h3>

<ul>
  <li>The  shared secret  is the result of a  scalar multiplication  of elliptic curve points.</li>
  <li>Due to the  commutative  nature of the math (like a × b = b × a), both parties arrive at the  same result  independently.</li>
  <li>The result is the same because of how  elliptic curve point multiplication  works.</li>
</ul>
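<p>The commutativity argument is easiest to see in classic finite-field Diffie-Hellman, which has the same shape as ECDH (scalar multiplication of curve points replaces modular exponentiation). A toy sketch with a demonstration-sized prime (real deployments use 2048-bit or larger groups, or elliptic curves):</p>

```python
# Toy finite-field Diffie-Hellman: (g^a)^b == (g^b)^a (mod p).
p = 0xFFFFFFFB  # largest prime below 2**32; far too small for real use
g = 5

alice_private = 123456789
bob_private = 987654321

# Only the public values cross the wire.
alice_public = pow(g, alice_private, p)
bob_public = pow(g, bob_private, p)

# Each side combines its own private key with the other's public value.
alice_secret = pow(bob_public, alice_private, p)
bob_secret = pow(alice_public, bob_private, p)
assert alice_secret == bob_secret  # identical shared secret, never transmitted
```

<p>An eavesdropper sees <code>g</code>, <code>p</code>, and both public values, but recovering either private exponent from them is the discrete logarithm problem, which is computationally infeasible at real-world sizes.</p>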

<h3 id="-final-insight">🔐 Final Insight</h3>

<blockquote>
  <p>ECDH works because both parties use their own private key + the other’s public key to derive the same point — and the math ensures that the result is identical.</p>
</blockquote>

<p>They don’t need to know each other’s private keys — they just rely on  shared public information  and the hard math problem underneath.</p>

<p><em>— Conversation Over —</em></p>

<p>That last bit about the ECDH algorithm was fascinating! I hope you got to learn something from this as well.</p>

<p>Recently, one of our clients faced an issue where their WAF was incorrectly flagging legitimate users as malicious, denying them access. This disrupted operations and highlighted the need for fine-tuning WAF settings.</p>

<h2 id="why-false-positives-happen">Why False Positives Happen</h2>

<p>False positives occur when WAF misidentifies a legitimate request as an attack. Common reasons include:</p>

<ol>
  <li>Strict OWASP Rules  – OWASP CRS is aggressive by default, sometimes blocking normal traffic that resembles attack patterns.</li>
  <li>Special Characters in Input  – If a request contains characters like &lt; script &gt; or SQL-like keywords, it may trigger SQL Injection or XSS rules.</li>
  <li>APIs and JSON Payloads  – API requests with structured data (JSON/XML) can be misclassified as malicious.</li>
  <li>Sensitive Query Parameters  – URLs with words resembling attack payloads might get blocked.</li>
  <li>Unique Application Behavior  – Custom workflows may not align with predefined WAF rules.</li>
</ol>

<h2 id="how-anomaly-scoring-works">How Anomaly Scoring Works</h2>

<p>WAF assigns a score to each request based on detected rule violations. If the cumulative anomaly score exceeds a threshold (default: 5), the request is blocked. Lowering the score threshold increases sensitivity, while raising it reduces false positives.</p>
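<p>The scoring logic can be sketched in a few lines. The severity weights below follow the CRS convention (critical = 5, error = 4, warning = 3, notice = 2); the function itself is an illustration, not Azure’s implementation:</p>

```python
# Each triggered rule contributes its severity score; the request is
# blocked once the cumulative total reaches the anomaly threshold.
SEVERITY_SCORES = {"critical": 5, "error": 4, "warning": 3, "notice": 2}

def is_blocked(triggered_severities: list[str], threshold: int = 5) -> bool:
    total = sum(SEVERITY_SCORES[s] for s in triggered_severities)
    return total >= threshold
```

<p>So at the default threshold a single critical rule match is enough to block a request, while two notices on their own are not.</p>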

<h2 id="identifying-false-positives">Identifying False Positives</h2>

<ol>
  <li>Check WAF Logs  – Logs in Azure Monitor show details of blocked requests and rule IDs.</li>
  <li>Use Detection Mode  – Running WAF in “Detection” mode helps identify issues without blocking traffic.</li>
  <li>Analyze Blocked Requests  – Reviewing request payloads helps pinpoint the cause of false positives.</li>
</ol>

<h2 id="fixing-false-positives">Fixing False Positives</h2>

<h3 id="1-adjust-owasp-rule-set-sensitivity">1. Adjust OWASP Rule Set Sensitivity</h3>

<p>Azure WAF allows you to choose different OWASP CRS versions (e.g., 3.1, 3.2), each with varying levels of strictness. Some newer versions reduce false positives by improving attack pattern detection. If legitimate requests are frequently being blocked, consider testing a different OWASP rule set version to see if it results in fewer disruptions.</p>

<p>Additionally, adjusting the  anomaly scoring threshold  can help. If the WAF is blocking too many valid requests, increasing the anomaly score threshold can allow more traffic through while still preventing actual threats. Conversely, lowering the threshold can make WAF more aggressive in blocking suspicious activity.</p>

<h3 id="2-create-custom-rules">2. Create Custom Rules</h3>

<p>In some cases, predefined WAF rules may not align with specific application behavior. Creating  custom rules  allows for more flexible traffic control. Here’s how:</p>

<ul>
  <li>Allow trusted IP ranges for internal applications or API requests.</li>
  <li>Bypass specific headers, query parameters, or request bodies that are known to trigger false positives.</li>
  <li>Define rules to exclude requests from evaluation based on user roles or authentication status.</li>
</ul>

<p>Custom rules should be tested in  Detection mode  before enforcing them to ensure they don’t introduce vulnerabilities.</p>

<h3 id="3-tune-request-body-inspection">3. Tune Request Body Inspection</h3>

<p>By default, WAF inspects request bodies to detect malicious payloads, but this can sometimes lead to false positives when processing large JSON or XML payloads. To fine-tune this:</p>

<ul>
  <li>Increase the  request body size limit  if valid requests are getting blocked due to payload size.</li>
  <li>Configure  content type exclusions  to prevent unnecessary inspections for non-critical endpoints.</li>
  <li>Modify  request body inspection thresholds  to ensure only high-risk content gets flagged.</li>
</ul>

<h3 id="4-test-with-detection-mode">4. Test with Detection Mode</h3>

<p>Run WAF in Detection mode before enforcing strict blocking to observe behavior. Detection mode helps analyze the impact of WAF rules without actually blocking traffic.</p>

<h2 id="conclusion">Conclusion</h2>

<p>A key takeaway for me from the recent events at one of our clients: there cannot be a ‘checkbox’ approach to security, where we enable OWASP rules, forget about them, and assume everything is now secure.</p>

<p>WAF rule sets are a tool that must be fine-tuned to strike the right balance between protection and usability for your application’s unique needs.</p>

<p>Privacy and decentralization often struggle to coexist in the same system. Sure, a public key can’t easily be pinpointed to an individual. But in scenarios where you have to ensure that one individual gets to perform an action only once, things get tricky.</p>

<p>If democracy were to be really decentralized, we would trust no authority to count votes for us. In an ideal system, citizens would interact with a smart contract to vote and the count of votes would be agreed upon by a consensus algorithm.</p>

<p>Electoral voting is a scenario where both privacy and decentralization are paramount requirements: who <em>can</em> vote should not be determined by a central authority (a voter list), AND a voter’s vote should be private.</p>

<p>At the same time, the system should be self-tallying, so that no single authority is responsible for counting votes.</p>

<h3 id="zk-snarks">zk-SNARKs</h3>

<p><img src="/assets/images/zcash.PNG" alt="Transaction types in Zcash" /></p>

<p><a href="https://eprint.iacr.org/2017/585.pdf">Zero-knowledge proofs</a> are of immense help in a scenario where transactions (votes) need to be private but verifiable. The linked white paper details how Internet voting can be done using <a href="https://z.cash/">Zcash</a>, a popular zk-proof-based payment system, including anonymous voter registration.</p>
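<p>A much weaker but illustrative sketch of “private but verifiable” is a plain commit–reveal scheme: a vote is hidden at cast time yet checkable at tally time. This is not a zk-SNARK (a reveal eventually exposes the vote, unlike a zero-knowledge proof); all names below are illustrative.</p>

```python
# Toy commit-reveal voting sketch: a salted SHA-256 commitment hides the
# vote when cast, and the (salt, vote) pair opens it verifiably at tally
# time. Far weaker than the zk-SNARK construction described above.
import hashlib
import secrets

def commit(vote: str) -> tuple[str, bytes]:
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + vote.encode()).hexdigest()
    return digest, salt          # publish digest now, keep salt secret

def verify(digest: str, salt: bytes, vote: str) -> bool:
    return hashlib.sha256(salt + vote.encode()).hexdigest() == digest

d, s = commit("candidate-A")
assert verify(d, s, "candidate-A")       # opens correctly at tally time
assert not verify(d, s, "candidate-B")   # cannot claim a different vote
```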

<p>It is important to note that Zcash relies on zk-snark public parameters to construct and verify the zero-knowledge proofs, these parameters were created via something called as <a href="https://www.youtube.com/watch?v=D6dY-3x3teM">Zcash ceremony</a>. It is up to you to trust these 6 individuals who birthed Zcash. This ceremony was akin to creation of a public-private key pair, where private key was destroyed in the end.</p>

<p>If a voting system is truly anonymous, the voter registration system should also be anonymous, i.e. no personally identifiable information is stored in a ‘whitelist’ of eligible voters. Nor should any authority be able to discriminate as to who makes it onto the voter list.
But how do we stop an individual from voting multiple times using multiple public keys? This is the dilemma of decentralized voting.
Even the spec for <a href="https://w3c-ccg.github.io/did-spec">Decentralized Identifiers (DID)</a> allows an individual to possess more than one identifier, unless a federated/centralized identity manager is involved.</p>
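<p>The dilemma can be made concrete with a tiny sketch: key pairs cost nothing to generate, so a naive “one public key, one vote” rule cannot stop one person from voting many times. (Toy random keys below, not a real DID method.)</p>

```python
# One person mints five distinct "voting identities" for free; the naive
# uniqueness check (every vote from a distinct key) passes anyway.
import secrets

def new_identity() -> str:
    return secrets.token_hex(32)       # stand-in for a fresh public key

one_person_keys = [new_identity() for _ in range(5)]
votes = {pk: "candidate-A" for pk in one_person_keys}

# Five distinct keys, five accepted votes -- all from the same individual.
assert len(votes) == 5
```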

<p>Here we have to compromise a little: having a list of eligible voter addresses at least ensures there are no bogus votes. Such a system, using zk-SNARKs, ensures complete privacy of votes and is self-tallying. However, the entity that decides/oversees which voters make it onto the voter list can still be biased.</p>

<p><a href="https://github.com/stonecoldpat/anonymousvoting">This</a> project demonstrates a self-tallying voting system on Ethereum where votes are private. It does, however, involve an election admin preparing the list of eligible voters.</p>

<h3 id="ring-signatures">Ring Signatures</h3>

<p><img src="/assets/images/Ring-signature.svg" alt="Ring signatures" /></p>

<p>Ring signatures are great for decentralized voting with voter anonymity. They allow a signature to be endorsed by a group of keys without revealing which particular key signed it. The popular cryptocurrency <a href="https://www.getmonero.org/">Monero</a> uses ring signatures.
Two useful variants of ring signatures are linkable and traceable ring signatures.</p>

<h4 id="linkable-ring-signatures">Linkable ring signatures</h4>

<p>The property of linkability allows one to determine whether any two signatures were produced by the same member (i.e. under the same private key), while the identity of the signer nevertheless remains hidden.</p>
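<p>The core mechanism behind linkability is a deterministic “key image” derived from the private key: two signatures by the same key share an image, yet the image does not reveal which ring member signed. The sketch below shows only that key-image idea over a toy group, not a real ring signature scheme; the parameters and names are illustrative.</p>

```python
# Toy "key image" demo: image = H(election)^sk mod P. Same private key
# always yields the same image (double votes are linked); different keys
# yield different images; the image alone does not expose the public key.
import hashlib

P = 2**255 - 19   # Curve25519 field prime, used here as a toy modulus
G = 5             # toy generator

def keygen(seed: bytes):
    sk = int.from_bytes(hashlib.sha256(seed).digest(), "big") % P
    pk = pow(G, sk, P)
    return sk, pk

def key_image(sk: int) -> int:
    # Hash a per-election domain tag into the group, then raise to sk
    h = int.from_bytes(hashlib.sha256(b"election-2018").digest(), "big") % P
    return pow(h, sk, P)

sk_alice, pk_alice = keygen(b"alice")
first_vote_tag = key_image(sk_alice)
second_vote_tag = key_image(sk_alice)      # same voter tries again
sk_bob, _ = keygen(b"bob")

assert first_vote_tag == second_vote_tag   # double vote is linked
assert first_vote_tag != key_image(sk_bob) # different voters stay apart
```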

<h4 id="traceable-ring-signature">Traceable ring signatures</h4>

<p>In traceable ring signatures, the public key of the signer is revealed if they issue more than one signature under the same private key. This controls bogus voting nicely, making traceable ring signatures a highly viable method for creating a decentralized, anonymous e-voting system.
<a href="https://arxiv.org/ftp/arxiv/papers/1804/1804.06674.pdf">Here’s</a> a good research paper on this.</p>]]></content><author><name>Aalok Singh</name></author><category term="decentralization" /><summary type="html"><![CDATA[I believe that any system that can be decentralized, should be decentralized. Popular proof-of-work DLTs, however, involve broadcasting all successful transactions to the network, so that all participating nodes have a copy of these transactions. Consensus is thus maintained, but privacy is compromised.]]></summary></entry></feed>