<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aalok05.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aalok05.github.io/" rel="alternate" type="text/html" /><updated>2026-03-12T10:32:58+00:00</updated><id>https://aalok05.github.io/feed.xml</id><title type="html">Aalok’s Blog</title><subtitle>My Blog</subtitle><author><name>Aalok Singh</name></author><entry><title type="html">Building a session history manager for LLM agents</title><link href="https://aalok05.github.io/ai/2026/03/12/agentic-session-management.html" rel="alternate" type="text/html" title="Building a session history manager for LLM agents" /><published>2026-03-12T00:00:00+00:00</published><updated>2026-03-12T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2026/03/12/agentic-session-management</id><content type="html" xml:base="https://aalok05.github.io/ai/2026/03/12/agentic-session-management.html"><![CDATA[<p>How to build a production-grade session history manager for LLM agents</p>

<hr />

<h2 id="where-this-comes-from">Where This Comes From</h2>

<p>I’ve been building an agentic solution that involves multiple sub-agents orchestrated together - think a main planning agent that delegates to specialised sub-agents for RAG retrieval, tool execution, and code generation. Each sub-agent adds its own system prompt, tool schemas, retrieved documents, and generated code blocks to the conversation. A single user request can fan out into dozens of internal messages before a final answer surfaces.</p>

<p>It worked beautifully in demos. Then real users started having <em>actual</em> conversations.</p>

<p>Within 15–20 turns, the context window was full. Users started seeing this in production:</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">context limit exceeded</code></p>
</blockquote>

<p>It was because every tool call, every chunk of RAG context, every block of generated code, and every sub-agent handoff was silently piling up in the session history.</p>

<p>I took a step back and built a dedicated context management engine to solve this for good. This post captures the key learnings, common pitfalls, and practical patterns I discovered.</p>

<hr />

<h2 id="the-problem-and-why-400k-context-is-misleading">The Problem (and Why “400K Context” Is Misleading)</h2>

<p>Let’s take the model we’re actually using: <strong>GPT-5.3-codex</strong>. It advertises a <strong>400K total context length</strong>. Sounds massive but here’s the breakdown:</p>

<table>
  <thead>
    <tr>
      <th>Spec</th>
      <th>Tokens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Input</strong> context limit</td>
      <td>272,000</td>
    </tr>
    <tr>
      <td><strong>Output</strong> (max completion)</td>
      <td>128,000</td>
    </tr>
    <tr>
      <td><strong>Total</strong> (input + output)</td>
      <td>400,000</td>
    </tr>
  </tbody>
</table>

<p>That 400K headline number is <strong>not</strong> what you get to play with. The output budget is reserved for the model’s response. <strong>Your session history, system prompt, tool schemas, RAG context - all of it must fit inside the 272K <em>input</em> limit.</strong> That’s the number that matters for session management.</p>

<p>And 272K still sounds like a lot - until you’re running a multi-agent pipeline. Here’s what a typical turn looks like under the hood:</p>

<table>
  <thead>
    <tr>
      <th>Message</th>
      <th>Role</th>
      <th>Approx. tokens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>User asks a question</td>
      <td><code class="language-plaintext highlighter-rouge">user</code></td>
      <td>~50</td>
    </tr>
    <tr>
      <td>Planning agent reasons about which sub-agent to call</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~200</td>
    </tr>
    <tr>
      <td>RAG sub-agent retrieves 3 document chunks</td>
      <td><code class="language-plaintext highlighter-rouge">tool</code></td>
      <td>~2,000</td>
    </tr>
    <tr>
      <td>Code-gen sub-agent produces a solution</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~1,500</td>
    </tr>
    <tr>
      <td>Tool call to execute/validate the code</td>
      <td><code class="language-plaintext highlighter-rouge">tool</code></td>
      <td>~500</td>
    </tr>
    <tr>
      <td>Final synthesised answer</td>
      <td><code class="language-plaintext highlighter-rouge">assistant</code></td>
      <td>~300</td>
    </tr>
  </tbody>
</table>

<p>That’s <strong>~4,500 tokens for a single turn</strong>. Multiply by 40-50 turns in an extended working session and you’ve blown past 200K tokens without the user typing more than a few sentences. The context doesn’t grow linearly with conversation length; it <strong>balloons</strong> because of the compound effect of sub-agents, RAG payloads, and generated code.</p>
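<p>To make the ballooning concrete, here is a rough back-of-the-envelope estimator. A chars/4 heuristic stands in for a real tokenizer like tiktoken, and the message shapes below are hypothetical stand-ins for the turn in the table above:</p>

```python
# Rough per-turn token estimator. A real implementation would use
# tiktoken; the chars/4 heuristic is a common English-text approximation.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_turn_tokens(messages: list[dict]) -> int:
    """Sum the approximate cost of every message produced in one turn."""
    return sum(estimate_tokens(m["content"]) for m in messages)

# A hypothetical multi-agent turn: user question, planner reasoning,
# RAG payload, generated code, tool result, final answer.
turn = [
    {"role": "user", "content": "q" * 200},        # ~50 tokens
    {"role": "assistant", "content": "p" * 800},   # ~200 tokens
    {"role": "tool", "content": "d" * 8000},       # ~2,000 tokens
    {"role": "assistant", "content": "c" * 6000},  # ~1,500 tokens
    {"role": "tool", "content": "r" * 2000},       # ~500 tokens
    {"role": "assistant", "content": "a" * 1200},  # ~300 tokens
]

per_turn = estimate_turn_tokens(turn)
print(per_turn)        # 4550 tokens for one turn
print(per_turn * 45)   # 204750 - past 200K after 45 turns
```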

<hr />

<h2 id="pitfall-1-ignoring-the-hidden-token-consumers">Pitfall #1: Ignoring the Hidden Token Consumers</h2>

<p>Most developers look at the spec sheet - “400K context!” - and assume they have that much room for chat. <strong>You don’t.</strong> The input limit is 272K, and even that is eroded by things you never see in the chat:</p>

<table>
  <thead>
    <tr>
      <th>Hidden consumer</th>
      <th>Typical cost</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>System prompt</td>
      <td>1,000 - 1,500 tokens</td>
    </tr>
    <tr>
      <td>Tool/function schemas (multiple agents)</td>
      <td>1,000 - 5,000 tokens</td>
    </tr>
    <tr>
      <td>Reserved output tokens</td>
      <td>up to 128,000 tokens</td>
    </tr>
  </tbody>
</table>

<p>With GPT-5.3-codex, if you reserve even a modest 16K for output, burn 3K on tool schemas across your sub-agents, and spend ~1K on the system prompt, <strong>your real budget is ~252K</strong>, not 272K.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The ACTUAL budget formula
</span><span class="n">input_context_limit</span> <span class="o">=</span> <span class="mi">272_000</span>           
<span class="n">system_tokens</span>       <span class="o">=</span> <span class="n">count_tokens</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">)</span>
<span class="n">reserved</span>            <span class="o">=</span> <span class="n">max_output_tokens</span> <span class="o">+</span> <span class="n">tool_schema_overhead</span>
<span class="n">remaining</span>           <span class="o">=</span> <span class="n">input_context_limit</span> <span class="o">-</span> <span class="n">system_tokens</span> <span class="o">-</span> <span class="n">reserved</span>
<span class="n">budget</span>              <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span> <span class="o">*</span> <span class="n">safety_fraction</span><span class="p">)</span>  <span class="c1"># 0.80 recommended
</span></code></pre></div></div>

<p><strong>Lesson:</strong> Always compute your budget against the <strong>input</strong> context limit, not the total. And never assume the full input window is yours either - subtract system prompts, tool schemas, and output reservations first.</p>

<hr />

<h2 id="pitfall-2-trimming-at-the-wrong-boundary">Pitfall #2: Trimming at the Wrong Boundary</h2>

<p>The naive approach is to just drop the oldest messages until you’re under budget. But chat histories aren’t a flat list of independent messages - they contain <strong>logical groups</strong> that must stay together:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>user: "What's the weather in Paris?"
assistant: [tool_call: get_weather("Paris")]    ← function call
tool: "22°C and sunny"                         ← function result
assistant: "It's 22°C and sunny in Paris!"     ← final reply
</code></pre></div></div>

<p>If you trim between the tool call and the tool result, the model sees an orphaned function call with no response - and it hallucinates or errors out.</p>

<p><strong>The fix: align your trim boundary to the next <code class="language-plaintext highlighter-rouge">role == "user"</code> message.</strong> This guarantees you never split a tool-call sequence:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Walk forward to the next "user" message
</span><span class="n">aligned</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">keep_from</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">items</span><span class="p">)):</span>
    <span class="k">if</span> <span class="n">items</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">get</span><span class="p">(</span><span class="s">"role"</span><span class="p">)</span> <span class="o">==</span> <span class="s">"user"</span><span class="p">:</span>
        <span class="n">aligned</span> <span class="o">=</span> <span class="n">j</span>
        <span class="k">break</span>

<span class="k">if</span> <span class="n">aligned</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
    <span class="c1"># No safe boundary exists - nuke the session
</span>    <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span>
    <span class="k">return</span>

<span class="n">keep_from</span> <span class="o">=</span> <span class="n">aligned</span>
</code></pre></div></div>

<p><strong>Lesson:</strong> Never trim mid-turn. Always snap to a role boundary.</p>
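<p>Putting the budget check and the boundary alignment together, a minimal trim routine might look like this. It is a synchronous, framework-free sketch; <code class="language-plaintext highlighter-rouge">approx_tokens</code> is a hypothetical stand-in for your real token counter:</p>

```python
def approx_tokens(msg: dict) -> int:
    """Stand-in token counter: ~4 chars per token."""
    return max(1, len(msg.get("content", "")) // 4)

def trim_to_budget(items: list[dict], budget: int) -> tuple[list[dict], list[dict]]:
    """Return (trimmed, kept). Keep the newest messages that fit the
    budget, then snap the cut forward to the next user message so a
    tool call is never separated from its result."""
    # Walk backwards, keeping the newest messages that fit.
    total, keep_from = 0, len(items)
    for i in range(len(items) - 1, -1, -1):
        total += approx_tokens(items[i])
        if total > budget:
            break
        keep_from = i
    # Snap the boundary forward to the next user message.
    aligned = next(
        (j for j in range(keep_from, len(items)) if items[j].get("role") == "user"),
        None,
    )
    if aligned is None:
        return items, []  # no safe boundary exists: caller clears the session
    return items[:aligned], items[aligned:]

history = [
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "tool_call get_weather"},
    {"role": "tool", "content": "22C and sunny"},
    {"role": "assistant", "content": "It's sunny!"},
    {"role": "user", "content": "new question"},
    {"role": "assistant", "content": "answer"},
]
trimmed, kept = trim_to_budget(history, budget=40)
print(kept[0]["role"])   # "user" - the cut lands on a user boundary
```

<p>Even though the budget cut initially lands inside the old tool-call sequence, the boundary snap discards that whole group, so the model never sees an orphaned function call.</p>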

<hr />

<h2 id="pitfall-3-losing-context-silently">Pitfall #3: Losing Context Silently</h2>

<p>Trimming old messages solves the token problem - but destroys context. If the user mentioned their name, a project requirement, or a design decision 40 messages ago, that information is just <em>gone</em>.</p>

<p>The fix is to <strong>summarize before you trim</strong>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conversation_text</span> <span class="o">=</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">.</span><span class="n">join</span><span class="p">(</span>
    <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="s">'role'</span><span class="p">]</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="s">'content'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trimmed_items</span>
<span class="p">)</span>

<span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="n">chat</span><span class="p">.</span><span class="n">completions</span><span class="p">.</span><span class="n">create</span><span class="p">(</span>
    <span class="n">model</span><span class="o">=</span><span class="n">summary_model</span><span class="p">,</span>
    <span class="n">messages</span><span class="o">=</span><span class="p">[{</span>
        <span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span>
        <span class="s">"content"</span><span class="p">:</span> <span class="p">(</span>
            <span class="s">"Summarise this conversation excerpt in 2-4 sentences, "</span>
            <span class="s">"preserving all key facts, decisions, and information "</span>
            <span class="s">"the user may refer back to.</span><span class="se">\n\n</span><span class="s">"</span>
            <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">conversation_text</span><span class="si">}</span><span class="s">"</span>
        <span class="p">),</span>
    <span class="p">}],</span>
    <span class="n">max_tokens</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span>
    <span class="n">temperature</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">summary</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">message</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span>
</code></pre></div></div>

<p>Then inject the summary back into the session as the very first message:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span>

<span class="n">new_items</span> <span class="o">=</span> <span class="p">[{</span>
    <span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span>
    <span class="s">"content"</span><span class="p">:</span> <span class="sa">f</span><span class="s">"[Previous conversation summary]: </span><span class="si">{</span><span class="n">summary</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
<span class="p">}]</span>
<span class="n">new_items</span><span class="p">.</span><span class="n">extend</span><span class="p">(</span><span class="n">kept_items</span><span class="p">)</span>

<span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="n">add_items</span><span class="p">(</span><span class="n">new_items</span><span class="p">)</span>
</code></pre></div></div>

<p>Your system prompt should explicitly tell the model to trust these summaries:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>When you see a '[Previous conversation summary]' message,
treat it as a reliable recap of the earlier conversation.
Use it to maintain continuity.
</code></pre></div></div>

<p><strong>Lesson:</strong> Trim the tokens, but compress the knowledge. A 3-sentence summary is worth 10,000 trimmed tokens.</p>

<hr />

<h2 id="pitfall-4-letting-summarization-failures-kill-the-session">Pitfall #4: Letting Summarization Failures Kill the Session</h2>

<p>Summarization uses an LLM call, and LLM calls can fail - network issues, rate limits, content filters. If your summarization crashes and you haven’t trimmed, the <em>next</em> agent call will hit the context limit anyway.</p>

<p><strong>Always trim first, summarize second, and treat summarization as best-effort:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">summary_text</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">summary_text</span> <span class="o">=</span> <span class="k">await</span> <span class="n">summarize</span><span class="p">(</span><span class="n">trimmed_items</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="n">exception</span><span class="p">(</span><span class="s">"Summarization failed - trimming without summary"</span><span class="p">)</span>

<span class="c1"># Proceed with or without summary
</span><span class="k">await</span> <span class="n">rewrite_session</span><span class="p">(</span><span class="n">summary_text</span><span class="p">,</span> <span class="n">kept_items</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Lesson:</strong> Graceful degradation &gt; fragile correctness.</p>

<hr />

<h2 id="good-practice-make-the-budget-visible">Good Practice: Make the Budget Visible</h2>

<p>Once I added a React-based session visualizer connected to the backend via WebSocket, debugging became 10x easier. I could literally <em>watch</em> the token budget shrink in real time:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># In your context manager, emit events via a callback:
</span><span class="k">if</span> <span class="n">event_callback</span><span class="p">:</span>
    <span class="k">await</span> <span class="n">event_callback</span><span class="p">({</span>
        <span class="s">"type"</span><span class="p">:</span> <span class="s">"budget_info"</span><span class="p">,</span>
        <span class="s">"context_window"</span><span class="p">:</span> <span class="n">context_window</span><span class="p">,</span>
        <span class="s">"history_tokens"</span><span class="p">:</span> <span class="n">history_tokens</span><span class="p">,</span>
        <span class="s">"budget"</span><span class="p">:</span> <span class="n">budget</span><span class="p">,</span>
    <span class="p">})</span>
</code></pre></div></div>

<p>I’d strongly recommend building a lightweight dashboard when developing agents. Watching the numbers change as you chat makes budget problems obvious <em>before</em> they become runtime errors.</p>
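<p>The visualizer itself is just a consumer. On the backend, a small fan-out keeps the context manager decoupled from however many dashboards are attached. The WebSocket layer is omitted here - subscribers are plain async callables, which is an assumption about how you would wire it up:</p>

```python
import asyncio

class BudgetEventBus:
    """Fan out budget events to every subscriber (e.g. WebSocket senders)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    async def emit(self, event: dict) -> None:
        # A dead dashboard connection shouldn't break the agent loop.
        for cb in list(self._subscribers):
            try:
                await cb(event)
            except Exception:
                self._subscribers.remove(cb)

bus = BudgetEventBus()
received = []

async def fake_dashboard(event):
    received.append(event)

bus.subscribe(fake_dashboard)
asyncio.run(bus.emit({
    "type": "budget_info",
    "budget": 252_000,
    "history_tokens": 48_000,
}))
print(received[0]["budget"])   # 252000
```

<p>The context manager only ever calls <code class="language-plaintext highlighter-rouge">emit</code>; failed subscribers are dropped so a flaky dashboard can never take down a production session.</p>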

<hr />

<h2 id="good-practice-apply-a-safety-fraction">Good Practice: Apply a Safety Fraction</h2>

<p>Even with precise token counting, there’s always a margin of error - tiktoken counts don’t perfectly match the model’s internal tokenizer, and API overhead adds a few tokens per message.</p>

<p>A simple <strong>0.80× safety fraction</strong> on the remaining budget gives you a comfortable 20% buffer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">budget</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span> <span class="o">*</span> <span class="mf">0.80</span><span class="p">)</span>
</code></pre></div></div>

<p>This single line has prevented more crashes than any other piece of code in the project.</p>

<hr />

<h2 id="the-complete-algorithm">The Complete Algorithm</h2>

<p>Here’s the full strategy distilled into 9 steps:</p>

<ol>
  <li><strong>Count tokens</strong> for system prompt + full session history</li>
  <li><strong>Reserve</strong> output tokens + tool-schema overhead</li>
  <li><strong>Apply safety fraction</strong> (0.80) to the remaining budget</li>
  <li><strong>Keep the newest messages</strong> that fit within budget</li>
  <li><strong>Align trim boundary</strong> forward to the next <code class="language-plaintext highlighter-rouge">role == "user"</code> message</li>
  <li><strong>Summarize</strong> the trimmed messages using an LLM (best-effort)</li>
  <li><strong>Inject the summary</strong> as the first message in the rewritten session</li>
  <li><strong>Clear session entirely</strong> if budget ≤ 0 or no safe boundary exists</li>
  <li><strong>Continue trimming</strong> even if summarization fails</li>
</ol>

<p>Run this routine <strong>before every agent turn</strong>, and your agent will handle conversations of arbitrary length without ever hitting the context ceiling.</p>

<hr />

<h2 id="tldr">TL;DR</h2>

<table>
  <thead>
    <tr>
      <th>Pitfall</th>
      <th>Fix</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Assuming the full context window is available</td>
      <td>Subtract system prompt, tool schemas, and output reservation</td>
    </tr>
    <tr>
      <td>Trimming mid-turn (splitting tool calls)</td>
      <td>Snap trim boundary to <code class="language-plaintext highlighter-rouge">role == "user"</code></td>
    </tr>
    <tr>
      <td>Losing important context when trimming</td>
      <td>Summarize trimmed messages before discarding</td>
    </tr>
    <tr>
      <td>Summarization failure crashing the session</td>
      <td>Treat summarization as best-effort; always trim</td>
    </tr>
    <tr>
      <td>Debugging budget issues blindly</td>
      <td>Build a real-time visualizer dashboard</td>
    </tr>
  </tbody>
</table>

<hr />

<p>If you’re building with the OpenAI Agents SDK (or any LLM framework), these patterns will save you from the most common - and most frustrating - production failures.</p>

<p>The full POC code (Python backend + React visualizer) is available on my <a href="https://github.com/aalok05/AgentSessionManager">GitHub</a>.</p>

<hr />]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[How to build a production-grade session history manager for LLM agents]]></summary></entry><entry><title type="html">Building an Agentic AI Chatbot that Speaks DSL</title><link href="https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution.html" rel="alternate" type="text/html" title="Building an Agentic AI Chatbot that Speaks DSL" /><published>2025-10-03T00:00:00+00:00</published><updated>2025-10-03T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution</id><content type="html" xml:base="https://aalok05.github.io/ai/2025/10/03/agentic-dsl-solution.html"><![CDATA[<p>Over the last few months, I’ve been deep in the trenches of building agentic AI systems using the  OpenAI Agents SDK. One of the most exciting projects I’ve worked on was creating a chatbot that takes a user’s natural language prompt and outputs  code in a Domain-Specific Language (DSL). Along the way, we combined  retrieval-augmented generation (RAG),  semantic search with Azure AI Search, and a team of specialized agents working under a single orchestrator.</p>

<p>This post shares what I learned from the journey - both the technical architecture and the design decisions that made the system robust.</p>

<hr />

<h3 id="what-is-a-dsl">What is a DSL?</h3>

<p>A  Domain-Specific Language (DSL)  is a mini programming language purpose-built for a narrow domain. Unlike general-purpose languages like Python or Java, which can implement anything from system software to web apps, DSLs are laser-focused.</p>

<p>Think of SQL for databases, regex for pattern matching, or Terraform’s HCL for infrastructure automation. In our case, the DSL defined how certain domain rules and workflows needed to be translated into executable logic.</p>

<p>The key challenge:  users don’t like writing DSL directly. They want to describe rules in English and let the AI generate clean, valid DSL code.</p>

<p>The DSL I am working with is a proprietary language developed for a specific purpose within this organization. No LLM has been trained on it, because nothing about it is available on the open web. This is where a robust RAG system becomes important, as we will see.</p>

<hr />

<h3 id="the-core-problem">The Core Problem</h3>

<p>We needed a chatbot that:</p>

<ol>
  <li>Accepts user prompts in natural language.</li>
  <li>Understands the intent and domain context.</li>
  <li>Generates  valid, executable DSL code, not just free-form text.</li>
  <li>Ensures correctness by consulting external knowledge sources.</li>
  <li>Decomposes complex user tasks into smaller, domain-specific steps.</li>
</ol>

<hr />

<h3 id="garbage-in-garbage-out-data-preparation-was-critical">Garbage In, Garbage Out: Data Preparation Was Critical</h3>

<p>Before any agents could work their magic, we had to solve a foundational problem:  the quality of our DSL documentation directly determined the quality of our outputs.</p>

<p>The original DSL documentation was messy - inconsistent formatting, outdated examples, duplicate sections, and mixed terminology. We learned the hard way that  feeding raw, unsanitized data into a vector database produces unreliable retrieval results.</p>

<p>Here’s how we prepared the data:</p>

<ul>
  <li>Standardization:  We normalized all DSL syntax examples, ensured consistent naming conventions, and removed deprecated constructs.</li>
  <li>Chunking Strategy:  Large documentation pages were split into logical chunks that represented complete concepts, along with some overlap.</li>
  <li>Metadata Enrichment:  Each chunk was tagged with metadata like DSL version, concept category, and complexity level to enable filtered retrieval.</li>
  <li>Deduplication:  We removed redundant examples and consolidated overlapping explanations.</li>
</ul>

<p>Once sanitized, we generated embeddings using OpenAI’s text-embedding-3-large model and ingested them into Azure AI Search as our vector database. The lesson: spend half your effort on data quality - it pays off many times over in retrieval accuracy.</p>
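<p>The chunking step in particular is easy to get wrong. A minimal word-based chunker with overlap and metadata tagging might look like this - real pipelines usually split on logical sections and count tokens rather than words, so treat it as a simplified sketch:</p>

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 40) -> list[dict]:
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of its predecessor so no concept is cut in half."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(piece),
            # Metadata enrichment: tag each chunk for filtered retrieval.
            "metadata": {"chunk_index": len(chunks), "word_count": len(piece)},
        })
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_with_overlap(doc, chunk_size=200, overlap=40)
print(len(chunks))   # 3 chunks
# Consecutive chunks share their boundary words:
print(chunks[0]["text"].split()[-40:] == chunks[1]["text"].split()[:40])  # True
```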

<hr />

<h3 id="semantic-reranking-the-secret-weapon">Semantic Reranking: The Secret Weapon</h3>

<p>Azure AI Search provided semantic search out of the box, but we went a step further by implementing  semantic reranking.</p>

<p>Here’s how it works: when a user query comes in, the system performs two stages of retrieval:</p>

<ol>
  <li>Initial Retrieval (Vector Search):  The query embedding is compared against all document embeddings in the index using cosine similarity. This returns the top 50 candidate chunks.</li>
  <li>Semantic Reranking:  Instead of just returning the top results from vector similarity, Azure AI Search’s semantic ranker uses a  cross-encoder model  that deeply understands the relationship between the query and each candidate document. It re-ranks the top 50 results based on semantic relevance, not just vector proximity.</li>
</ol>

<p>Why does this matter? Vector similarity can sometimes retrieve chunks that  <em>mention</em>  similar terms but don’t actually answer the user’s intent. Semantic reranking ensures that the final top 5-10 results are  contextually the most relevant  to what the user is asking.</p>

<p>For example, if a user asks “How do I handle manager approvals in transaction rules?”, vector search might return chunks about “manager roles” or “transaction limits” separately. But semantic reranking understands the  <em>combined intent</em>  and prioritizes chunks that show the  intersection  of both concepts.</p>

<p>This dramatically reduced hallucinations and improved DSL generation accuracy.</p>
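<p>The two-stage shape is easy to prototype outside Azure: stage one is plain cosine similarity over embeddings, and stage two re-scores a small candidate pool with a stronger (and slower) scorer. In production that scorer is Azure AI Search’s cross-encoder; here it is stood in for by a hypothetical scoring function, and the toy documents and embeddings are invented for illustration:</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, rerank_score, pool_size=50, top_k=5):
    """Stage 1: vector similarity narrows the index to `pool_size`
    candidates. Stage 2: a cross-encoder-style scorer re-ranks them."""
    pool = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    pool = pool[:pool_size]
    return sorted(pool, key=rerank_score, reverse=True)[:top_k]

# Toy corpus for the "manager approvals in transaction rules" example.
docs = [
    {"id": "manager-roles", "embedding": [0.9, 0.1], "covers_both": False},
    {"id": "txn-limits", "embedding": [0.1, 0.9], "covers_both": False},
    {"id": "approvals-in-txn-rules", "embedding": [0.6, 0.6], "covers_both": True},
]

# Hypothetical reranker: prefers chunks covering the *combined* intent.
top = retrieve([0.7, 0.7], docs, rerank_score=lambda d: d["covers_both"], top_k=1)
print(top[0]["id"])   # "approvals-in-txn-rules"
```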

<hr />

<h3 id="why-agents-instead-of-just-one-model">Why Agents Instead of Just One Model?</h3>

<p>A single LLM prompt could, theoretically, output DSL. But real-world usage taught us that:</p>

<ul>
  <li>Some user prompts required  retrieving specific domain rules  (from internal documentation).</li>
  <li>The DSL had  different subdomains, each requiring its own expert logic.</li>
  <li>Errors in DSL syntax needed specialized correction strategies.</li>
</ul>

<p>This is where the  agentic approach  shines. Instead of one monolithic LLM call, we built a  <a href="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html">multi-agent system coordinated by an Orchestrator Agent.</a></p>

<hr />

<h3 id="the-architecture">The Architecture</h3>

<p>Here’s how the system came together:</p>

<ul>
  <li>Orchestrator Agent: Uses the OpenAI Agents SDK to route the user’s request. It decides whether the query requires retrieval, DSL transformation, or validation, and is given ample context and instructions as its system prompt.</li>
  <li>DSL Experts (Helper Agents): Each agent is an expert in a <em>subset</em> of the DSL - for example, syntax, rules, or validation.</li>
  <li>Knowledge Retrieval (RAG + Azure AI Search): Users often referred to legacy rules, policies, and code snippets. To handle this, I integrated semantic search in Azure AI Search. The orchestrator first queries the knowledge base for relevant examples with semantic reranking, injects them into context, and passes them to the helper agents.</li>
  <li>Feedback &amp; Correction Loop: If the generated DSL didn’t validate, the Orchestrator Agent would trigger a handoff to the <em>Validator Agent</em>, which corrected and finalized the output.</li>
</ul>

<p><img src="/assets/images/modular_agents_diag.png" alt="Birds' eye view of plug and play agentic architecture" /></p>

<h3 id="learnings-along-the-way">Learnings Along the Way</h3>

<ul>
  <li>Data quality is non-negotiable.  Garbage in, garbage out isn’t just a saying - it’s reality. Clean, validated, well-chunked documentation made all the difference.</li>
  <li>Semantic reranking beats pure vector search.  The cross-encoder reranking step eliminated false positives and surfaced truly relevant context.</li>
  <li>Agent collaboration beats single-shot prompting.  Letting agents specialize made the system  <em>modular, scalable, and less error-prone.</em></li>
  <li>RAG is not optional in enterprise-grade DSL systems.  When dealing with evolving rules, retrieval ensures correctness over hallucination.</li>
  <li>Pausing and handoff are powerful.  Using the Agents SDK’s ability to pause execution and let another agent take over kept us from “forcing” correctness in a single prompt.</li>
</ul>]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[Over the last few months, I’ve been deep in the trenches of building agentic AI systems using the OpenAI Agents SDK. One of the most exciting projects I’ve worked on was creating a chatbot that takes a user’s natural language prompt and outputs code in a Domain-Specific Language (DSL). Along the way, we combined retrieval-augmented generation (RAG), semantic search with Azure AI Search, and a team of specialized agents working under a single orchestrator.]]></summary></entry><entry><title type="html">Building Modular AI Agents: A Plugin Architecture</title><link href="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html" rel="alternate" type="text/html" title="Building Modular AI Agents: A Plugin Architecture" /><published>2025-09-13T00:00:00+00:00</published><updated>2025-09-13T00:00:00+00:00</updated><id>https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents</id><content type="html" xml:base="https://aalok05.github.io/ai/2025/09/13/building-modular-ai-agents.html"><![CDATA[<p>Most AI solutions start simple but quickly become unwieldy monoliths. Whether you’re building chatbots, RAG systems, or AI assistants, you’ve probably experienced this evolution (although this is a problem with non AI projects as well):</p>

<p><strong>Day 1:</strong> “Let’s build a simple AI assistant that can answer questions about our docs”</p>

<p><strong>Day 30:</strong> “Now it needs to search our database, generate reports, and handle customer queries”</p>

<p><strong>Day 90:</strong> “The system is slow, hard to test, and any change breaks something else”</p>

<h3 id="why-this-happens">Why This Happens</h3>

<ul>
  <li>Knowledge Base Coupling:  Everything gets thrown into one giant vector database</li>
  <li>Pipeline Rigidity:  One retrieval system tries to handle all content types</li>
  <li>Prompt Bloat:  A single “universal” prompt grows to handle every edge case</li>
  <li>Integration Mess:  One API endpoint doing everything becomes impossible to maintain</li>
</ul>

<h3 id="the-real-cost">The Real Cost</h3>

<ul>
  <li><strong>Development slowdown:</strong> Teams can’t work independently</li>
  <li><strong>Testing nightmare:</strong> Need to spin up everything to test one feature</li>
  <li><strong>Deployment risk:</strong> Any change affects the entire system</li>
  <li><strong>Scaling inefficiency:</strong> Resource usage driven by your heaviest component</li>
</ul>

<h2 id="the-solution-plugin-architecture">The Solution: Plugin Architecture</h2>

<p>Instead of one massive RAG solution, build modular agents that have unique capabilities:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>

<span class="k">class</span> <span class="nc">BaseModularAgent</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="s">"""Base class for all Agents"""</span>
    
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">agent</span> <span class="o">=</span> <span class="bp">None</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">get_metadata</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
        <span class="s">"""Return metadata this agent"""</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s">"name"</span><span class="p">:</span> <span class="s">"agent_name"</span><span class="p">,</span>
            <span class="s">"description"</span><span class="p">:</span> <span class="s">"What this agent does"</span>
        <span class="p">}</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">get_prompt</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="s">"""System prompt for the agent"""</span>
        <span class="k">pass</span>
    
    <span class="o">@</span><span class="n">abstractmethod</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="s">"""Initialize the agent"""</span>
        <span class="c1"># Create agent with the prompt
</span>        <span class="k">pass</span>

</code></pre></div></div>

<h3 id="core-principles">Core Principles</h3>

<ul>
  <li><strong>Single Responsibility:</strong> Each plugin does one thing well</li>
  <li><strong>Self-Contained:</strong> Plugins manage their own resources and dependencies</li>
  <li><strong>Discoverable:</strong> The system automatically finds and loads available plugin agents</li>
  <li><strong>Composable:</strong> Plugins can work together or independently</li>
</ul>

<h2 id="dynamic-discovery-finding-your-plugin-agents">Dynamic Discovery: Finding Your Plugin Agents</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">find_and_initialize_agents</span><span class="p">(</span><span class="n">directory</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="s">""" Discover and initialize plugin agents"""</span>
    <span class="n">agents</span> <span class="o">=</span> <span class="p">[]</span>
    
    <span class="k">for</span> <span class="nb">file</span> <span class="ow">in</span> <span class="n">os</span><span class="p">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">directory</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">file</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'_pluginagent.py'</span><span class="p">):</span>
            <span class="c1"># Load the module
</span>            <span class="n">module</span> <span class="o">=</span> <span class="n">importlib</span><span class="p">.</span><span class="n">import_module</span><span class="p">(</span><span class="nb">file</span><span class="p">[:</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span>
            
            <span class="c1"># Find agent classes
</span>            <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">module</span><span class="p">):</span>
                <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
                <span class="k">if</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">BaseModularAgent</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="n">obj</span> <span class="o">!=</span> <span class="n">BaseModularAgent</span><span class="p">):</span>
                    
                    <span class="c1"># Create instance and initialize the agent
</span>                    <span class="n">agents</span> <span class="o">=</span> <span class="n">obj</span><span class="p">()</span>
                    <span class="k">await</span> <span class="n">agents</span><span class="p">.</span><span class="n">initialize</span><span class="p">()</span>
                    <span class="n">agents</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">agents</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">agents</span>
</code></pre></div></div>
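<p>One subtlety in the discovery loop above: <code>importlib.import_module</code> only resolves modules that are already on <code>sys.path</code>. A minimal standard-library sketch (the helper name <code>load_module_from_path</code> is my own, not part of any SDK) for loading a plugin file directly from a directory:</p>

```python
import importlib.util
import os

def load_module_from_path(path: str):
    """Load a Python module directly from a file path, bypassing sys.path."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

<p>This keeps plugin directories out of the import path, so plugin file names cannot shadow real packages.</p>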

<h2 id="modular-agent-example">Modular Agent Example</h2>

<h3 id="company-policy-agent">Company Policy Agent</h3>

<p>The example below omits the ‘retrieval’ part of RAG for simplicity. In the real world, this agent would most likely perform semantic search on a vector database containing embeddings of company policy documents, and it would define a function tool responsible for performing that search.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">agents</span> <span class="kn">import</span> <span class="n">Agent</span>

<span class="k">class</span> <span class="nc">CompanyPolicyAgent</span><span class="p">(</span><span class="n">BaseModularAgent</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_metadata</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s">"name"</span><span class="p">:</span> <span class="s">"document_search"</span><span class="p">,</span>
            <span class="s">"description"</span><span class="p">:</span> <span class="s">"Search through company documents and policies"</span>
        <span class="p">}</span>
    
    <span class="k">def</span> <span class="nf">get_prompt</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="k">return</span> <span class="s">"""You are a document search specialist. Help users find relevant 
        documents, policies, and information from the company knowledge base. 
        Always provide accurate, cited results."""</span>
    
    <span class="k">async</span> <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">metadata</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">agent</span> <span class="o">=</span> <span class="n">Agent</span><span class="p">(</span>
                <span class="n">name</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">],</span>
                <span class="n">instructions</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">get_prompt</span><span class="p">(),</span>
                <span class="n">model</span><span class="o">=</span><span class="s">"o3"</span>
        <span class="p">)</span>

</code></pre></div></div>
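<p>For local testing, the omitted retrieval step can be replaced with a stub. The sketch below is a hypothetical stand-in: a simple keyword-overlap ranking over an in-memory document list, filling in for a real embedding search against a vector store:</p>

```python
def search_policies(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they contain (toy scoring)."""
    words = set(query.lower().split())
    # Score each document by word overlap with the query.
    scored = [(len(words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the top results that matched at least one word.
    return [doc for score, doc in scored[:top_k] if score > 0]
```

<p>In production this function would query the vector store instead, and would typically be registered as a function tool on the agent so the model can invoke it.</p>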

<h2 id="putting-it-all-together-with-agents-sdk">Putting It All Together with Agents SDK</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">agents</span> <span class="kn">import</span> <span class="n">Agent</span><span class="p">,</span> <span class="n">Runner</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">importlib</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1"># Discover and initialize modular agents and use them as tools for orchestrator agent
</span>    <span class="n">agentModules</span> <span class="o">=</span> <span class="p">{}</span>
    
    <span class="k">for</span> <span class="nb">file</span> <span class="ow">in</span> <span class="n">os</span><span class="p">.</span><span class="n">listdir</span><span class="p">(</span><span class="s">'./plugin_agents'</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">file</span><span class="p">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">'_pluginagent.py'</span><span class="p">):</span>
            <span class="c1"># Load the module
</span>            <span class="n">module</span> <span class="o">=</span> <span class="n">importlib</span><span class="p">.</span><span class="n">import_module</span><span class="p">(</span><span class="nb">file</span><span class="p">[:</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span>
            
            <span class="c1"># Find agent classes
</span>            <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">module</span><span class="p">):</span>
                <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
                <span class="k">if</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">BaseModularAgent</span><span class="p">)</span> <span class="ow">and</span> 
                    <span class="n">obj</span> <span class="o">!=</span> <span class="n">BaseModularAgent</span><span class="p">):</span>
                    <span class="c1"># Create instance and initialize the agent
</span>                    <span class="n">pluginAgent</span> <span class="o">=</span> <span class="n">obj</span><span class="p">()</span>
                    <span class="k">await</span> <span class="n">pluginAgent</span><span class="p">.</span><span class="n">initialize</span><span class="p">()</span>
                    
                    <span class="n">metadata</span> <span class="o">=</span> <span class="n">pluginAgent</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
                    <span class="n">agentModules</span><span class="p">[</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">]]</span> <span class="o">=</span> <span class="n">pluginAgent</span>
                    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Loaded agent: </span><span class="si">{</span><span class="n">metadata</span><span class="p">[</span><span class="s">'description'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    
    <span class="c1"># Convert each plugin agent into a tool
</span>    <span class="n">tools</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">agent</span> <span class="ow">in</span> <span class="n">agentModules</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">metadata</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="n">get_metadata</span><span class="p">()</span>
        <span class="n">tool</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="n">agent</span><span class="p">.</span><span class="n">as_tool</span><span class="p">(</span>
            <span class="n">name</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"name"</span><span class="p">],</span>
            <span class="n">description</span><span class="o">=</span><span class="n">metadata</span><span class="p">[</span><span class="s">"description"</span><span class="p">]</span>
        <span class="p">)</span>
        <span class="n">tools</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">tool</span><span class="p">)</span>
    
    <span class="c1"># Create orchestrator agent with all plugin agents as tools
</span>    <span class="n">orchestrator</span> <span class="o">=</span> <span class="n">Agent</span><span class="p">(</span>
        <span class="n">name</span><span class="o">=</span><span class="s">"AI Assistant"</span><span class="p">,</span>
        <span class="n">instructions</span><span class="o">=</span><span class="s">"""You are an intelligent assistant that can help with various tasks. 
        You have access to specialized agents through tools. Use the appropriate tool 
        based on what the user needs."""</span><span class="p">,</span>
        <span class="n">model</span><span class="o">=</span><span class="s">"o3"</span><span class="p">,</span>
        <span class="n">tools</span><span class="o">=</span><span class="n">tools</span>
    <span class="p">)</span>
    
    <span class="c1"># Run the orchestrator agent with user prompt
</span>    <span class="n">user_input</span> <span class="o">=</span> <span class="s">"Find our company policy on remote work"</span>
    <span class="k">await</span> <span class="n">Runner</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">orchestrator</span><span class="p">,</span> <span class="n">user_input</span><span class="p">)</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span> 
</code></pre></div></div>

<h2 id="why-this-approach-works">Why This Approach Works</h2>

<p><strong>For Developers:</strong></p>
<ul>
  <li>Build and test individual agents independently</li>
  <li>Add new features without touching existing code</li>
  <li>Debug issues in isolation</li>
</ul>

<p><strong>For Operations:</strong></p>
<ul>
  <li>Deploy updates to specific agents only</li>
  <li>Scale resources based on actual usage patterns</li>
  <li>Monitor and troubleshoot individual components</li>
</ul>

<p><strong>For Business:</strong></p>
<ul>
  <li>Faster feature development with parallel teams</li>
  <li>Reduced risk of system-wide failures</li>
  <li>Easy integration of new AI models and providers</li>
</ul>

<h2 id="getting-started">Getting Started</h2>

<ol>
  <li><strong>Identify your AI capabilities</strong> - What different things does your system need to do?</li>
  <li><strong>Create separate plugins</strong> - Build one plugin agent per capability</li>
  <li><strong>Use the discovery pattern</strong> - Let your system automatically find agents</li>
  <li><strong>Start simple</strong> - Begin with basic implementations and evolve</li>
</ol>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Plugin architecture isn’t just about code organization - it’s about building AI solutions that can evolve with your business needs. Instead of rewriting everything when requirements change, you simply add, remove, or update individual capabilities.</p>

<p>Start small, think modular, and watch your AI solution become more maintainable, scalable, and powerful over time.</p>]]></content><author><name>Aalok Singh</name></author><category term="AI" /><summary type="html"><![CDATA[Most AI solutions start simple but quickly become unwieldy monoliths. Whether you’re building chatbots, RAG systems, or AI assistants, you’ve probably experienced this evolution (although this is a problem with non AI projects as well):]]></summary></entry><entry><title type="html">My conversations with LLM - How does TLS work?</title><link href="https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work.html" rel="alternate" type="text/html" title="My conversations with LLM - How does TLS work?" /><published>2025-05-30T00:00:00+00:00</published><updated>2025-05-30T00:00:00+00:00</updated><id>https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work</id><content type="html" xml:base="https://aalok05.github.io/cloud/and/networking/2025/05/30/how-does-tls-work.html"><![CDATA[<p>Some days you just wake up curious. Today when  <a href="https://www.youtube.com/watch?v=WxxHH_4vKh4">this</a>  video showed up in my feed, I had some questions about TLS protocol.</p>

<p>This took me down a rabbit hole of how symmetric key derivation algorithms work. I’m sharing a succinct, paraphrased version of that conversation with gpt-4o:</p>

<hr />

<h3 id="q-you-are-an-expert-in-tls-protocol-which-keys-are-used-to-encrypt-my-https-traffic">Q: You are an expert in TLS protocol. Which keys are used to encrypt my HTTPS traffic?</h3>

<ul>
  <li>Neither the public key nor the private key (asymmetric keys used to sign and validate certs) directly encrypts the actual traffic.</li>
  <li>They are used during the  handshake  to  establish a symmetric key.</li>
  <li>The  symmetric session key  is then used to encrypt the actual data.</li>
</ul>

<hr />

<h3 id="q-can-you-dive-deeper-into-how-symmetric-session-keys-are-used-to-encrypt-traffic">Q: Can you dive deeper into how symmetric session keys are used to encrypt traffic?</h3>

<p>… Yes,  the actual traffic  <em>is</em>  encrypted, but  not with public/private keys. It’s encrypted using a  symmetric encryption algorithm  (like AES) after the TLS handshake is complete.</p>

<p>Let’s dive deeper into how this works:</p>

<h3 id="1-purpose-of-publicprivate-keys-asymmetric-encryption">1. Purpose of Public/Private Keys (Asymmetric Encryption)</h3>

<ul>
  <li>Used  only during the handshake  phase of a TLS connection.</li>
  <li>Enables server authentication and the secure establishment of a shared symmetric secret.</li>
</ul>

<blockquote>
  <p>Why not use public/private keys for actual data encryption?</p>
</blockquote>

<ul>
  <li>Performance: Asymmetric encryption is computationally expensive and slower.</li>
  <li>Security Best Practices: Symmetric encryption is faster and more suitable for large data volumes like website content.</li>
</ul>

<h3 id="2-actual-traffic-encryption-symmetric-encryption">2. Actual Traffic Encryption (Symmetric Encryption)</h3>

<p>Once the handshake is complete:</p>

<ul>
  <li>A  shared symmetric key  (or a set of keys) is agreed upon.</li>
  <li>From then on,  all HTTP traffic is encrypted with this symmetric key, using algorithms like AES-GCM or ChaCha20-Poly1305.</li>
</ul>

<h3 id="-how-is-the-symmetric-key-established">🔑 How Is the Symmetric Key Established?</h3>

<p>It depends on the  key exchange algorithm:</p>

<h3 id="modern-tls-using-ecdhe">Modern TLS (using ECDHE):</h3>

<ul>
  <li>Both sides generate ephemeral key pairs (private + public).</li>
  <li>They exchange  public  parts.</li>
  <li>Each side uses their  private key  + the other’s  public key  to derive the  same shared secret  (via elliptic curve math).</li>
  <li>This shared secret is passed through a  Key Derivation Function (KDF)  to generate the actual session keys (and IVs) used to protect traffic.</li>
</ul>
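<p>To make the KDF step concrete, here is a simplified HKDF-style sketch (after RFC 5869) using only the Python standard library. TLS 1.3 builds a more elaborate key schedule on this same extract-then-expand primitive, so treat this as an illustration rather than the real TLS derivation:</p>

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Concentrate the input keying material into a pseudorandom key."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand the pseudorandom key into as many output bytes as needed."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# The ECDHE shared secret feeds the KDF; distinct "info" labels yield
# independent keys for each direction of the connection.
shared_secret = b"example ECDHE shared secret"
prk = hkdf_extract(b"handshake salt", shared_secret)
client_write_key = hkdf_expand(prk, b"client write", 32)
server_write_key = hkdf_expand(prk, b"server write", 32)
```

<p>Note that the derivation is deterministic: both endpoints, holding the same shared secret and labels, compute identical keys without ever sending them.</p>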

<hr />

<h3 id="q-what-are-the-characteristics-of-symmetric-keys">Q: What are the characteristics of symmetric keys?</h3>

<h3 id="️summary">🛡️Summary</h3>

<ul>
  <li>🔁 Same key is used for both encryption and decryption</li>
  <li>⚡ Very fast and efficient</li>
  <li>🔢 Typically 128–256 bits (longer = more secure)</li>
  <li>🔒 Must be securely exchanged or derived</li>
  <li>🌐 Used in TLS to encrypt actual traffic after handshake</li>
  <li>🔧 Common algorithms include AES and ChaCha20</li>
</ul>

<hr />

<h3 id="q-how-difficult-is-it-to-get-hold-of-session-keys-and-why-is-it-difficult">Q: How difficult is it to get hold of session keys and why is it difficult?</h3>

<h3 id="️summary-1">🛡️Summary</h3>

<ul>
  <li>☠️ Session key compromise is fatal — all traffic can be decrypted</li>
  <li>🛡️ Extremely difficult to obtain the session key</li>
  <li>🧠 Session key is never sent, is ephemeral, exists only in RAM, and is derived via secure ECDHE math</li>
  <li>🚫 Attacker’s best chances are malware, memory scraping, or exploiting bugs (which are rare and detectable)</li>
</ul>

<hr />

<h3 id="q-you-mentioned-earlier-that-both-parties-use-the-same-session-key-however-you-also-say-that-keys-are-never-actually-transmitted-how-is-that-possible">Q: You mentioned earlier that both parties use the same session key. However you also say that keys are never actually transmitted. How is that possible?</h3>

<h3 id="-short-answer">✅ Short Answer:</h3>

<ul>
  <li>Yes,  both client and server use the  <em>same session key(s)</em>.</li>
  <li>But  this session key is  never transmitted over the network.</li>
  <li>Instead, both parties  independently derive the same session key, using secure mathematical operations during the  key exchange phase.</li>
</ul>

<h3 id="-summary-same-key-but-not-shared-explicitly">🔑 Summary: Same Key, But Not Shared Explicitly</h3>

<ul>
  <li>🔑 Symmetric session key is the same on both client and server</li>
  <li>📵 It is never transmitted over the network</li>
  <li>🧮 Each party derives it using ECDHE math on exchanged public keys and their own private key</li>
  <li>🔐 Security benefit: even if someone intercepts all traffic, they can’t compute the key without a private key</li>
</ul>

<hr />

<h3 id="qhow-is-it-possible-that-same-exact-secret-is-generated-when-a-public-key-and-a-different-private-key-which-could-be-anything-goes-into-a-mathematical-function">Q: How is it possible that same exact secret is generated when a public key and a different private key (which could be anything) goes into a mathematical function?</h3>

<p>That’s an excellent and deep question — and it’s the heart of  Elliptic Curve Diffie-Hellman (ECDH)  key exchange.</p>

<p>It  does  seem strange at first:</p>

<blockquote>
  <p>How can  two parties, each using a  different private key, and the  other’s public key, end up computing  the exact same secret?</p>
</blockquote>

<p>Let’s walk through it  mathematically but intuitively.</p>

<hr />

<h3 id="-tldr">🔐 TL;DR:</h3>

<ul>
  <li>The  shared secret  is the result of a  scalar multiplication  of elliptic curve points.</li>
  <li>Due to the  commutative  nature of the math (like a × b = b × a), both parties arrive at the  same result  independently.</li>
  <li>The result is the same because of how  elliptic curve point multiplication  works.</li>
</ul>
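<p>The commutativity argument is easiest to see in classic finite-field Diffie-Hellman, which has the same shape as ECDH (scalar multiplication of curve points replaces modular exponentiation). A toy sketch with a demonstration-sized prime (real deployments use 2048-bit or larger groups, or elliptic curves):</p>

```python
# Toy finite-field Diffie-Hellman: (g^a)^b == (g^b)^a (mod p).
p = 0xFFFFFFFB  # largest prime below 2**32; far too small for real use
g = 5

alice_private = 123456789
bob_private = 987654321

# Only the public values cross the wire.
alice_public = pow(g, alice_private, p)
bob_public = pow(g, bob_private, p)

# Each side combines its own private key with the other's public value.
alice_secret = pow(bob_public, alice_private, p)
bob_secret = pow(alice_public, bob_private, p)
assert alice_secret == bob_secret  # identical shared secret, never transmitted
```

<p>An eavesdropper sees <code>g</code>, <code>p</code>, and both public values, but recovering either private exponent from them is the discrete logarithm problem, which is computationally infeasible at real-world sizes.</p>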

<h3 id="-final-insight">🔐 Final Insight</h3>

<blockquote>
  <p>ECDH works because both parties use their own private key + the other’s public key to derive the same point — and the math ensures that the result is identical.</p>
</blockquote>

<p>They don’t need to know each other’s private keys — they just rely on  shared public information  and the hard math problem underneath.</p>

<p><em>— Conversation Over —</em></p>

<p>That last bit about the ECDH algorithm was fascinating! I hope you got to learn something from this as well.</p>

<p>Recently, one of our clients faced an issue where their WAF was incorrectly flagging legitimate users as malicious, denying them access. This disrupted operations and highlighted the need for fine-tuning WAF settings.</p>

<h2 id="why-false-positives-happen">Why False Positives Happen</h2>

<p>False positives occur when WAF misidentifies a legitimate request as an attack. Common reasons include:</p>

<ol>
  <li>Strict OWASP Rules  – OWASP CRS is aggressive by default, sometimes blocking normal traffic that resembles attack patterns.</li>
  <li>Special Characters in Input  – If a request contains characters like &lt; script &gt; or SQL-like keywords, it may trigger SQL Injection or XSS rules.</li>
  <li>APIs and JSON Payloads  – API requests with structured data (JSON/XML) can be misclassified as malicious.</li>
  <li>Sensitive Query Parameters  – URLs with words resembling attack payloads might get blocked.</li>
  <li>Unique Application Behavior  – Custom workflows may not align with predefined WAF rules.</li>
</ol>

<h2 id="how-anomaly-scoring-works">How Anomaly Scoring Works</h2>

<p>WAF assigns a score to each request based on detected rule violations. If the cumulative anomaly score exceeds a threshold (default: 5), the request is blocked. Lowering the score threshold increases sensitivity, while raising it reduces false positives.</p>
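<p>The scoring logic can be sketched in a few lines. The severity weights below follow the CRS convention (critical = 5, error = 4, warning = 3, notice = 2); the function itself is an illustration, not Azure’s implementation:</p>

```python
# Each triggered rule contributes its severity score; the request is
# blocked once the cumulative total reaches the anomaly threshold.
SEVERITY_SCORES = {"critical": 5, "error": 4, "warning": 3, "notice": 2}

def is_blocked(triggered_severities: list[str], threshold: int = 5) -> bool:
    total = sum(SEVERITY_SCORES[s] for s in triggered_severities)
    return total >= threshold
```

<p>So at the default threshold a single critical rule match is enough to block a request, while two notices on their own are not.</p>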

<h2 id="identifying-false-positives">Identifying False Positives</h2>

<ol>
  <li>Check WAF Logs  – Logs in Azure Monitor show details of blocked requests and rule IDs.</li>
  <li>Use Detection Mode  – Running WAF in “Detection” mode helps identify issues without blocking traffic.</li>
  <li>Analyze Blocked Requests  – Reviewing request payloads helps pinpoint the cause of false positives.</li>
</ol>

<h2 id="fixing-false-positives">Fixing False Positives</h2>

<h3 id="1-adjust-owasp-rule-set-sensitivity">1. Adjust OWASP Rule Set Sensitivity</h3>

<p>Azure WAF allows you to choose different OWASP CRS versions (e.g., 3.1, 3.2), each with varying levels of strictness. Some newer versions reduce false positives by improving attack pattern detection. If legitimate requests are frequently being blocked, consider testing a different OWASP rule set version to see if it results in fewer disruptions.</p>

<p>Additionally, adjusting the  anomaly scoring threshold  can help. If the WAF is blocking too many valid requests, increasing the anomaly score threshold can allow more traffic through while still preventing actual threats. Conversely, lowering the threshold can make WAF more aggressive in blocking suspicious activity.</p>

<h3 id="2-create-custom-rules">2. Create Custom Rules</h3>

<p>In some cases, predefined WAF rules may not align with specific application behavior. Creating  custom rules  allows for more flexible traffic control. Here’s how:</p>

<ul>
  <li>Allow trusted IP ranges for internal applications or API requests.</li>
  <li>Bypass specific headers, query parameters, or request bodies that are known to trigger false positives.</li>
  <li>Define rules to exclude requests from evaluation based on user roles or authentication status.</li>
</ul>

<p>Custom rules should be tested in  Detection mode  before enforcing them to ensure they don’t introduce vulnerabilities.</p>

<h3 id="3-tune-request-body-inspection">3. Tune Request Body Inspection</h3>

<p>By default, WAF inspects request bodies to detect malicious payloads, but this can sometimes lead to false positives when processing large JSON or XML payloads. To fine-tune this:</p>

<ul>
  <li>Increase the  request body size limit  if valid requests are getting blocked due to payload size.</li>
  <li>Configure  content type exclusions  to prevent unnecessary inspections for non-critical endpoints.</li>
  <li>Modify  request body inspection thresholds  to ensure only high-risk content gets flagged.</li>
</ul>

<h3 id="4-test-with-detection-mode">4. Test with Detection Mode</h3>

<p>Run WAF in Detection mode before enforcing strict blocking to observe behavior. Detection mode helps analyze the impact of WAF rules without actually blocking traffic.</p>

<h2 id="conclusion">Conclusion</h2>

<p>A key takeaway for me from the recent events at one of our clients: there cannot be a ‘checkbox’ approach to security, where we enable OWASP rules, forget about them, and assume everything is now secure.</p>

<p>WAF rule sets are a tool that must be fine-tuned to strike the right balance between protection and usability for your application’s unique needs.</p>

<p>Privacy and decentralization often struggle to coexist in the same system. Sure, a public key can’t easily be pinpointed to an individual. But in scenarios where you have to ensure that one individual gets to perform an action only once, things get tricky.</p>

<p>If democracy were to be really decentralized, we would trust no authority to count votes for us. In an ideal system, citizens would interact with a smart contract to vote and the count of votes would be agreed upon by a consensus algorithm.</p>

<p>Electoral voting is a scenario where both privacy and decentralization are paramount requirements: who <em>can</em> vote should not be determined by a central authority (a voter list), AND a voter’s vote should be private.</p>

<p>At the same time, the system should be self-tallying, so that no single authority is responsible for counting votes.</p>

<h3 id="zk-snarks">zk-SNARKs</h3>

<p><img src="/assets/images/zcash.PNG" alt="Transaction types in Zcash" /></p>

<p><a href="https://eprint.iacr.org/2017/585.pdf">Zero-knowledge proofs</a> are of immense help in a scenario where transactions (votes) need to be private but verifiable. The linked white paper details how Internet voting can be done using <a href="https://z.cash/">Zcash</a>, a popular zk-proof-based payment system, including anonymous voter registration.</p>
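<p>A much weaker but illustrative sketch of “private but verifiable” is a plain commit–reveal scheme: a vote is hidden at cast time yet checkable at tally time. This is not a zk-SNARK (a reveal eventually exposes the vote, unlike a zero-knowledge proof); all names below are illustrative.</p>

```python
# Toy commit-reveal voting sketch: a salted SHA-256 commitment hides the
# vote when cast, and the (salt, vote) pair opens it verifiably at tally
# time. Far weaker than the zk-SNARK construction described above.
import hashlib
import secrets

def commit(vote: str) -> tuple[str, bytes]:
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + vote.encode()).hexdigest()
    return digest, salt          # publish digest now, keep salt secret

def verify(digest: str, salt: bytes, vote: str) -> bool:
    return hashlib.sha256(salt + vote.encode()).hexdigest() == digest

d, s = commit("candidate-A")
assert verify(d, s, "candidate-A")       # opens correctly at tally time
assert not verify(d, s, "candidate-B")   # cannot claim a different vote
```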

<p>It is important to note that Zcash relies on zk-snark public parameters to construct and verify the zero-knowledge proofs, these parameters were created via something called as <a href="https://www.youtube.com/watch?v=D6dY-3x3teM">Zcash ceremony</a>. It is up to you to trust these 6 individuals who birthed Zcash. This ceremony was akin to creation of a public-private key pair, where private key was destroyed in the end.</p>

<p>If a voting system is truly anonymous, the voter registration system should also be anonymous, i.e. no personally identifiable information is stored in a ‘whitelist’ of eligible voters. Nor should any authority be able to discriminate as to who makes it onto the voter list.
But how do we stop an individual from voting multiple times using multiple public keys? This is the dilemma of decentralized voting.
Even the spec for <a href="https://w3c-ccg.github.io/did-spec">Decentralized Identifiers (DID)</a> allows an individual to possess more than one identifier, unless a federated/centralized identity manager is involved.</p>
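<p>The dilemma can be made concrete with a tiny sketch: key pairs cost nothing to generate, so a naive “one public key, one vote” rule cannot stop one person from voting many times. (Toy random keys below, not a real DID method.)</p>

```python
# One person mints five distinct "voting identities" for free; the naive
# uniqueness check (every vote from a distinct key) passes anyway.
import secrets

def new_identity() -> str:
    return secrets.token_hex(32)       # stand-in for a fresh public key

one_person_keys = [new_identity() for _ in range(5)]
votes = {pk: "candidate-A" for pk in one_person_keys}

# Five distinct keys, five accepted votes -- all from the same individual.
assert len(votes) == 5
```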

<p>Here we have to compromise a little: having a list of eligible voter addresses at least ensures there are no bogus votes. Such a system, using zk-SNARKs, ensures complete privacy of votes and is self-tallying. However, the entity that decides/oversees which voters make it onto the voter list can still be biased.</p>

<p><a href="https://github.com/stonecoldpat/anonymousvoting">This</a> project demonstrates a self-tallying voting system on Ethereum where votes are private. It does, however, involve an election admin preparing the list of eligible voters.</p>

<h3 id="ring-signatures">Ring Signatures</h3>

<p><img src="/assets/images/Ring-signature.svg" alt="Ring signatures" /></p>

<p>Ring signatures are great for decentralized voting with voter anonymity. They allow a signature to be endorsed by a group of keys without revealing which particular key signed it. The popular cryptocurrency <a href="https://www.getmonero.org/">Monero</a> uses ring signatures.
Two useful variants of ring signatures are linkable and traceable ring signatures.</p>

<h4 id="linkable-ring-signatures">Linkable ring signatures</h4>

<p>The property of linkability allows one to determine whether any two signatures were produced by the same member (i.e. under the same private key), while the identity of the signer nevertheless remains hidden.</p>
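<p>The core mechanism behind linkability is a deterministic “key image” derived from the private key: two signatures by the same key share an image, yet the image does not reveal which ring member signed. The sketch below shows only that key-image idea over a toy group, not a real ring signature scheme; the parameters and names are illustrative.</p>

```python
# Toy "key image" demo: image = H(election)^sk mod P. Same private key
# always yields the same image (double votes are linked); different keys
# yield different images; the image alone does not expose the public key.
import hashlib

P = 2**255 - 19   # Curve25519 field prime, used here as a toy modulus
G = 5             # toy generator

def keygen(seed: bytes):
    sk = int.from_bytes(hashlib.sha256(seed).digest(), "big") % P
    pk = pow(G, sk, P)
    return sk, pk

def key_image(sk: int) -> int:
    # Hash a per-election domain tag into the group, then raise to sk
    h = int.from_bytes(hashlib.sha256(b"election-2018").digest(), "big") % P
    return pow(h, sk, P)

sk_alice, pk_alice = keygen(b"alice")
first_vote_tag = key_image(sk_alice)
second_vote_tag = key_image(sk_alice)      # same voter tries again
sk_bob, _ = keygen(b"bob")

assert first_vote_tag == second_vote_tag   # double vote is linked
assert first_vote_tag != key_image(sk_bob) # different voters stay apart
```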

<h4 id="traceable-ring-signature">Traceable ring signatures</h4>

<p>In traceable ring signatures, the public key of the signer is revealed if they issue more than one signature under the same private key. This controls bogus voting nicely, making traceable ring signatures a highly viable method for creating a decentralized, anonymous e-voting system.
<a href="https://arxiv.org/ftp/arxiv/papers/1804/1804.06674.pdf">Here’s</a> a good research paper on this.</p>]]></content><author><name>Aalok Singh</name></author><category term="decentralization" /><summary type="html"><![CDATA[I believe that any system that can be decentralized, should be decentralized. Popular proof-of-work DLTs, however, involve broadcasting all successful transactions to the network, so that all participating nodes have a copy of these transactions. Consensus is thus maintained, but privacy is compromised.]]></summary></entry></feed>