ALD Blog (https://roundtable.datascience.salon/): the latest data science, machine learning and AI news and advancements.

What Building AI Chatbots Taught Me About Simplicity
Tue, 10 Mar 2026

I spent months overengineering an AI chatbot. Then I threw most of it away and got better results in two weeks.

This is an account of what actually worked when I built production RAG systems at scale, and of why the lessons surprised me.

The Complexity Trap

I was tasked with building a chatbot grounded in a set of important team documents containing business domain knowledge. The goal was to retrieve relevant information from those documents for each customer or user query and return the most accurate response possible. When I first started building this conversational AI system, I did what any engineer would do: I read the papers, studied the frameworks, and built something impressive. Semantic chunking with overlap windows. Multi-vector retrieval with re-ranking. Hybrid search combined with dense vector embeddings. A beautiful, sophisticated architecture.

But it hallucinated constantly on our documents.

The problem wasn’t that the techniques were wrong. The problem was that I’d built a generic system for a specific problem. Our documents had structure: numbered sections, hierarchical headings, procedural steps. My fancy semantic chunking was actively destroying the very information users needed.

The fix was embarrassingly simple: hierarchical chunking that respected document structure. Instead of treating every document like a wall of text, I preserved the natural hierarchy. Headers stayed with their content. Procedures remained intact. Parent-child relationships between sections were maintained.
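A minimal sketch of the idea, assuming markdown-style headings (the author's actual document format and splitter aren't shown): each chunk keeps its heading attached to its content and records the full parent heading path.

```python
import re

def hierarchical_chunks(text):
    """Split a markdown-style document into chunks that keep each heading
    attached to its content and record the parent heading path."""
    chunks, path = [], []
    current = {"path": [], "body": []}

    def flush():
        # Emit the chunk in progress, if it has any content.
        if current["body"]:
            chunks.append({
                "heading_path": " > ".join(current["path"]),
                "text": "\n".join(current["body"]).strip(),
            })

    for line in text.splitlines():
        m = re.match(r"^(#+)\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # Trim the path back to the parent level, then append this heading.
            path = path[: level - 1] + [m.group(2)]
            current = {"path": list(path), "body": []}
        else:
            current["body"].append(line)
    flush()
    return chunks
```

A chunk for a subsection then carries its ancestry (e.g. "Setup > Steps"), so a retrieved procedure arrives with the context that tells the model which document and section it belongs to.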

Accuracy jumped. Hallucinations dropped. And I learned my first hard lesson: understand your data before you engineer around it.

The Prompt Hierarchy

Here’s something that took me too long to accept: your model is only as good as your prompts.

I had been treating prompts as an afterthought. A thin wrapper around the “real” work happening in retrieval and embedding. But when I started experimenting with few-shot examples and structured outputs, everything changed.

Few-shot prompts supply the LLM with example input-output pairs. Instead of hoping the model would figure out our format, I showed it exactly what I wanted. Three example pairs, and suddenly responses followed a predictable structure. Quality control became possible because outputs were predictable.

Structured outputs eliminated an entire category of bugs. JSON schemas meant no more parsing failures. No more responses that were technically correct but impossible to process downstream. The model understood not just what to say, but how to say it.
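A minimal sketch of both ideas together (the example questions, keys, and format below are invented for illustration; the article's actual prompts aren't shown): a few-shot prompt is assembled from input-output pairs, and replies are validated against a fixed JSON schema before anything downstream touches them.

```python
import json

# Hypothetical few-shot examples; the real format is domain-specific.
FEW_SHOT = [
    {"question": "How do I reset my password?",
     "answer": {"category": "procedure", "steps": ["Open Settings", "Click Reset"]}},
    {"question": "What is the refund window?",
     "answer": {"category": "fact", "steps": []}},
]
REQUIRED_KEYS = {"category", "steps"}

def build_prompt(user_question):
    """Assemble a few-shot prompt: example input/output pairs, then the new question."""
    parts = ["Answer as JSON with keys: category, steps.\n"]
    for ex in FEW_SHOT:
        parts.append(f"Q: {ex['question']}\nA: {json.dumps(ex['answer'])}\n")
    parts.append(f"Q: {user_question}\nA:")
    return "\n".join(parts)

def parse_response(raw):
    """Reject any reply that isn't JSON matching the schema, instead of
    letting malformed output flow downstream."""
    data = json.loads(raw)  # raises on non-JSON output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Everything downstream can now assume well-formed input, which is what eliminates the parsing-failure class of bugs.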

This sounds obvious written down. It wasn’t obvious when I was knee-deep in embedding optimization, convinced that retrieval quality was my bottleneck. Sometimes the leverage is in the last mile, not the foundation.

Intent Classification Changed Everything

The biggest architecture win came from a simple insight: classify intent first.

My early systems tried to handle everything in one flow. User message goes in, retrieval happens, response comes out. But users ask wildly different types of questions. Some want facts. Some want procedures. Some are complaining. Some are confused about their own question.

Treating them identically made no sense.

I rebuilt the system with an LLM-powered intent classifier at the front. Not keyword matching (that’s too brittle), but a lightweight LLM call with structured output that categorized the query and extracted key entities. The classifier told me what kind of response the user actually needed before I committed to a retrieval strategy.

The result was cleaner code, faster responses, and dramatically better user satisfaction. Each intent type got its own optimized flow. Procedural questions hit the hierarchical chunks. Factual queries used dense retrieval. Complaints got routed differently entirely.
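A minimal sketch of the routing pattern (the intent names and the stub classifier below are hypothetical stand-ins; in production the classification would be a real LLM call returning structured JSON):

```python
import json

def classify_intent(query, llm=None):
    """Front-door classifier. With a real `llm` callable this would be a
    lightweight structured-output call; the heuristic below is a stub
    that keeps the sketch runnable."""
    if llm is not None:
        return json.loads(llm(query))  # expects {"intent": ..., "entities": [...]}
    q = query.lower()
    if "how do i" in q or "steps" in q:
        return {"intent": "procedural", "entities": []}
    if "refund" in q or "complaint" in q:
        return {"intent": "complaint", "entities": []}
    return {"intent": "factual", "entities": []}

# Each intent type gets its own optimized flow.
ROUTES = {
    "procedural": lambda q: f"[hierarchical-chunk retrieval] {q}",
    "factual":    lambda q: f"[dense retrieval] {q}",
    "complaint":  lambda q: f"[escalation flow] {q}",
}

def handle(query):
    intent = classify_intent(query)["intent"]
    return ROUTES[intent](query)
```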

A small amount of intelligence at the routing layer saved enormous complexity downstream.

The Deterministic/Non-Deterministic Split

The most important architectural decision I made was drawing a clear line: deterministic actions get functions, non-deterministic decisions get the LLM.

What does this mean in practice? Database lookups, API calls, calculations, status checks: these are deterministic. The answer is knowable, consistent, and shouldn’t vary. I wrapped these in functions the LLM could call, but the LLM didn’t execute them. It decided when to call them and what arguments to pass. The actual execution was reliable code.

The LLM handled what LLMs are good at: understanding intent, generating natural language, synthesizing information, making judgment calls when data was ambiguous. I stopped trying to make it do math or remember precise numbers.
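The split can be sketched like this (tool names, sample data, and the stubbed decision step are illustrative assumptions, not the author's actual system): the `decide` step stands in for the LLM's non-deterministic judgment about which tool to call and with what arguments, while the functions in `TOOLS` are plain, reproducible code.

```python
# Deterministic actions live in plain functions: knowable, testable, reproducible.
def get_order_status(order_id: str) -> str:
    # Stand-in for a real database lookup.
    return {"A1": "shipped", "B2": "pending"}.get(order_id, "unknown")

def compute_total(prices: list) -> float:
    return round(sum(prices), 2)

TOOLS = {"get_order_status": get_order_status, "compute_total": compute_total}

def decide(query):
    """Stand-in for the LLM's decision: which tool to call, with what
    arguments. Only this step is probabilistic in the real system."""
    if "order" in query:
        return {"tool": "get_order_status", "args": {"order_id": "A1"}}
    return {"tool": "compute_total", "args": {"prices": [9.99, 5.0]}}

def run(query):
    choice = decide(query)       # wrong choice? -> model error, tune prompts
    fn = TOOLS[choice["tool"]]
    return fn(**choice["args"])  # wrong result? -> function error, reproducible bug
```

Because execution never happens inside the model, every failure lands on one side of the boundary or the other.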

This separation made the system debuggable. When something went wrong, I could immediately identify whether it was a function error (deterministic, reproducible) or a model error (needs prompt tuning). Before this split, errors disappeared into a fog of probabilistic behavior.

Clean boundaries between deterministic and non-deterministic components turned chaos into engineering.

Simple Beats Clever

Looking back, every major improvement came from simplification, not sophistication. Respecting document structure instead of fighting it. Using the model’s strengths instead of compensating for weaknesses. Drawing clear boundaries instead of building monolithic flows.

The frameworks and papers have their place. But they’re solutions to general problems. Your problem is specific. The best architecture is the one that fits your data, your users, and your constraints, not the one that impresses other engineers.

I still read the papers. I still experiment with new techniques. But now I start simple and add complexity only when I’ve proven it’s necessary. The code I’m proudest of isn’t the most sophisticated. It’s the code that works reliably, fails predictably, and can be understood by the next engineer who inherits it.

That’s what building AI chatbots taught me. Not how to be clever with vectors and embeddings, but how to be disciplined about simplicity.

The best RAG system isn’t the most advanced one. It’s the one that actually helps your users.

Author: Utkarsh Bajaj

The post What Building AI Chatbots Taught Me About Simplicity appeared first on ALD Blog.

Quantifying Data Drift in Categorical Variables using Characteristic Stability Index (CSI)
Wed, 04 Mar 2026

Due to recent advancements in artificial intelligence (AI) and machine learning (ML), organizations are increasing investments in information technology (IT) infrastructure that supports AI- and ML-enabled automation. While these automated systems can guide an organization's strategic initiatives, the performance and reliability of their predictions are dictated by the quality of the data fed into them. Any change in data characteristics between two separate populations is called data drift.

Drift between the current population the automated system is predicting on and the reference population it was trained on can significantly degrade the prediction accuracy of decisioning systems. Given these risks, monitoring key business variables in production is critical to ensure stable inputs to the models these systems rely on. Tracking data drift enables organizations to tailor business strategies, monitor production changes, and respond to defects in real time. As an example, consider a model that uses the originating channel of an incoming consumer to assess risk. A sudden drift toward applications from higher-risk channels can degrade overall model performance; to counter this, a business may introduce added strategy rules to limit overall risk or exposure. This article introduces an adaptive, scalable approach to mitigating the risks of data drift and automating drift detection that can be extended to different organizations and domains.

Introducing CSI to Detect Data Drift

In simple terms, Characteristic Stability Index (CSI) helps quantify the degree of the data drift of categorical variables across any two given populations. The formula to calculate CSI is as follows:
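The formula itself appears to have been an image that did not survive extraction. CSI has the same standard form as the Population Stability Index (PSI), reconstructed here for reference:

```latex
\mathrm{CSI} = \sum_{i=1}^{n} \left( \%\mathrm{Current}_i - \%\mathrm{Reference}_i \right) \times \ln\left( \frac{\%\mathrm{Current}_i}{\%\mathrm{Reference}_i} \right)
```

where %Reference_i and %Current_i are the proportions of category i in the reference and current populations, and n is the number of categories (bins).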

In general, based on the calculated value of CSI, one can evaluate the degree of data drift.

  1. If the CSI value is less than 0.1, this signals extremely low drift or no change.
  2. If the CSI value is between 0.1 and 0.2, this signals a minor drift.
  3. If the CSI value is greater than 0.2, this signals a significant drift.

While these are generalized standard thresholds for measuring data drift, one can adjust them to business needs. As an example, if an attribute has exceedingly high importance in the model's predictions, the threshold can be lowered to increase the sensitivity of drift detection. The power of CSI for automating data drift detection in production pipelines lies in its simplicity and its standard interpretation, which can be easily explained to business users.

Designing Automated Data Drift Frameworks

The following section documents the steps involved in developing automated data drift monitoring in production environments. To understand the CSI calculation, we assume a sample attribute called “traffic_type” which tracks the distribution of traffic categories. We capture the data for the traffic_type attribute for the months of January and February. For this analysis, we consider January as the baseline or reference month, while February is considered as the current month. The aggregate volumes for the “traffic_type” for the months of January and February are 1000 rows each. The data distribution of the “traffic_type” for the months of January and February is as follows:

The steps involved in designing automated data drift frameworks leveraging the CSI metric are as follows:

  1. Connect to the database hosting the data.
  2. Pull in the reference and current populations to be tracked.
  3. Identify the specific categorical variables to be tracked.
  4. Employ the CSI calculation:
    1. Identify the individual categories as bins on the reference population. For the January data of the attribute “traffic_type”, these categories are File_Transfer, Gaming, IoT, Other, Video_Streaming, VoIP and Web_Browsing.
    2. Once the bins are identified on the reference population, enforce the same bins on the current population to measure the distribution of data points.
    3. Calculate the reference and current counts for each of the individual categories identified in the previous step.
    4. Calculate the proportion of each individual category for both the reference and current populations. Since the total aggregate volume for each of January and February is 1000, divide each category count by 1000 to compute its relative proportion.
    5. Using the formula for CSI, compute the CSI contribution value for each individual category.
    6. Aggregate all the CSI contribution values to compute the final aggregate CSI value for the attribute “traffic_type” between January and February.
  5. Compare the calculated aggregate CSI value against the threshold values for drift sensitivity.
  6. Determine whether an alert should be triggered:
    1. If the calculated aggregate CSI value for any of the attributes is greater than the threshold value, trigger an alert.
    2. If the calculated aggregate CSI value is less than the threshold value, do not trigger an alert.
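The calculation steps above can be condensed into a short implementation. The traffic_type counts in the test are hypothetical placeholders, since the article's distribution table did not survive extraction; a small `eps` guards categories that are empty in one population.

```python
import math

def csi(reference_counts, current_counts, eps=1e-6):
    """Characteristic Stability Index over categorical count dictionaries.
    Bins are defined by the reference population."""
    bins = list(reference_counts)                    # reference defines the bins
    ref_total = sum(reference_counts.values())
    cur_total = sum(current_counts.values())
    total = 0.0
    for b in bins:
        r = max(reference_counts[b] / ref_total, eps)
        c = max(current_counts.get(b, 0) / cur_total, eps)
        total += (c - r) * math.log(c / r)           # per-category CSI contribution
    return total

def drift_level(value):
    """Map an aggregate CSI value onto the standard thresholds."""
    if value < 0.1:
        return "no/low drift"
    if value <= 0.2:
        return "minor drift"
    return "significant drift"
```

An alerting job would call `csi` for each tracked attribute on its cadence and page the team whenever `drift_level` crosses the configured threshold.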

These steps can be executed on a predefined cadence to ensure the automated monitoring runs regularly and teams are made aware of any drift that emerges in key business variables.

Conclusion

In production systems, data continues to evolve, and it is critical for organizations to deploy automated data drift detection frameworks so that undetected drift does not impact downstream analytics or processes. As this walkthrough shows, CSI offers a practical way to detect distributional changes in categorical attributes and mitigate the risks of data drift.

Author: Anil Cavale

The post Quantifying Data Drift in Categorical Variables using Characteristic Stability Index (CSI) appeared first on ALD Blog.

Strategies to help your data & AI project avoid failure
Wed, 25 Feb 2026

Most data and AI projects don’t fail because teams chose the wrong infrastructure or modeling approach. They fail because the project breaks down somewhere across the lifecycle, long before folks realize. By the time something reaches production (if it does), teams are frequently over budget, misaligned, or under-delivering for reasons that were baked in months earlier.

Reducing that risk requires more than better execution in one area. You need to understand the full lifecycle of a data or AI project, where failure modes tend to emerge, and which issues are cheapest—and most critical—to address early.

In practice, most data and AI initiatives move through three phases: Planning, Building, and Shipping. Each phase has distinct goals and risks. The following is a practical walkthrough of those phases, the traps that appear most often, and tactics you can use to manage risk and increase the chances of delivering a successful data project.

Planning: where probability of failure is highest

Planning sets the trajectory. It determines what the team is optimizing for, who owns outcomes and decisions, and how success will be measured. When planning is rushed or underdeveloped, the odds of failure increase dramatically.

This stage typically includes defining purpose, clarifying roles, setting budgets and timelines, and designing the solution. Each sounds obvious. Each is also a common failure point.

Stakeholder misalignment

A frequent failure mode is stakeholder misalignment disguised as agreement. Everyone nods along in kickoff meetings, but they’re optimizing for different outcomes (and often multi-tasking!). Leadership wants quick wins. Product wants shipped features. Data teams want to avoid tech debt. When tradeoffs surface later, these hidden incentives collide.

One of the simplest and most effective ways to get ahead of this is to surface hidden assumptions. People generally don’t know what goes into the work done by other teams, so make that visible during alignment. Asking each person “What’s your biggest concern?” and unpacking the answers together flushes out hidden assumptions and gets everyone closer to alignment. People have different goals and incentives – surface them before starting the project.

Roles and responsibilities

Another planning failure is unclear accountability, especially when multiple teams are involved. When everyone is responsible, no one is. This shows up later, when momentum stalls and folks are left scratching their heads about who approves what, or where input is needed and from whom.

The fix is explicit ownership: executive sponsorship from the top, specific ownership for key parts. But the key insight is that it needs to be one person, not a committee. And there’s a distinction between accountability (which is ownership) and responsibility (which is execution). These can be, and often are, decoupled, but they need to be explicit.

Budgets and timelines

Budgeting is another blind spot. Projects are often approved with optimistic assumptions about data readiness, tooling, or expert availability. When reality intrudes (unexpected sick leave, urgent project out of nowhere, etc.), timelines slip and confidence erodes.

Having a plan is great, but understanding where the plan is flexible, and what would have to be true for you to flex it, is better. Build in contingencies, iteration, extra time, and other planning best practices. Break tasks into high- and low-confidence estimates and build the plan up from there. This will inflate your budget and timeline, but that is the cost of optionality (and ideally, some of that added margin won’t be needed).

Solution design

Planning also fails when solutions are over-designed too early. Teams lock in architectures and metrics before seeing the data or testing workflows. A common failure is promising a predictive model before proving the data supports it. 

A more resilient approach is to treat planning as hypothesis-setting. Define what you believe to be true, what would prove you wrong, and what you’ll do if that happens. Be honest with what you know and can stand behind, and what needs to be iterated on. Planning isn’t about certainty; it’s about optionality and risk management.

If planning defines the project’s risk profile, building is where those risks are exposed.


Building: where the plan meets reality

Building is where ideas become artifacts. Wireframes turn into codebases. Gantt charts get revised daily. This is also where many well-intentioned plans unravel.

This phase includes prototyping, coding, testing, and validation. It’s inherently messy and iterative – as you move forward, you often uncover reasons to step back.

Prototyping

A major failure mode is building in isolation from users and decision-makers. Teams optimize for technical correctness without validating usefulness. By the time feedback arrives, the system is too far along to change without pain.

The solution is early and continuous exposure. Share rough MVPs that work end to end. The faster you get something in front of a stakeholder, the faster you’ll learn that what they need wasn’t what they said in the plan. They also need to see and feel the output in order to be better partners for you. 

Coding

Writing the actual code is a long and arduous process (though one being upended by AI assistants). That said, this part often fails due to “resume-driven development”, where teams choose complex, trendy frameworks (like heavy orchestration tools or agentic chains) when a simple script would suffice. This over-engineering creates a maintenance burden that outweighs the value of the solution.

The fix is ruthless simplicity. Start with the boring solution. Use a simple cron job before a complex orchestrator; use a direct API call before a complex agent framework. Add complexity only when the specific use case demands it, not because the technology is interesting. The goal of coding is to solve the problem with the minimum amount of code necessary. Bonus points – the boring solution is the simpler one, and the simpler one gets you feedback faster.

Testing

Many teams reach 100% test coverage on their logic but fail on integration. They test the model with a clean CSV on a laptop, ignoring the chaos of the production environment. They don’t test what happens when the API times out, when the model context window overflows, or when concurrent user requests spike latency. All stuff that happens in reality (and more!).

Teams need to test the “seams” of the system, not just the units. Run end-to-end integration tests that mock the production environment, including latency, rate limits, and imperfect data inputs. Intentionally inject chaos – disconnect the database, spike the traffic, or send malformed JSON – to verify that the system handles errors in the way you expect (whether gracefully or a hard fail).
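A minimal sketch of testing the seams with injected chaos (the API wrapper and client below are hypothetical; `unittest.mock` stands in for the real dependency so a timeout or malformed reply can be forced on demand):

```python
import json
from unittest import mock

def call_model_api(client, payload):
    """Thin wrapper around a (hypothetical) model API client, with the
    error handling the seams demand: timeouts and malformed replies."""
    try:
        raw = client.post(payload, timeout=5)
        return json.loads(raw)
    except TimeoutError:
        return {"error": "timeout", "retryable": True}
    except json.JSONDecodeError:
        return {"error": "malformed_response", "retryable": False}

# Inject chaos at the seam: the client is mocked to misbehave.
def test_timeout_is_handled():
    client = mock.Mock()
    client.post.side_effect = TimeoutError()
    assert call_model_api(client, {})["error"] == "timeout"

def test_malformed_json_is_handled():
    client = mock.Mock()
    client.post.return_value = "not json {"
    assert call_model_api(client, {})["error"] == "malformed_response"
```

The same technique extends to dropped database connections and rate-limit responses: each failure you can force in a test is one less surprise in production.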

Validation

Validation often fails when teams mistake technical correctness for readiness. Models pass offline metrics, pipelines run cleanly, spreadsheets match before and after – but stakeholders still need more. When validation happens late, feedback turns into scope creep instead of learning.

The fix is continuous, decision-centric validation. Use artifacts like decision walkthroughs, shadow runs, or scenario reviews to test outputs throughout the process. Walk through how results would have changed past actions, where judgment remains necessary, and where the system is likely to be wrong. The goal is shared understanding before shipping forces the issue. Stakeholders need to trust not simply that the model matches before and after, but that it’ll continue to be trustworthy going forward. Building trust takes time and partnership.

If building proves something works, shipping proves it works under real constraints.

Shipping: where many projects stumble hardest

Shipping is where many data and AI projects often fail outright. The system works locally but breaks downstream or proves brittle to inevitable production changes.

This phase includes UAT, deployment, change management, and ongoing monitoring. Treating shipping as a one-time event instead of a state transition is a costly mistake.

User acceptance testing

A common failure is rushed or performative UAT. Users are asked to “sign off” without time or incentive to engage. Problems surface later, when fixes are harder and trust erodes.

Make UAT real. Give users space to test within actual workflows. Encourage critical feedback. Treat signoff as confirmation of readiness, not a box to check.

Deployment

Deployment often fails when treated as purely technical. Code passes CI/CD, but security reviews, permissions, data access, or upstream dependencies stall release. Even successful deployments can be brittle if assumptions don’t hold in production.

Design for deployment early (this is mentioned above as well). Establish environments and access patterns during building, not at the end. Use parallel runs to surface discrepancies and build confidence. Define operational ownership at launch: who monitors, who responds, and who can pause or roll back the system (which should have already been defined during Planning).

Change management

Another failure is neglecting change management. Even strong systems fail if users don’t understand or trust them. Documentation alone isn’t sufficient. Training is helpful, but limited. Communication needs to be constant and repeated. You need to both communicate the what and the why, and most importantly, why this matters to the stakeholder.

Monitoring and maintenance

Monitoring is often an afterthought, but for predictive models (including LLMs) it is critical. Without a clear monitoring plan, systems quietly decay. Drift, anomalies, and subtle failures go unnoticed – especially when outputs look plausible but are wrong.

The fix is operational clarity. Who monitors outcomes? What triggers investigation? How are fixes prioritized and deployed? Shipping isn’t the end of the lifecycle; it’s the start of operations.

Pulling it together

Across planning, building, and shipping, the pattern is consistent. Data and AI projects rarely fail because of technology decisions. They are more likely to fail due to operational and process issues that weren’t thought through ahead of time, and the teams need to play catch up.

Successful teams treat the lifecycle as a system, not a checklist. They invest early in planning, expect plans to change during building, and treat shipping as a transition into operations.

Code, models, and infrastructure matter – but they’re enablers, not foundations. They amplify whatever structure already exists – for better or worse.

The post Strategies to help your data & AI project avoid failure appeared first on ALD Blog.

From Reactive to Autonomous: Architecting LLM-Driven Workflows for IT Incident Response
Tue, 03 Feb 2026

In the high-stakes world of IT infrastructure, the difference between a minor glitch and a major outage is often measured in minutes. Yet, for most organizations, the incident response lifecycle remains stuck in a manual era. As cloud environments grow more complex and the global shortage of skilled IT professionals intensifies, the traditional approach to operations is reaching a breaking point.

To understand how Large Language Models (LLMs) are revolutionizing AIOps, we must first dissect the anatomy of a failure as it exists today and map out the architecture of the automated future.

This article analyzes the structural shift in IT operations by examining four critical stages of the incident lifecycle, visualizing how we move from human bottlenecks to AI-driven orchestration, and provides a technical blueprint for building these agents safely.

Here’s an outline of the article: 

  • The Current State 
  • The Structural Replacement (The Automation Concept) 
  • The Operational Architecture and Workflow 
  • Code Implementation: Building the Agent 
  • Navigating Challenges: Security and Hallucinations 
  • Future Directions 
  • Key Takeaways 

The Current State 

To solve the problem, we must first visualize the bottleneck. In traditional IT operations, there is a distinct gap between “System Detection” and “Human Action.”

The Incident Timeline 

When a failure occurs, the monitoring system detects it almost instantly. However, the process immediately stalls. The alert sits in a queue until a human operator notices it, confirms it is not a false positive, and begins the triage process. This gap between the machine detecting the issue and the human understanding it is where Service Level Agreements (SLAs) are breached.

The Scope of Human Toil

Once the operator engages, they are burdened with a complex web of manual tasks. As illustrated in the paper’s analysis, the “Human Response Scope” involves switching context between multiple tools:

  • Verification: Confirming the alert content.
  • Log Retrieval: Manually SSH-ing into servers to pull error logs.
  • Knowledge Retrieval: Searching wikis or calling support desks to see if this issue has happened before.
  • Communication: Drafting emails to stakeholders to report the incident.

This manual workflow is error-prone and slow. The sequence diagram below visualizes this legacy process, highlighting the dependency on human bandwidth.

Figure 1: Current State Incident Response Sequence Diagram 

The Structural Replacement (The Automation Concept) 

The core proposition of modern AIOps is to replace the Human Operator with an Intelligent Agent.

Replacing the Operator  

The objective is to remove the human from the initial response loop. Instead of an alert triggering a pager, it triggers an AI-Driven Automation Tool. This tool acts as the new operator. It doesn’t just forward the alert; it consumes the alert, gathers the necessary context, and passes it to an AI Service (LLM).

In this new paradigm, the human moves from being the doer (fetching logs, typing emails) to being the reviewer (approving the fix). This shift effectively collapses the time delay shown in the previous phase.

Figure 2: Before-versus-After Structure Proposition

The Operational Architecture and Workflow 

How does this work in production? The final architecture reveals a complex interplay between the LLM, the operational tools, and historical data.

The Anticipated Workflow  

This architecture relies on two distinct phases: Training (Preparation) and Inference  (Response). 

1. The Knowledge Ingestion (Training): Before the system goes live, the LLM is fine tuned or provided with a RAG (Retrieval Augmented Generation) database  containing: 

  • Historical incident tickets. 
  • System logs from previous failures. 
  • Runbooks and tribal knowledge from the support desk. 

Why this matters: This ensures the AI isn’t guessing; it is applying institutional  memory to the current problem. 

2. The Autonomous Loop (Inference): 

  • Trigger: The Monitoring Service detects a failure.
  • Orchestration: The Automated Query Tool receives the alert.
  • Log Ingestion: It uses an API to pull real-time logs from the affected machine (Infrastructure).
  • Reasoning: It sends the Error Message + Real-time Logs to the LLM. The LLM correlates this with the Knowledge Base to identify the Root Cause.
  • Action: The LLM directs the Email Tool to draft a notification and the Operation Tool to execute remediation commands for reporting purposes.

The diagram below details this end-to-end workflow.

Figure 3: End-to-end Operational Flow 

Code Implementation: Building the Agent 

To visualize how this works technically, consider a simplified Python example using a framework like LangChain.

The agent utilizes tools to interact with the infrastructure, mirroring the API connections.

Algorithm: Python code using LangChain for building the agent 

from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from infrastructure_tools import fetch_server_logs, check_cpu_usage, restart_service

# Define the tools the LLM can use (the "hands" of the system)
tools = [
    Tool(
        name="Fetch Logs",
        func=fetch_server_logs,
        description="Useful for retrieving raw error logs from a specific server ID.",
    ),
    Tool(
        name="Check CPU",
        func=check_cpu_usage,
        description="Checks current CPU load.",
    ),
    Tool(
        name="Restart Service",
        func=restart_service,
        description="Restarts a system service. Use with caution.",
    ),
]

# Initialize the LLM (the "brain"); low temperature for deterministic outputs
llm = OpenAI(temperature=0)

# Initialize the agent
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Simulate an incoming alert from the monitoring system
incoming_alert = {
    "server_id": "srv-prod-04",
    "error_msg": "502 Bad Gateway - Connection Refused",
    "timestamp": "2025-12-27T03:14:00Z",
}

# The agent executes the reasoning loop
prompt = f"System alert received: {incoming_alert}. Investigate the logs and suggest a fix."
response = agent.run(prompt)
print(response)

# Output might look like:
# "I have fetched the logs for srv-prod-04. The logs indicate the Nginx service has
# crashed due to memory overflow. I recommend restarting the Nginx service."

Navigating Challenges: Security and Hallucinations 

While the potential is immense, deploying Generative AI in production infrastructure  requires strict guardrails. 

  1. The Hallucination Risk: LLMs can confidently sound wrong. In an IT context, a  “hallucinated” command could delete a database. To mitigate this, the system  should operate with a Human-in-the-Loop (HITL) for critical actions. The AI  performs the investigation and proposes the fix, but a human engineer approves the  execution of write-commands until the system proves its reliability. 
  2. Data Privacy and Security: Infrastructure logs often contain sensitive IP addresses,  internal hostnames, or even leaked PII. Before any log data is sent to an LLM  (especially if using a public API model), it must pass through a Sanitization Layer.  This layer uses regex or Named Entity Recognition (NER) to mask sensitive data  (e.g., replacing an IP with [IP_ADDRESS_1]). 
  3. Transparency: The Black Box problem is real. Operators need to know why the AI  suggests a restart. The system must cite its sources: “I recommend this fix because  it successfully resolved a similar incident (Ticket #4092) on March 12th.” 
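The sanitization layer described above can be sketched in a few lines of Python. This is a minimal, regex-only illustration (a production system would add NER-based masking for hostnames and other PII); the function name and placeholder format are our own.

```python
import re

def sanitize_log(text: str) -> str:
    """Mask IPv4 addresses and email addresses before log data reaches an LLM."""
    masked = text
    # Replace each distinct IP with a stable placeholder: [IP_ADDRESS_1], [IP_ADDRESS_2], ...
    # (Simplified: assumes no IP is a substring of another in the same log.)
    seen_ips = []
    for ip in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", masked):
        if ip not in seen_ips:
            seen_ips.append(ip)
    for i, ip in enumerate(seen_ips, start=1):
        masked = masked.replace(ip, f"[IP_ADDRESS_{i}]")
    # Mask anything that looks like an email address
    masked = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", masked)
    return masked

print(sanitize_log("conn from 10.0.0.1 refused; notify ops@example.com"))
```

Using stable placeholders rather than one generic mask preserves the log's structure, so the LLM can still reason about which host did what.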

Future Directions 

The evolution of this technology points toward Proactive Self-Healing. Instead of waiting  for a failure, future iterations will analyze trend data to predict outages before they occur.  By identifying “pre-incident” log patterns, the Agent could scale up resources or rotate  credentials proactively, preventing the downtime entirely. 

Furthermore, we will see a move toward “Small Language Models” (SLMs) fine-tuned  specifically for DevOps tasks. These models will be smaller, faster, cheaper to run, and  capable of running on-premises to alleviate data privacy concerns. 

Key Takeaways 

  • Reduction in MTTR: By automating the initial triage and log gathering, response  times can be cut from hours to minutes. 
  • Knowledge Democratization: The LLM acts as an institutional memory bank, allowing junior engineers to solve complex problems using the collective wisdom of the organization.
  • Scalability: AI agents can handle hundreds of simultaneous alerts, preventing the  bottleneck that occurs when human teams are overwhelmed during major outages. 
  • Guardrails are Essential: Implementation must prioritize data sanitization and  human oversight to ensure safety and security. 

The transition to LLM-powered AIOps is not merely about installing a chatbot; it is about  fundamentally re-wiring the data flow of incident response to let machines handle the  data, so humans can handle the decisions.

The text was delivered by Dippu Kumar Singh, Senior Solutions Architect at Fujitsu Americas and a speaker at the upcoming Data Science Salon Austin conference on February 18. Secure your spot!

The post From Reactive to Autonomous: Architecting LLM-Driven Workflows for IT Incident Response appeared first on ALD Blog.

The Versatile AI Product Manager: Beyond the Hype, the Core Qualities for Success (Tue, 27 Jan 2026)
Artificial intelligence has evolved beyond its origins as a specialized technical domain and is now experiencing widespread adoption across all business sectors—and Product Managers who speak its language are becoming translators driving this transformation.

They play a crucial role in navigating the obstacles encountered during AI development, bridging the gap between data science teams, engineering, business goals and the end client.

Now, let's take a deep dive into the qualities that distinguish a mediocre AI Product Manager from a versatile one.

Understanding the “How” and “Why” is critical

They are familiar with core concepts of machine learning (e.g., feature engineering, feature validation, model training and validation, overfitting, and the differences between supervised vs. unsupervised learning).

They are knowledgeable enough to challenge assumptions and ask better questions rather than playing a mere messenger role between teams. Moreover, a well-rounded PM challenges whether AI is even needed for a problem and questions data quality before building, rather than chasing the latest trends.

Moreover, they ask ‘what could go wrong?’ early in the process and build solutions, instead of waiting for things to break after launch.

They see data as the Core Asset

There’s an old saying: garbage in, garbage out. Your model is only as good as the data you feed it. This is what makes or breaks AI products.

The best product managers are fluent in the language of data. They have a sixth sense for biased data and know exactly what to ask when training data looks unrealistically perfect.

The great PMs don’t just accept data as-is; they interrogate it, and work hand-in-hand with data engineering teams to build the pipelines and frameworks that keep everything running efficiently and effectively.

The AI PMs who succeed? They're not the ones who build the coolest features. They're the ones who obsess over data quality and make decisions based on evidence rather than assumptions.

Strong Business Acumen

AI Product Managers need to ensure their AI projects support the company's goals and align with financial objectives. They must thoroughly examine market opportunities, evaluate risks, and estimate potential ROI to effectively prioritize product features.

Good negotiation and planning skills help them deliver results without losing sight of the bigger picture. They know how to balance the model's performance, latency, operating costs, and end-user experience without defaulting to "make it perfect."

User-Centric Approach

The best AI Product Managers obsess over their users. They focus on what users really need, build responsibly, and stay close to user feedback so they create something that actually matters to people.

Great AI PMs don’t hide behind jargon. They communicate clearly with users and regulators, comply with data protection statutes, and always obtain required consent, which goes a long way toward building trust.

Creating responsible AI means collaborating with legal, compliance, and ethics teams right from day one. Good PMs don’t treat ethics like a checklist that they rush through; they make bias testing and regular reviews part of their everyday workflow.

This approach turns ethics from a reactive problem into something you manage upfront. This helps you build AI that people trust and that you can confidently defend.

Navigating Uncertainty

Unlike regular software, which does exactly what you tell it to, AI brings uncertainty into the picture. Smart Product Managers don't expect AI development to work like traditional software development; it is an iterative process with lots of twists and turns.

Hence, AI product managers should demonstrate expertise in adapting to inherent system uncertainties. They need a robust set of skills to manage all the essential components of the AI product life cycle: strategic planning for continuous monitoring, ongoing maintenance, and iterative model retraining.

Bridges silos between teams

Well-rounded AI product managers know how to pull together teams with different skills. Data scientists work on the AI models, engineers build systems that scale, designers focus on the user experience, and domain experts bring industry know-how.

Regular meetings and clear documentation help everyone stay aligned and avoid misunderstandings. Tools like structured project management boards and data dashboards support this coordination.

Further, they bridge the gap between technical teams and business folks by explaining AI in terms everyone can understand. This reduces misunderstandings and accelerates decision-making.

Adapting to Emerging Technologies

Great AI product managers are always learning new things. They keep up with the latest AI trends and tools that help get work done faster; but only use these if they really help their business and customers.

They quickly test new ideas to see which features are truly helpful. This way, their products improve in smart ways instead of just copying the latest trends.

It is important to regularly read about new studies, attend industry events, and talk with AI developers. Doing these things helps you bring new and useful ideas into your product plans.

Old ways of managing products are becoming less useful. It is not AI that will take your job, but another product manager who uses AI better. The best AI product managers are good at thinking about products and see AI as just a tool. They know when to use AI and when it is better not to.

Text written by Shamindra Peiris, Senior Product Manager, Visa A2A Risk Solutions and a speaker at the upcoming Data Science Salon Austin Conference. Secure your spot today!

The post The Versatile AI Product Manager: Beyond the Hype, the Core Qualities for Success appeared first on ALD Blog.

Why Smaller AI Models Are Becoming a Strategic Advantage for Enterprises (Tue, 20 Jan 2026)
Over the last two years, the artificial intelligence discussion has been preoccupied with scale. Larger models, more parameters, and continually increasing infrastructure investment have been presented as the seemingly inevitable future of enterprise AI.

But within actual organizations, the ones forced to work within budgets, subject to regulation, and bound by security requirements, another picture is taking shape.

Large models are not necessarily good business choices.

In financial services, healthcare, logistics, retail, and manufacturing, small language models (SLMs) are gradually becoming the basis of production-scale AI systems. This shift is not about reduced ambition. It is about aligning AI with the way enterprises actually function.

The Enterprise Reality Behind AI Hype

Large language models excel at general intelligence. They can reason across fields, produce creative material, and respond flexibly to open-ended questions. These capabilities are spectacular; however, they are not what most businesses require in day-to-day operation.

Enterprise AI use cases are typically narrow and repeatable:

  • Automating internal workflows
  • Interpreting policy, compliance, and operational documents
  • Supporting customer service and operations teams
  • Assisting IT, security, and DevOps functions

Generality in such situations turns out to be a liability. Large models bring higher operating expenses, unpredictable latency, and increased governance risk. Many organizations find themselves paying for sophistication they cannot safely roll out.

What Makes Small Language Models Different

Small language models are not defined by what they lack, but by what they prioritize.

SLMs are intentionally designed to be:

  • Domain-specific rather than universal
  • Task-bounded rather than open-ended
  • Optimized for inference efficiency and consistency
  • Easier to deploy within private, controlled environments

For enterprise leaders, this translates into systems that behave more like reliable infrastructure and less like experimental technology.

Why CIOs and CTOs Are Paying Attention

At the executive level, small language models align more closely with enterprise interests.

Predictable economics
SLMs use fewer tokens and consume less compute, enabling stable cost models for AI systems that need to run continuously rather than episodically.
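To make the point concrete, a back-of-envelope comparison; the per-million-token prices below are hypothetical placeholders, not vendor quotes:

```python
def monthly_token_cost(requests_per_day: int, tokens_per_request: int,
                       price_per_million_tokens: float) -> float:
    """Rough monthly spend for a continuously running AI workload."""
    monthly_tokens = requests_per_day * tokens_per_request * 30
    return monthly_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical prices: a large hosted model at $10.00 per 1M tokens
# versus a small self-hosted model at $0.50 per 1M tokens.
llm_cost = monthly_token_cost(100_000, 1_000, 10.00)  # 3,000M tokens/month
slm_cost = monthly_token_cost(100_000, 1_000, 0.50)

print(f"Large model: ${llm_cost:,.0f}/month, small model: ${slm_cost:,.0f}/month")
```

For a workload that never stops, even a modest per-token difference compounds into an order-of-magnitude gap in monthly spend.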

Security and compliance alignment
Smaller models are isolable, auditable, and governable. They can be deployed on isolated networks, scoped by business unit, and governed under existing frameworks such as SOC 2, ISO 27001, and NIST.

Performance that users notice
In production settings, latency and reliability matter more than theoretical model capability. SLMs often deliver faster response times, improving adoption and user confidence.

Operational fit
SLMs fit better into CI/CD pipelines, MLOps platforms and enterprise observability tooling. They are simpler to version, monitor, and roll back, which are essential characteristics of production systems.

The Emerging Hybrid AI Architecture

Leading companies are not choosing between big and small models. They are combining them.

A common architectural pattern is taking shape:

  • One large model used selectively, e.g., for complex reasoning or orchestration.
  • Several smaller models that handle execution, classification, validation, and transformation.

This mirrors how modern enterprises design software systems: a central control plane that coordinates specialized services. AI architectures are following the same evolution.
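A minimal sketch of that control-plane pattern in Python; the model handlers and routing rule here are hypothetical stand-ins (a real router would call inference endpoints, and often uses a classifier or the large model itself to decide):

```python
from typing import Callable, Dict

# Hypothetical model handlers; in practice these would call real inference endpoints.
def small_classifier(task: str) -> str:
    return f"[slm-classify] {task}"

def small_extractor(task: str) -> str:
    return f"[slm-extract] {task}"

def large_reasoner(task: str) -> str:
    return f"[llm-reason] {task}"

# Route each request type to the cheapest model that can handle it;
# only unmatched, open-ended tasks fall through to the large model.
ROUTES: Dict[str, Callable[[str], str]] = {
    "classify": small_classifier,
    "extract": small_extractor,
}

def route(task_type: str, task: str) -> str:
    handler = ROUTES.get(task_type, large_reasoner)
    return handler(task)

print(route("classify", "support ticket #42"))
```

The design choice is the same one microservice architectures made years ago: narrow, cheap workers for the common path, an expensive generalist only for the exceptions.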

Prebuilt or Custom? The Strategic Decision Leaders Must Make

With small language models entering the mainstream, enterprises face an essential question: Should they use prebuilt models or invest in their own?

The answer is less about technology than about business purpose.

When Prebuilt Small Models Are the Right Choice

Ready-to-use SLMs suit organizations that value speed and efficiency. They make sense when:

  • The application is typical and familiar.
  • Differentiation is not as important as time-to-market.
  • Security requirements can be met through configuration and isolation.
  • Internal AI capacity is limited.

For many enterprises, prebuilt models offer a rapid, low-friction path to operational AI, particularly for internal productivity applications and non-differentiating workflows.

When Custom Models Become Strategic Assets

Custom SLMs carry higher initial costs, but they become very attractive when AI directly affects core business outcomes.

The use of custom models is usually justified in situations when:

  • The domain is highly specific or proprietary.
  • Data sovereignty or sensitivity is non-negotiable.
  • Deterministic, auditable behavior is required.
  • The output of AI impacts financial, legal, or operational decisions.

In such cases, model ownership is an asset rather than a burden. Custom SLMs provide stronger control, clearer accountability, and closer alignment with enterprise risk management.

The Cost Perspective Most Teams Miss

The build-versus-buy dilemma is commonly framed around short-term price. In practice, what matters more is the total cost of ownership.

Prebuilt models minimize upfront work but can cause complications later:

  • Ongoing usage fees
  • Vendor dependency
  • Limited tuning flexibility and governance control

Custom models require early investment but can provide:

  • Lower long-term inference costs
  • Full lifecycle control
  • Seamless integration with internal platforms

For AI systems meant to run at scale, intentional design often beats convenience in the long-run economics.

From Experimentation to Infrastructure

The most significant impact of small language models may be organizational, not technical.

They enable AI to leave the innovation lab and reach core platforms. They ease tension with security teams, simplify governance discussions, and give engineering leaders much more straightforward ownership of outcomes.

In short, they make AI a working tool.

The Strategic Takeaway

The next stage of enterprise AI will not be defined by who deploys the largest model. It will be defined by who can roll out AI systems that are secure, scalable, governable, and cost-effective.

Small language models are gaining ground not because of their reduced size, but because of how well they fit the reality of enterprise decision-making.

In business, practicality is what endures.

Author: Milankumar Rana

The post Why Smaller AI Models Are Becoming a Strategic Advantage for Enterprises appeared first on ALD Blog.

AI for Personalized Commerce Beyond Recommendations: Real-Time Intent Prediction (Tue, 13 Jan 2026)
For years, the gold standard in retail personalization has been the recommendation engine.

These early digital commerce systems recommended products based on two essential inputs: purchase history and shopping preferences. They performed well at summarizing historical product affinity, i.e., what products users had liked in the past. But today's hyper-dynamic, fast-changing retail market demands new shopping systems, because traditional collaborative filtering and matrix factorization methods operate as static, reactive systems.

They operate in batch mode, updating perhaps daily or weekly, and excel at answering the question, "What do people like you usually buy?" But they fail when faced with in-the-moment consumer behavior, which asks a different question: "What do you need right now?" A user can change their intent entirely within three consecutive clicks, and purely historical models cannot detect this unpredictable sequence of events.

Retailers understand that optimizing for product affinity differs from capturing customer purchase intent. This critical shift is driving retailers to look beyond simple recommendation engines and explore real-time intent prediction as their new standard.

The Core Shift: From Preference to Prediction

Intent prediction is the practice of using AI to analyze a customer’s current session, blend it with their history, and forecast their next action—or inaction—within milliseconds. It is about moving from predicting preference to predicting intent.

Preference-Based Systems (Traditional):

  • Question: What categories and products does the shopper usually like?
  • Data Source: Historical purchases, aggregated user similarities
  • Action: Suggest products

Intent-Based Systems (Modern):

  • Question: What is the shopper actively trying to do in this session?
  • Data Source: Real-time session behavior (micro-signals), current context
  • Action: Orchestrate the entire experience

An intent prediction system reads these hidden signals to predict exactly what a shopper will do next. Instead of depending on delayed batch data, it generates instant, customized responses that adapt to user interactions in real time. Research indicates that intent-based interventions lead to a 10-25% boost in conversion rates.

Why Intent Beats Preference

Preference answers:

  • What categories and products does the shopper usually like?

Intent answers:

  • What is the shopper actively trying to do in this session?

The New Gold: Customer Micro-Signals

Real-time intent prediction rests on the correct interpretation of customer micro-signals. These are granular, transient behavioral data points, captured as individual session events, that reveal user motivation and urgency beyond what basic click records and purchase history can show.

Taxonomy of Behavioral Signals:

  1. Interaction Signals (On-Page Behavior): 
    • Dwell Time: How long users spend looking at product images compared to reading reviews. Customers who stay on review pages seem to assess the accuracy of the information and may be evaluating the product’s credibility on the listing.
    • Scroll Depth: How far the user has scrolled on the page to reach the product information. For example, the user stops at the exact point where the essential technical information is located.
    • Mouse Movement: The system tracks user’s mouse activities through their slow or fast movements, which could indicate their level of frustration. Research shows that when users hover their mouse over product attributes, they tend to stay longer, which directly affects their decision to make a purchase (conversion intent).
  2. Navigation and Sequence Signals:
    • Sequence: The user follows this sequence by viewing Product A, then adding Product B to their shopping cart, and finally searching for Product A accessories.
    • Rapid Switching: The system enables users to perform fast product tab switching, product filtering and removal. These patterns often reveal comparison tasks, narrowing intent, or uncertainty.
    • Hesitation Loops: Adding items to the cart and then removing them and adding them again, or viewing their cart without checking out, show obvious signs of shopping process difficulties or friction. Research models show that organizations can predict customer abandonment through hesitation loops, which provide better than 25% improved accuracy compared to previous methods.
  3. Contextual Signals:
    • Device Type: Switching from desktop to mobile may indicate purchase readiness and checkout enablement.
    • Location or Time of the day: Users who use the internet at night mainly want to obtain information and do research instead of making rush/impulse purchases.
    • Weather or Regional Conditions: These signals can influence interest in catalog categories and shape customer intent.

Capturing these signals while the user browses is not difficult, but the richness comes with noise: for new users especially, random mouse jitters and stray clicks can drown the signal, so the system must apply sophisticated filtering.
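As a concrete example of turning raw session events into a usable signal, here is a toy detector for the hesitation loops described above; the event names, and the notion of a loop as one add-then-remove cycle, are our own simplifications:

```python
def count_hesitation_loops(events):
    """Count add-then-remove cycles per product in a session event stream.

    `events` is a sequence of (action, product_id) pairs, e.g.
    ("add_to_cart", "sku-1"). Each remove of a product currently in the
    cart counts as one hesitation loop.
    """
    in_cart = set()
    loops = 0
    for action, product in events:
        if action == "add_to_cart":
            in_cart.add(product)
        elif action == "remove_from_cart" and product in in_cart:
            in_cart.discard(product)
            loops += 1
    return loops

session = [
    ("view", "sku-1"),
    ("add_to_cart", "sku-1"),
    ("remove_from_cart", "sku-1"),
    ("add_to_cart", "sku-1"),
    ("view_cart", ""),
]
print(count_hesitation_loops(session))  # → 1
```

A real pipeline would compute this incrementally over the event stream and feed the count, alongside dwell times and scroll depth, into the intent model as a feature.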

The Technical Engine: Vectors, Sequences, and Speed

The processing of these complex sequences requires modern Machine Learning methods, which operate continuously while providing quick response times.

  1. Embeddings and Vector Search: The Semantic Leap 

Modern intent systems operate by analyzing complete user interactions, which consist of both search terms and all user interface clicks. The system converts all user interactions together with product characteristics and session data points into numerical embeddings, which exist as vectors within a high-dimensional space. The embedding system functions as a coordinate system that groups similar behaviors and intents into neighboring clusters.

The key innovation is representing both customer behavior and the product catalog in the same embedding space. The shopper's current session becomes a behavioral fingerprint, and vector search instantly finds other users or products whose session vectors are closest to it in this space. This similarity matching surfaces complex analogies: for example, the system can recognize that the current user's intent vector is 95% similar to that of users who typically abandon their carts but complete their purchases after receiving a free-shipping banner.

This computation runs on high-performance vector databases that operate at scale, such as Pinecone or Milvus. Vector search narrows millions of possible product matches down to roughly twenty relevant options within milliseconds, fast enough for live personalization.
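The similarity lookup at the core of this step reduces to cosine similarity between vectors. A brute-force sketch in plain Python with toy three-dimensional embeddings (a production system would use an approximate-nearest-neighbour index in a vector database rather than scanning the catalog):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_similar(session_vec, catalog, k=2):
    """Indices of the k catalog vectors closest (by cosine) to the session vector."""
    sims = [(cosine(session_vec, vec), i) for i, vec in enumerate(catalog)]
    return [i for _, i in sorted(sims, reverse=True)[:k]]

catalog = [
    [1.0, 0.0, 0.0],  # product 0
    [0.9, 0.1, 0.0],  # product 1
    [0.0, 1.0, 0.0],  # product 2
    [0.0, 0.0, 1.0],  # product 3
]
session_vec = [1.0, 0.05, 0.0]  # current shopper's session embedding

print(top_k_similar(session_vec, catalog))  # → [0, 1]
```

The brute-force scan is O(catalog size) per query; the whole point of ANN indexes is to cut that to near-logarithmic time so the lookup fits in a millisecond budget.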

  2. Stream-Based Architecture

Real-time intent prediction requires an architecture built around continuous, low-latency data flow rather than batch processing.

  • Event Streams: The system records every user interaction (clicks, scrolls, hovers, searches) as high-volume, continuous event streams on platforms such as Kafka or Kinesis. These streams are typically the source of the micro-signals.
  • Sequence Models: The production systems at present operate with Transformer-style architectures, which implement SASRec and BERT4Rec concepts for their operation. The models process session event sequences by time to discover intricate relationships between actions that affect the likelihood of subsequent actions for intent embedding generation.
  • Real-Time Inference: The output embeddings are stored in a vector database index, which enables real-time inference operations. The system uses the current session embedding to query the high-speed service, which generates predictions (e.g., abandonment risk score and next-best-product) within 50 milliseconds.
  • Personalization Orchestration Layer: This is the critical layer that takes the machine learning predictions (the what) to establish the specific methods (the how) and locations (the where) for intervention. This allows users to convert model output into coherent customer experiences, decoupling the ML Model from front-end rendering operations.
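The orchestration layer's job can be illustrated as a simple rule table mapping the model's output (the what) to an intervention (the how and where). The thresholds, field names, and intervention names below are purely illustrative:

```python
def choose_intervention(prediction: dict) -> str:
    """Map an intent prediction to a front-end intervention.

    `prediction` carries an abandonment-risk score (0..1) and optionally
    the model's next-best-product suggestion.
    """
    risk = prediction.get("abandonment_risk", 0.0)
    if risk >= 0.8:
        return "show_free_shipping_banner"    # high risk: strongest incentive
    if risk >= 0.5:
        return "surface_reviews_widget"       # hesitating: reduce uncertainty
    if prediction.get("next_best_product"):
        return "recommend_next_best_product"  # engaged: guide discovery
    return "no_intervention"

print(choose_intervention({"abandonment_risk": 0.86}))
```

Keeping this mapping outside the ML model is what decouples model retraining from front-end rendering: the model's schema stays fixed while the interventions evolve.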

The Hard Truth: Challenges and Limitations

Implementing true real-time intent prediction is not a plug-and-play solution: it requires organizations to transform both their technical systems and their culture, demanding substantial engineering investment and new ways of working.

  1. Infrastructure and Cost

Handling high-volume event streams, training large sequence models, and operating low-latency vector databases requires dedicated cloud infrastructure designed to perform at scale. The Total Cost of Ownership (TCO) of such an architecture is often high: real-time vector search over millions of items at millions of queries per second requires substantial computational power. As a result, many retailers start by applying intent prediction only to their top 5–10% of traffic to control spending.

  2. Data Sparsity and Noise

Real-time processing creates specific data difficulties. Micro-signals are noisy and demand strong filtering, including ML-based anomaly detection. New users and newly listed products degrade performance because not enough behavioral data is available. To address this cold-start problem, the system needs hybrid fallback techniques that lean on popularity and content-based scoring until enough behavioral data accumulates.
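One common shape for such a hybrid fallback is a blend whose personalization weight grows with the amount of observed behavior. The blending rule and the threshold below are our own illustration, not a standard formula:

```python
def blended_score(behavioral: float, popularity: float, n_events: int,
                  full_personalization_at: int = 20) -> float:
    """Blend a behavioral (personalized) score with a popularity prior.

    With few session events we lean on the popularity prior; as evidence
    accumulates, the behavioral score takes over.
    """
    w = min(n_events / full_personalization_at, 1.0)
    return w * behavioral + (1 - w) * popularity

# New visitor with 2 events: mostly popularity-driven
print(round(blended_score(behavioral=0.9, popularity=0.3, n_events=2), 2))   # → 0.36
# Established session with 25 events: fully behavioral
print(round(blended_score(behavioral=0.9, popularity=0.3, n_events=25), 2))  # → 0.9
```

The same pattern works per product for the item cold-start case, substituting a content-based score for the popularity prior.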

  3. Privacy, Governance, and Ethics

Organizations need strict ethical and governance systems controlling access to detailed real-time behavioral information. Micro-signals can feel invasive. Retailers must be transparent about their data handling practices and ensure they deliver real benefits to customers rather than mere surveillance. Needless to say, organizations should comply with the data privacy and protection laws of the countries where they operate, such as GDPR and CCPA. Over-personalization backfires when customers experience it as an invasion of their privacy.

  4. Evaluation Complexity

Evaluating the performance of an intent system is not easy. A simple product recommendation can be tested with straightforward A/B testing, but intent systems are by design dynamic and context-aware, which makes them far more complex to test. Click-through rates alone do not determine success. Precise measurement requires advanced methods that combine counterfactual testing with multi-armed bandits and continuous monitoring of customer retention metrics and lifetime value.

  5. Organizational Alignment

The transition to intent-based systems will require support from all teams, including data science, engineering, product, and marketing. The process of intent prediction creates new challenges for team organization because it demands that organizations adopt a permanent experimental approach that uses data for decision-making.

Future Outlook: The Autonomous Commerce Experience

Today's capabilities are merely the base for building an adaptive commerce platform. The current pace of AI development enables systems that will achieve far deeper integration and more autonomous operation.

LLM-Assisted Session Understanding 

Large Language Models (LLMs) will bring major improvements to intent systems. With LLM integration, the system can produce detailed, high-quality embeddings derived from both user clicks and unstructured data sources, including search queries, customer service dialogues, and product reviews. Processing unstructured data gives the system a much better understanding of human intentions. LLMs can interpret user actions in context to detect, for example, that users who follow “search → filter → refine query” are still exploring, while users who perform “search → immediate click” already know what they want. LLMs can also translate behavioral patterns into natural language explanations (e.g., the shopper is comparing two products to check that they fit the sizing requirements), which helps human operators detect system problems and improves debugging.
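A rough sketch of the pattern interpretation described above, with hard-coded rules standing in for what an LLM would infer (the event schema and patterns here are illustrative):

```python
def describe_session(events):
    """Turn a raw clickstream into a short natural-language summary that an
    LLM or a human operator can consume. Hard-coded rules stand in for the
    richer interpretation an LLM would provide."""
    pattern = " -> ".join(e["type"] for e in events)
    if "search -> filter -> refine_query" in pattern:
        intent = "exploratory (still narrowing options)"
    elif "search -> click" in pattern:
        intent = "directed (already knows what they want)"
    else:
        intent = "unclear (not enough signal yet)"
    return f"Session [{pattern}] looks {intent}."
```

The readable output is the point: a summary like this can be fed to an LLM for embedding, or surfaced to operators for debugging.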

Autonomous and Generative Personalization:

The future involves Autonomous Personalization Engines. These systems will advance beyond prediction to run autonomous mini-experiments (A/B/n tests) on the fly, determining the best experience parameters (e.g., display size, offer placement, and content) for the highest predicted outcome without human involvement. They will generate customized banners, content blocks, and discovery flows through Generative AI, all in real time.

Predictive Experience Orchestration: 

These systems will advance beyond optimizing individual interactions to orchestrating connected, multi-channel experiences. When the system detects weak website engagement, it will trigger an in-app notification later that day, or alert store staff when the user arrives at their store location. Businesses will pre-build product detail pages, update image displays, and position inventory based on projected demand to improve operational efficiency. Intent prediction will become a control system for the entire experience, not just a personalization feature within the customer journey.

Key Takeaways for Retail Practitioners

The era of guessing what customers might like is over. The competitive advantage now lies in knowing what they’re about to do before they do it.

  1. Shift the Mindset: Move from static recommendations to real-time intent modeling driven by micro-signals.
  2. Focus on Data Streams: Adopt stream-based architectures (e.g., Kafka), sequence models, and vector search to achieve sub-50ms inference.
  3. Invest in Embeddings: Embeddings are the core component that delivers fast, accurate behavioral-to-product matching across millions of products.
  4. Prioritize Orchestration: This is the crucial layer that converts ML predictions into customer experiences: abandonment offers, urgency messages, and personalized UI changes.
  5. Balance Innovation with Ethics: Protect privacy and stay transparent; avoid aggressive over-personalization.
  6. Measure Incrementally: Evaluate success with A/B testing combined with long-term metrics such as Lifetime Value, not short-term click-through rates alone.
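Takeaways 2 and 3 can be illustrated with a toy end-to-end scoring step: embed a session's events and match the result against product embeddings. The averaging "embedding" and brute-force search below are deliberate simplifications of the trained sequence models and ANN indexes a real deployment would use:

```python
import math

def embed(events):
    """Toy session embedding: the mean of per-event vectors. In production
    this would be a trained sequence model over the clickstream."""
    dims = len(events[0])
    return [sum(e[d] for e in events) / len(events) for d in range(dims)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(session_vec, catalog, k=3):
    """Brute-force nearest neighbours; a real system would use an ANN index
    (e.g. HNSW or a managed vector database) to stay under ~50 ms."""
    scored = sorted(catalog.items(),
                    key=lambda kv: cosine(session_vec, kv[1]), reverse=True)
    return [item for item, _ in scored[:k]]
```

A streaming consumer (e.g. reading from Kafka) would call `embed` on each updated session and `top_k` against the catalog on every event.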

Real-time intent prediction works like a finely tuned GPS for each shopper, guiding them efficiently to the checkout destination while anticipating every turn and obstacle ahead.

True competitive advantage lies in leveraging customer micro-signals and vector search to achieve real-time intent predictions and using them to orchestrate the future of autonomous commerce for a better customer experience.

Author: Karan Kumar Ratra

“This document reflects the views of the individual author(s) in their personal capacity and not as a representative of their employer(s). They do not reflect the views of their employer(s) and are not endorsed by their employer(s).”

The post AI for Personalized Commerce Beyond Recommendations: Real-Time Intent Prediction appeared first on ALD Blog.

How Data Teams Really Use a Data Catalog: A Practical Journey to Self-Serve Analytics
https://roundtable.ailovesdata.com/how-data-teams-really-use-a-data-catalog-a-practical-journey-to-self-serve-analytics
Tue, 30 Dec 2025 17:02:34 +0000

When I joined a B2B SaaS startup in 2021, I was transitioning from a bigger, more mature company. There, I had worked on one particular area of the product; while that work was rigorous and intense, I was excited about the challenge of taking on a bigger role at the new company, with the opportunity to lead data for sales, marketing, customer success, product, and even people operations.

What I did not anticipate was how scarce resources are at a startup. For my first six months, we were a team of five: a small, lean group trying to explore data, build dashboards, and derive insights to set the product's direction, all at the same time. I have always taken pride in building a partner team rather than a support team, but with so much unknown, so much data discovery left to do, and people coming and going, it became difficult to pin down the actual source of our data.

The more time my team and I spent on data discovery, the more we realized we were stuck in a descriptive loop instead of doing the prescriptive and predictive analytics we should have been doing. Hiring more people was not the solution; they would only join the chaos without a scalable fix. We kept answering the same repeated questions about data definitions and sources, and after some research we concluded that we were missing a data catalog tool. I had used Collibra in the past, but it was a heavy manual lift and did not fit our needs at the time. So we started looking for a modern data catalog.

Modern Tools, but Old Problem

If you walk into any modern data company, you will see a data ingestion tool that loads data into your warehouse, a BI tool that maintains multiple approved and exploratory dashboards, and maybe a reverse ETL tool that sends data back to your CRM. While modern solutions exist and each of these tools does a wonderful job, the overall complexity remains high, especially when you try to tie everything together.

Where the data comes from and how it is transformed remains a question mark that gives data folks sleepless nights. Data dictionaries, definitions, and lineage stay trapped inside YAML files and Git pull requests. Data engineers understand them, but most of that knowledge lives in their heads, or the documentation is too complex for other stakeholders, who are sometimes simply not motivated enough to read it.

Choosing the Right Tool

There were multiple options available for a data cataloging tool, and we had the following criteria to select the tool. Although tools market a number of fancy features, we just needed some basic features that “actually” resolved our problem.

Automatic integration with dbt and the warehouse

Accurate column-level lineage

Searching capability, be it at a column level or table level

Useful metadata

There were multiple tools that fit the bill to a certain extent. But we wanted to select a tool that sits on top of our warehouse, is smart enough to automatically understand the lineage, and not only helps us understand data from the source but also helps us understand which dashboards are consuming these fields, so a stakeholder consuming the dashboard knows the business logic behind the field they are referring to.

Cultural shift

We did not want it to be yet another modern tool that no one adopts, so we required everyone on the team to use it. It was an instant hit. In our internal Slack channels, when someone asked a question about data origin or business definitions, whether an analyst or a data engineer, we would send a link instead of answering directly and let them discover whether the tool made it clear enough. If any business logic was missing, we would update it.

We didn’t realize that, while doing this, we were actually building a solid business glossary. Once we became confident the tool was working as expected and questions were being answered easily, we started exposing it to other stakeholders and sharing links in our Slack channels, repeating externally the same practice we had started within the team. This, again, was an instant hit. Some PMs started updating definitions and business logic and even offered to collaborate on refining the glossary. We also began embedding the Select Star links in our dashboards, closing the loop in our ecosystem.

Gradually, the repeated questions coming to our team declined, and the focus shifted toward interpreting the data rather than extracting the correct data. Overall, confidence in the back-end data, and in our team, got a significant boost.

What we learned

If you consider a catalog as a governance tool, it may not work as a company-wide solution. However, if you consider it as a productivity tool, it may work wonders for your team as it did for me.

Second, we were patient with adoption but aggressive about driving it. We made the tool mandatory for internal teams even though it meant a habit change and some initial friction. We incorporated it into our workflow easily, improved our quality of life, and freed ourselves to focus on the things that matter for modern data teams.

Third, you don't need every stakeholder to adopt your proposal; you just need a few champions who will carry the tool forward. Lastly, trust is a big factor for a data team. Going from a team under constant pressure to a team setting the direction of the product takes a lot of effort, and in our case, diligence and the selection of the right tool helped us get there.

Final thought

Many data catalog tools exist, and AI has given them even more wings. Modern catalog tools are lightweight, easy to use, and integrate seamlessly with your data ecosystem. Documentation is a big challenge for data teams, and not something most teams enjoy. Neither do the stakeholders, who complain about missing documentation but are sometimes too lazy to read what is provided.

So, data catalogs are a solution that can ease your life, automate finding answers, help you build a business glossary, and at the same time help mature your data teams from being descriptive, reactive teams to proactive teams working on diagnostic, prescriptive, and predictive analysis that sets the direction for the stakeholders.

Author: Snehal Karanjkar

Integrating Direct Mail with Digital Channels Using AIH
https://roundtable.ailovesdata.com/integrating-direct-mail-with-digital-channels-using-aih
Tue, 16 Dec 2025 17:17:03 +0000
For many years, marketers have questioned whether direct mail still deserves a place in a world shaped by digital communication. Email, social platforms, apps, and real-time analytics have changed how customers interact with brands, and these channels often overshadow traditional offline methods. 

Yet direct mail has slowly regained importance, not because of nostalgia, but because it fits naturally into a modern hybrid marketing environment. People now move between digital and physical touchpoints without thinking about it, and marketing strategies need to keep pace. AI is making that possible by giving direct mail new levels of personalization, optimization, and measurability. These advances strengthen the qualities that have always made physical mail effective, such as its sense of trust, the quieter environment in which it is received, and its appeal across generations.

To see how this works in practice, the luxury travel sector offers a clear example. In particular, long-duration cruise travel reveals why physical mail combined with digital strategy can outperform digital channels alone.

Case Study: AI-Enhanced Direct Mail for a Luxury Cruise Line

A luxury cruise company that specializes in trips lasting between seven and twenty days was struggling to reach the audience most likely to book. The brand relied heavily on digital campaigns, yet digital channels were not bringing in the travelers who had both the time and the financial freedom to commit to extended voyages. 

These ideal customers were generally between the ages of 55 and 75. Many were retired, many were homeowners with significant equity, and most preferred printed materials over online ads. They also used ad blockers frequently, checked social media less often than younger audiences, and tended to ignore promotional emails.

Although younger consumers clicked on digital ads, they rarely became buyers. Work schedules, children at home, and limited discretionary income made long cruise vacations unrealistic for them. This mismatch led the company to rethink how it identified and reached qualified prospects.

Identifying the Right Travelers With AI

The company introduced an AI-driven system that analyzed multiple data sources to predict which households were most likely to travel. The model used demographic and lifestyle indicators such as age, household income, number of children, homeownership signals, and evidence of home equity borrowing. It also incorporated digital behavior that hinted at travel interest, such as browsing foreign destinations, reading retirement planning content, and searching for long-stay vacation packages.

One key insight from the model was that middle-income families with children at home were significantly less likely to book long cruises. This allowed the company to narrow its audience and avoid mailing expensive catalogs to low-probability households.

Creating Personalized Direct Mail

With a refined audience in place, AI helped produce printed materials tailored to each recipient. Catalogs highlighted destinations that aligned with browsing activity. Recommendations for cabin types and price ranges matched predicted income levels. The content felt personal rather than generic, which helped build trust and curiosity.

Every mailer included a QR code and a personalized web link. When scanned or clicked, both led customers to a matching online experience that displayed the same itineraries and options they had seen in print.

Trigger-Based Mailing for High Intent Moments

AI also monitored online signals and sent physical brochures when interest peaked. If someone viewed a cruise itinerary, hesitated on a booking page, or returned to the site after a long absence, the system triggered a print piece to be mailed within a short window of time. This created a sense of timely relevance. The brochure often arrived while the traveler was still thinking about the trip.

Combining Direct Mail With Digital Follow-Ups

The greatest success came from using direct mail together with digital reinforcement. Once a catalog arrived, the company sent follow-up emails, retargeting ads, and reminders that aligned with the content of the printed piece. Past travelers responded especially well to this combination because the physical catalog revived memories of previous cruises. Interested prospects also reacted more strongly when digital messages followed a printed brochure that had already caught their attention.

This sequence brought measurable improvements. Website visits increased, QR scans rose significantly, and bookings happened more quickly. Many customers also upgraded cabins or added excursions, partly because the printed catalog made them more confident in the value of the trip.

Results

The new system reduced wasted mailings by 34 percent. People who received personalized catalogs were four times more likely to book than those targeted by digital advertising alone. They were also more likely to choose longer itineraries or higher cabin categories. The cruise line finally reached the older, affluent travelers who had been nearly invisible to digital platforms. The hybrid approach created stronger engagement, better recall, and higher revenue per customer.

Conclusion

Direct mail is not an outdated tactic. When supported by AI and woven into a digital strategy, it becomes one of the strongest tools in modern marketing. AI brings predictive modeling, versioning, real-time triggers, and accurate attribution to physical mail. These capabilities allow brands to benefit from the trust and attention that printed communication naturally creates while still enjoying the speed and precision of digital channels.

Author: Himanshu Kumar

Mem0, Zep, or Build Your Own: Which Memory Management Framework Should You Use for Enterprise Chat?
https://roundtable.ailovesdata.com/mem0-zep-or-build-your-own-which-memory-management-framework-should-you-use-for-enterprise-chat
Tue, 09 Dec 2025 22:26:56 +0000
Memory is what makes or breaks enterprise chat. If your AI assistant forgets who the user is, loses context between tickets, or asks the same clarifying questions every time, it will feel like a toy, not a production system. On the other hand, naive methods that stuff the whole conversation history into every prompt drive up token costs and latency.

This is when dedicated memory frameworks come in. 

In this post, I’ll lay out a practical way to make decisions about enterprise memory, compare Mem0 with Zep, and discuss when it makes sense to build your own memory layer. I will cover cost, architecture, security and compliance, and realistic timelines for an enterprise rollout.

What “memory management” actually means for enterprise chat

A memory system for a typical business chat assistant or agent needs to do more than just “store previous messages in a database.” 

You normally have to: 

1. Keep user-specific context across sessions: preferences, past decisions, account details, constraints, and escalation history, retained for weeks or months, not just one chat.

2. Retrieve the important things, not everything. Pull the most important details out of vast chat histories, and fetch the right snippets at inference time to stay within the model's context window and keep token costs down.

3. Link chat to business data such as orders, tickets, CRM records, knowledge articles, IoT data, or transaction data. This often needs more than plain embeddings; it needs graph-style relationships and temporal reasoning.

4. Meet security and compliance rules: data residency, encryption, PII masking, RBAC or ABAC access, and audit trails, while working within your existing IAM and network limits.

5. Operate at scale: tens or hundreds of thousands of users and thousands of messages every month, with predictable cost and performance.

Most teams end up in the same place: you need a separate “memory layer” between your chat front end and your LLM provider. 
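A bare-bones sketch of such a memory layer, with keyword overlap standing in for the embedding-based retrieval and LLM-based fact extraction that frameworks like Mem0 and Zep provide (the class and method names are illustrative, not any framework's API):

```python
import re

class MemoryLayer:
    """Minimal memory layer between a chat front end and an LLM: store
    salient facts per user, then retrieve only the relevant ones at
    inference time instead of replaying the full history."""

    def __init__(self):
        self.facts = {}  # user_id -> list of fact strings

    def add_fact(self, user_id, fact):
        self.facts.setdefault(user_id, []).append(fact)

    def retrieve(self, user_id, query, k=3):
        # Keyword-overlap scoring stands in for embedding similarity.
        q = set(re.findall(r"\w+", query.lower()))
        scored = sorted(
            self.facts.get(user_id, []),
            key=lambda f: len(q & set(re.findall(r"\w+", f.lower()))),
            reverse=True)
        return scored[:k]
```

Only the top-`k` facts go into the prompt, which is the basic mechanism behind the token savings both vendors advertise.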

Option 1: Using Mem0 

Mem0 is a managed and open-source “universal memory layer” for LLM apps that extracts important information from chats and other sources and stores it for long-term memory and personalisation. You can get it as:

• A self-hosted deployment using the Apache 2.0 open-source stack. (docs.mem0.ai)

• The managed Mem0 Platform, which charges based on usage and takes care of infrastructure and operations for you. (docs.mem0.ai)

Mem0 is used in many agent frameworks and integrations, such as Microsoft AutoGen. It focuses on memory extraction, consolidation, and retrieval. (Microsoft GitHub)

Strengths

• A mature and active OSS project: a large and growing community, regular commits, and plenty of stars and forks show real adoption. (GitHub)

• A clear open source vs. platform story: the OSS version is free to use under Apache 2.0, while the platform adds hosting, management, and enterprise operations. The docs clearly compare infrastructure costs between OSS and platform. (docs.mem0.ai)

• Built to save tokens and time: Mem0 extracts only the important data instead of replaying the whole history, which can save a lot of tokens. (arXiv)

• Works well for “personalized assistant” scenarios: customer support bots, internal copilots, and multi-session assistants that take user traits and preferences into account. (GitHub)

Weaknesses and trade-offs 

• Less opinionated about graph and enterprise data modeling: you can connect Mem0 to RAG or your own graph, but it is not a complete graph knowledge platform on its own. You will need to connect other systems if you want rich graph queries across business entities.

• You still own the infrastructure for OSS: when you self-host, you pay for the vector database, LLM calls, and hosting. (docs.mem0.ai)

• Vendor platform vs. do-it-yourself: if you choose the managed platform, you will have to verify that it fits your data residency, DPA, and security requirements.

Where Mem0 usually works best 

• You want to get value quickly and don’t want to build a graph stack that is too complicated on the first day. 

• Your main goals are to personalize several sessions, lower costs, and improve memory in chat. 

• You can either host your own Apache 2.0 stack or sign up for a SaaS memory platform that is made just for you. 

Option 2: Using Zep 

Zep is a “context engineering platform” that builds on a temporal knowledge graph by adding agent memory, Graph RAG, and context assembly. (getzep.com) 

Graphiti is a Python framework that lets you design temporally aware knowledge graphs that show how entities, events, and relationships change over time. (help.getzep.com) 

Zep began with an Apache 2.0 open-source community edition and now focuses primarily on a managed enterprise platform. The OSS repo is still available but is no longer actively updated. (Reddit)

Strengths

Good at reasoning with graphs and time 

Zep models user memories and business data as a temporal knowledge graph and has published results suggesting that it is more accurate and has less latency than baseline methods on long-term memory benchmarks. (arXiv) 

• Features and certifications that are good for businesses 

The business platform has SOC 2 Type II, DPAs for EU customers, HIPAA BAA on enterprise plans, and a number of deployment choices, such as managed, BYOK, and others. (getzep.com) 

• Good connections to other parts of the ecosystem

It’s easy to add agent memory to existing stacks thanks to integrations with LangChain and other tools. (LangChain Docs) 

• Integrations with cloud providers 

For instance, Zep can use Amazon Neptune and OpenSearch as storage for graph and text search to help it remember enterprise data for a long time. (Amazon Web Services, Inc.) 

Weaknesses and trade-offs 

• The open-source story is now static: the community edition is Apache 2.0, but it is no longer actively developed. If you self-host Zep long term, your team may have to do extra work to keep it running.

• More complex than you may need for simple assistants: the full graph-centric context engineering stack can be overkill if all you need is memory of user preferences and short conversations.

• Enterprise focus: the documentation and support are clearly better for the SaaS product. Think about this carefully if your business plans to self-host.

Where Zep typically works best 

• You care about having strong relationships and being able to reason about time across chat and corporate data. 

• You want a managed, enterprise-grade platform with SLAs and certifications.

• You plan to build several agents that all share the same knowledge graph.

Option 3: Making your own memory layer 

The third choice is to make your own memory system, which you can do by combining: 

• A vector database for embeddings.

• A relational or document database for facts that are organized. 

• Application logic for extraction, summarization, scoring, and retrieval.

• A policy layer for handling PII, tenancy, and RBAC.

Why teams think about this 

• You want to oversee all the data, logic, and deployment. 

• You already have deep ML and platform engineering experience in-house.

• You want memory to integrate well with existing data platforms and internal standards.

Pros

Full control over data and architecture 

You can change how memories are taken out, combined, versioned, and erased. You can use the backups, observability, and disaster recovery plans you already have. 

Works with the internal tech stack 

Instead of getting a new vendor, you can build on top of certified databases, message buses, and security measures. 

• No surprises with features 

Your roadmap is for you alone. No need to rely on decisions made by other products. 

Drawbacks 

Heavy engineering investment

To match what Mem0 or Zep provide, you need to build entity and fact extraction, deduplication, and scoring; temporal and cross-session reasoning; tenant isolation, retention procedures, and legal holds; and memory debugging tools for administrators and observers.

Cost of ongoing maintenance 

You will always own schema migrations, scaling, performance optimization, and security patches. 

Harder to match the state of the art: vendors continually publish research and benchmarks on long-term memory cost and performance, and replicating that rigor internally is not easy.

For most businesses, “build your own” only makes sense when memory is a strategic internal platform, not just one chatbot.

Comparing costs

Cost is more than the license or SaaS price. When you compare Mem0, Zep, and a custom build, think about cost in three ways:

1. Cost of the platform and infrastructure 

a. Mem0 OSS 

i. License: Apache 2.0, no cost. 

ii. You pay for hosting, vector DB, and LLM calls. (docs.mem0.ai) 

b. Mem0 Platform 

i. This is a usage-based SaaS pricing model that includes infrastructure in the platform. (docs.mem0.ai) 

c. Zep SaaS 

i. There is a free tier, a credit-based Flex plan, and enterprise options, including managed deployment and BYOK. (getzep.com)

d. Build your own

i. You pay for your own computing, storage, networking, monitoring, and backup. 

ii. If you use business vector DBs or graph databases, you may need to get more licenses. 

2. Cost of tokens and computing at inference 

a. Better memory systems send only the necessary context instead of whole chat logs, which lowers the number of tokens needed for each request. Both Mem0 and Zep stress latency and token savings in their message and research findings. 

b. Custom systems can do the same thing, but only if you spend money on strong logic for summarizing and retrieving information. 
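A quick back-of-the-envelope comparison of the two prompting strategies; the ~4-characters-per-token heuristic and the price constant are rough assumptions, not any provider's actual pricing:

```python
def approx_tokens(text):
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)

def prompt_cost(history, extracted_facts, price_per_1k=0.01):
    """Compare sending the full chat history vs. only extracted memory facts.
    `price_per_1k` is a placeholder price per 1,000 input tokens."""
    full = approx_tokens(" ".join(history))
    lean = approx_tokens(" ".join(extracted_facts))
    return {"full_history_tokens": full,
            "memory_tokens": lean,
            "cost_full": full / 1000 * price_per_1k,
            "cost_memory": lean / 1000 * price_per_1k}
```

Running this on a realistic ticket history makes the per-request savings of a memory layer concrete before you commit to a platform.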

3. Cost of engineering and running the business 

a. Putting together an off-the-shelf memory platform can take weeks of work.

b. Building a strong custom layer can take anywhere from a few months to a few years, including continuous support.

A solid rule of thumb:

• Use vendor platforms when time-to-value and reliability matter most.

• Use self-hosted OSS if you want to save money on SaaS and have control, but are okay with owning the infrastructure. 

• Only build your own when memory is a key feature of your internal platform. 

Timelines: How long does each path normally take? 

The actual timescales will depend on your company, but this is a reasonable plan for rolling out an enterprise chat. 

Using Mem0 or Zep 

Week 1 to 2: Proof of concept 

o Connect the chat app to Mem0 or Zep memory APIs so that it can be used as a single assistant. 

o Keep user profiles and basic preferences between sessions. 

o Check the effect on latency and token reduction. 

Week 3 to 6: Try it out in one area 

o Include more structured entities and business information. 

o Set up basic guardrails and monitoring. 

o Check with internal users or just one business unit. 

Month 3 to 6: Scale to production

o Make security, SSO, and RBAC stronger. 

o Set rules for how long data should be kept and when it should be deleted.

o Spread memory across several assistants and business lines.

Making your own memory layer 

Month 1 to 2: Alpha and architecture 

o Choose databases, establish schemas, and set up memory abstractions. 

o Set up basic extraction and retrieval for one use case. 
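The "memory abstraction" in months 1 to 2 is usually an interface that hides the concrete database from the chat app, so storage can be swapped later without touching call sites. A minimal sketch — the class and method names are hypothetical:

```python
from abc import ABC, abstractmethod

class MemoryLayer(ABC):
    """Hypothetical abstraction: the chat app never touches the DB directly."""

    @abstractmethod
    def write(self, user_id: str, text: str) -> None: ...

    @abstractmethod
    def retrieve(self, user_id: str, query: str, k: int = 3) -> list[str]: ...

class ListBackedMemory(MemoryLayer):
    """Trivial reference implementation for the single-use-case alpha."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def write(self, user_id: str, text: str) -> None:
        self._store.setdefault(user_id, []).append(text)

    def retrieve(self, user_id: str, query: str, k: int = 3) -> list[str]:
        # Placeholder ranking: most recent first; swap in vector search later.
        return list(reversed(self._store.get(user_id, [])))[:k]

mem: MemoryLayer = ListBackedMemory()
mem.write("u1", "works in the Berlin office")
mem.write("u1", "owns a premium subscription")
print(mem.retrieve("u1", "subscription"))
```

In month 3+, a vector- or graph-backed implementation replaces `ListBackedMemory` behind the same interface.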

Month 3 to 5: Beta and integrations 

o Include temporal, deduplication, and summarization elements. 

o Work with IAM, observability, and at least one LLM app. 
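Of the beta elements above, deduplication is the most mechanical. A toy normalize-and-hash approach is sketched below; production systems typically use embedding similarity instead, so treat this as illustration only:

```python
import hashlib

def normalize(fact: str) -> str:
    # Collapse whitespace and casing so trivially different copies match.
    return " ".join(fact.lower().split())

def dedupe(facts: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for fact in facts:
        key = hashlib.sha256(normalize(fact).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return unique

facts = [
    "User lives in Munich.",
    "user lives in  munich.",   # same fact, different casing/spacing
    "User prefers dark mode.",
]
print(dedupe(facts))  # ['User lives in Munich.', 'User prefers dark mode.']
```

Exact-match hashing misses paraphrases ("lives in Munich" vs. "is based in Munich"), which is exactly why the timeline budgets months for this layer.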

Month 6 onward: Hardening and iteration 

o Tune performance, cost, and accuracy. 

o Build admin tooling, audit views, and governance workflows. 

o Treat the memory layer as a product with its own backlog. 

If you need results this quarter, a vendor-backed memory framework is usually the right call for your business.

A short checklist before you choose 

You can use these questions to get your internal architectural conversation going: 

1. Is long-term memory a strategic platform capability or just a feature?

a. If your assistants have this functionality, choose Mem0 or Zep. 

b. If you’re building an “agent platform” for the whole company, a custom build might be worth it. 

2. Do you need graph-level and temporal reasoning across domains right now?

a. A lot of graph and temporal reasoning across chat and business data points to Zep or a custom graph solution. 

b. If not, it could be easier to use Mem0’s simplified mental model. 

3. What are the limits on your compliance and deployment? 

a. If you can’t use SaaS at all, Mem0 OSS is the better option; Zep OSS is essentially unmaintained. 

b. If DPAs, SOC 2, and BYOK are allowed with SaaS, then both Mem0 Platform and Zep Enterprise are possible choices. 

4. What skills do you have inside your company? 

a. Strong ML and distributed-systems teams can handle a bespoke memory layer.

b. If your team is already stretched, don’t underestimate the ongoing maintenance burden. 

5. What is your budget for tokens and latency? 

a. Validate your choice with real workloads and compare token savings and latency side by side. 
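A side-by-side check can be as simple as replaying a fixed query set through each candidate and recording latency and prompt size. A skeletal harness is sketched below — the `ask` callables are placeholders for your real integrations, and word count is only a rough token proxy:

```python
import time
from statistics import mean

def benchmark(name: str, ask, queries: list[str]) -> dict:
    """Replay queries through one candidate; record latency and prompt size."""
    latencies, token_counts = [], []
    for q in queries:
        start = time.perf_counter()
        prompt = ask(q)  # assumed to return the fully assembled prompt string
        latencies.append(time.perf_counter() - start)
        token_counts.append(len(prompt.split()))  # crude token proxy
    return {"name": name,
            "avg_latency_s": mean(latencies),
            "avg_prompt_tokens": mean(token_counts)}

# Placeholder candidates: full-history baseline vs. a memory-backed variant.
history = "previous turn " * 200
ask_baseline = lambda q: history + q
ask_with_memory = lambda q: "user prefers concise answers. " + q

queries = ["what is my order status?", "change my notification settings"]
for candidate in (("baseline", ask_baseline), ("memory", ask_with_memory)):
    print(benchmark(*candidate, queries))
```

Run the same harness against each real integration with production-like queries; the relative numbers matter far more than the absolute ones.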

Final thoughts 

Memory is no longer a “nice to have” for enterprise chat. It is central to delivering an assistant experience that is reliable, personalized, and affordable. 

• Mem0 is a wonderful solution if you want an open-source memory layer that is versatile, actively maintained, and has an optional managed platform that focuses on personalization, cost reduction, and ease of use. 

• Zep is the best choice when you need a complex temporal knowledge graph and a managed enterprise platform that makes context engineering a top priority. 

• Building your own is powerful but costly, so only enterprises that really want to own memory as a platform capability should do it.

Author: Milankumar Rana

The post Mem0, Zep, or Build Your Own: Which Memory Management Framework Should You Use for Enterprise Chat? appeared first on ALD Blog.
