Cloud Cost Doesn’t Have to Be Chaos: Here’s How to Fix It
(invisibl.io blog, 26 Aug 2025)

Before You Burn Another Dollar in the Cloud

Cloud hosting is a cornerstone of modern digital transformation, but without disciplined cost governance, it can become a silent drain on innovation.

In 2025, global cloud spending is projected to reach $723.4 billion, up from $595.7 billion in 2024 (Gartner). By 2027, 90% of organizations are expected to adopt hybrid cloud strategies, pushing consumption past $1.35 trillion (IDC).

Despite heavy investments in cloud-native architectures, automation, and DevOps, many organizations still struggle to realize meaningful savings.

Cloud cost doesn’t have to be a runaway train. With the right approach, it can be proactively governed, intelligently automated, and architected for minimal waste without compromising developer experience.

Over the years, we’ve worked with startups, fintechs, healthcare platforms, and global enterprises facing ballooning cloud costs. A common pattern emerged: they were paying for what they provisioned, not what they actually used.

This guide distills our proven strategies across GCP and AWS, featuring real-world transformations where companies achieved up to 70% reduction in monthly cloud spend without sacrificing development velocity.

Whether you’re a startup stretching every dollar or an enterprise looking to optimize Dev/Test workloads, this blog is your actionable playbook for cloud cost control.

Why Cloud Bills Spiral Out of Control

Cloud platforms promise elasticity and cost-efficiency, but without disciplined governance, costs can escalate quickly and silently.

In our experience across various industries, uncontrolled cloud spend is rarely due to a single misstep. It’s typically the result of multiple small inefficiencies compounding over time.

The major contributors include:

  • Always-on environments (like Dev, QA, staging) left running beyond business hours
  • Over-provisioned compute resources, sized for worst-case scenarios but rarely utilized
  • Idle managed services (databases, caches, queues) that continue incurring costs even when unused
  • Accumulating logs, metrics, and artifacts in high-cost storage without lifecycle policies
  • Forgotten network components such as static IPs, DNS entries, and unused load balancers
  • Ephemeral environments with no TTL, often spun up for previews or demos but never torn down
  • Inefficient autoscaling configurations, scaling up too quickly and rarely scaling down
  • Snapshots and backups retained far longer than needed due to lack of automated expiration
  • Lack of tagging or ownership metadata, making it hard to assign accountability or clean up unused resources
  • Monitoring without action: budget alerts that don’t trigger automated responses or remediation

In nearly every cost audit, we’ve seen that most savings come not from negotiating better pricing, but from stopping waste. Let’s explore the principles and tactics that enable this.

Core Principles for Cost Efficiency

Cost-efficient cloud infrastructure isn’t achieved through one-time actions; it’s built on disciplined, repeatable practices. Here are the foundational principles we’ve seen consistently drive long-term savings without compromising developer agility:

1. Ephemerality by Default

Infrastructure should exist only when needed. Preview environments, QA clusters, or demo stacks should have TTLs or auto-destruction mechanisms to avoid lingering waste.

2. Automation Over Intuition

Manual governance doesn’t scale. Use schedules, triggers, lifecycle policies, and GitOps workflows to manage resource uptime, scale, and shutdown without human intervention.

3. Comprehensive Tagging & Ownership

Every resource should be tagged with its environment, owner, purpose, and expiry. This enables cost attribution, automation, and accountability across teams.
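As a loose sketch of how such a tagging policy can be enforced in automation (the required keys and the ISO-8601 expiry format here are illustrative conventions, not a standard):

```python
from datetime import datetime, timezone

REQUIRED_TAGS = {"env", "owner", "purpose", "expiry"}  # illustrative keys

def validate_tags(tags: dict) -> list[str]:
    """Return a list of policy violations for a resource's tag set."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - tags.keys())]
    expiry = tags.get("expiry")
    if expiry:
        try:
            when = datetime.fromisoformat(expiry)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            if when < datetime.now(timezone.utc):
                problems.append("resource past its expiry date")
        except ValueError:
            problems.append(f"unparseable expiry: {expiry!r}")
    return problems
```

A cleanup job can then run this over every resource and flag (or delete) anything that returns a non-empty list.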

4. Budget-Driven Enforcement

Budget alerts shouldn’t just notify; they should trigger real actions. Integrate cost thresholds with auto-pause, scale-down, or alert routing via workflows or chatbots.

5. Serverless and Spot-First Thinking

Default to usage-based, interruptible, or serverless compute where possible. Spot VMs, FaaS, and managed runtimes offer massive cost advantages when reliability permits.

6. Lifecycle Governance for All Resources

Snapshots, disks, databases, and containers should have defined lifespans or expiry policies. Nothing should live forever by default; if it’s not scheduled to end, it’s designed to be forgotten.

7. Cost Visibility as a First-Class Metric

Track cost trends like you track latency or errors. Annotate dashboards with budget context and make cost part of engineering retrospectives and SLOs.

8. Composable, Modular Infrastructure

Break monolithic stacks into modular units (via Terraform modules, Helm charts, etc.) to make it easier to tear down or right-size individual components.

9. Pre-Deployment Cost Awareness

Surface estimated cost at deploy time using tools like infracost or internal scripts. Developers shouldn’t have to wait for the invoice to know the impact of their infrastructure.
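A toy illustration of the idea, with a hypothetical price list standing in for a real estimator such as Infracost:

```python
# Hypothetical per-resource monthly list prices; a real estimator like Infracost
# would derive these from the provider's pricing APIs and your Terraform plan.
PRICES = {"vm-small": 25.0, "vm-large": 190.0, "lb": 18.0}

def estimate_monthly_cost(plan: list[str]) -> float:
    """Rough pre-deploy estimate: sum list prices for every resource in the plan."""
    return round(sum(PRICES[r] for r in plan), 2)
```

Printing this number in the CI job that runs `terraform plan` is often enough to change behavior before the invoice arrives.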

10. Self-Service with Guardrails

Empower teams to provision infrastructure, but with built-in constraints like TTLs, quotas, or sandbox environments. Autonomy with boundaries prevents uncontrolled sprawl.

Proven Strategies from the Field

These strategies have been successfully implemented across cloud environments, enabling teams to reduce cloud costs by 50–80% without sacrificing agility or developer velocity.

Nightly Shutdown & Morning Auto-Scale

Non-production environments (Dev, QA, UAT) often stay active well beyond working hours. To avoid waste:

  • Tag eligible resources with keys like auto-shutdown=true
  • Schedule shutdown and scale-up jobs via automation platforms (e.g., job schedulers, functions, or cloud-native tools)
  • Store original configuration (e.g., node counts, disk settings) in a config store for seamless resume
  • Resume infra during business hours via startup triggers or workflows

Result: Reduces runtime by ~60% for environments used only 8–10 hours/day.
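The scheduler’s core decision can be sketched in a few lines; the tag convention and the 8:00 AM–7:30 PM business window below are assumptions for illustration:

```python
from datetime import time

BUSINESS_START = time(8, 0)    # assumed business window
BUSINESS_END = time(19, 30)

def should_be_running(tags: dict, now: time) -> bool:
    """Decide whether a tagged resource should be up at a given local time."""
    if tags.get("auto-shutdown") != "true":
        return True  # untagged resources are never touched
    return BUSINESS_START <= now <= BUSINESS_END

def plan_actions(resources: dict, now: time) -> dict:
    """Map each resource name to the action the scheduler would take."""
    return {
        name: ("start" if should_be_running(tags, now) else "stop")
        for name, tags in resources.items()
    }
```

The actual start/stop calls (cloud SDK, Kubernetes API) then just execute whatever this plan returns, which keeps the policy testable separately from the side effects.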

Ephemeral Infrastructure on Demand

Spin up short-lived environments for specific use cases, and ensure they’re torn down automatically:

  • Use CI/CD to deploy on-demand environments (e.g., feature previews, demo stacks, bug repro)
  • Apply TTL (time-to-live) via tags, metadata, or external controller logic
  • Automate teardown after a defined period using cleanup jobs, workflows, or TTL controllers

Use Cases: Temporary QA, per-PR preview stacks, customer onboarding environments
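A cleanup job’s TTL check might look like this sketch, assuming hypothetical created-at and ttl-hours tags:

```python
from datetime import datetime, timedelta, timezone

def expired(tags: dict, now: datetime) -> bool:
    """True if a resource tagged with created-at (ISO 8601) and ttl-hours has outlived its TTL."""
    created = datetime.fromisoformat(tags["created-at"])
    ttl = timedelta(hours=float(tags["ttl-hours"]))
    return now >= created + ttl

def select_for_teardown(envs: dict, now: datetime) -> list[str]:
    """Names of ephemeral environments a cleanup job would tear down."""
    return [name for name, tags in sorted(envs.items()) if expired(tags, now)]
```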

Table 1: How TTL Works in Practice

Spot/Preemptible Compute for Batch Jobs

Move interruptible workloads like ETL, ML training, or simulations to lower-cost compute instances:

  • Use Spot/Preemptible instances with retry logic in orchestration frameworks (Airflow, Step Functions, etc.)
  • Use taints/tolerations or affinity rules to isolate these workloads in Kubernetes
  • Track job failure rates and segment critical vs non-critical workloads

Typical savings: 70–90% on batch and async compute pipelines.
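The retry logic that makes interruptible compute safe can be as small as this sketch; the Preempted exception is a stand-in for whatever signal your orchestrator raises when an instance is reclaimed:

```python
import time

class Preempted(Exception):
    """Raised when the underlying spot/preemptible instance is reclaimed mid-job."""

def run_with_retries(job, max_attempts: int = 3, backoff_s: float = 0.0):
    """Re-run an idempotent batch job when its spot instance is preempted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Preempted:
            if attempt == max_attempts:
                raise  # surface the failure so critical jobs can be rescheduled on-demand
            time.sleep(backoff_s * attempt)  # linear backoff before rescheduling
```

This only works because the jobs are stateless and idempotent, which is exactly why segmenting critical vs non-critical workloads matters.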

GitOps-Driven Preview Environments

Integrate GitOps to create automated, lifecycle-managed environments:

  • Use Git events (PRs, commits) to spin up isolated namespaces or clusters via ArgoCD, Flux, etc.
  • Embed TTL annotations or cleanup logic in your GitOps repo
  • Decommission infra automatically after the TTL or merge/close event

Replaces long-lived shared staging with short-lived, clean-slate environments

Tiered & Expiring Storage

Optimize data storage costs by tiering and expiring unused content:

  • Apply lifecycle rules to logs, backups, artifacts, and datasets
  • Auto-transition to cold storage after defined aging (e.g., 7/30/90 days)
  • Auto-delete debug or staging data after TTL expiry

Example: Move logs from active to archival tiers, then delete after 30 days.
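The tiering decision itself is straightforward; a sketch with illustrative 7/30-day thresholds:

```python
from datetime import date

# Illustrative policy: hot -> cold after 7 days, delete after 30
COLD_AFTER_DAYS = 7
DELETE_AFTER_DAYS = 30

def lifecycle_action(created: date, today: date) -> str:
    """Return the lifecycle action for an object of a given age."""
    age = (today - created).days
    if age >= DELETE_AFTER_DAYS:
        return "delete"
    if age >= COLD_AFTER_DAYS:
        return "transition-to-cold"
    return "keep-hot"
```

In practice you would express the same thresholds declaratively as S3 or GCS lifecycle rules rather than running code, but the decision table is the same.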

Budget-Triggered Automation

Budgets shouldn’t just alert — they should act. Connect cost thresholds to automated guardrails:

  • Auto-scale down or pause low-priority environments when thresholds are breached
  • Notify FinOps or engineering leads via preferred communication channels (Slack, Teams, email)
  • Annotate cost dashboards with spend thresholds or budget milestone markers (e.g., via Grafana, Datadog, custom UIs)

Enables proactive remediation instead of postmortem firefighting.
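A minimal sketch of threshold-to-action mapping, with made-up thresholds and action names:

```python
def budget_actions(spend: float, budget: float) -> list[str]:
    """Map a spend/budget ratio to escalating guardrail actions (thresholds are illustrative)."""
    ratio = spend / budget
    actions = []
    if ratio >= 0.8:
        actions.append("notify-owners")             # heads-up in chat
    if ratio >= 1.0:
        actions.append("scale-down-low-priority")   # shrink non-prod footprints
    if ratio >= 1.2:
        actions.append("pause-non-prod")            # hard stop on Dev/QA
    return actions
```

Wiring this to a budget webhook (Pub/Sub on GCP, SNS on AWS) turns the alert into the remediation.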

Insights from Customer Engagements

i. Fintech – QA Environment Sprawl (GCP)

Problem Statement: A fintech customer maintained five parallel QA environments in Google Cloud, each comprising GKE clusters, CloudSQL instances, and external load balancers. Despite being active only during working hours (8–10 hrs/day), these environments ran 24×7, incurring a monthly cost of ~$17,800. The environments were built manually, lacked standard tagging, and had no lifecycle automation.

Discovery & Analysis:

  • Enabled Billing Export to BigQuery to break down costs by project, label, and service type.
  • Found that 70% of compute resources remained idle overnight and on weekends.
  • No consistent labels were used to indicate shutdown eligibility or environment type.

Solution Implemented:

  • Applied standardized labels (env=qa, auto-shutdown=true, team=qa) across resources.
  • Introduced Cloud Scheduler to trigger shutdown of GKE node pools and VMs after 7:30 PM, storing original configurations in Firestore for stateful resume.
  • Automated morning scale-ups at 07:30 AM using Cloud Functions.

Outcome:

  • Monthly QA infrastructure cost dropped from ~$17,800 to $3,400 (~80% savings)
  • No impact on test cycles. Test teams adjusted to the new timing.
  • New QA environments created via Terraform templates now include tagging by default.

ii. B2B SaaS Vendor – Demo Environment Lifecycle (AWS)

Problem Statement: An enterprise SaaS provider had over 40 active demo environments in AWS (EKS, ECS) for customer walkthroughs. Environments were created manually and rarely decommissioned, leading to sprawl. Many environments remained idle for weeks, with a monthly AWS bill exceeding $16,500 for just demo infrastructure.

Discovery & Analysis:

  • Used AWS Cost Explorer, CloudTrail, and CloudWatch to correlate runtime vs usage.
  • Found that ~50% of environments had no API calls or traffic for over 7 days.
  • Identified unassociated Elastic IPs, idle NAT gateways, and orphaned volumes as cost contributors.

Solution Implemented:

  • Introduced a CI/CD-driven demo stack creation using CloudFormation with env=demo tagging.
  • Deployed a Lambda-based cleanup workflow, triggered via EventBridge, to terminate stacks post TTL expiry.
  • DNS records were auto-managed via Route53 APIs, and TTLs stored in DynamoDB.

Outcome:

  • Monthly demo infrastructure cost dropped from $17,500 to ~$3,900 (~78% savings)
  • The sales team now uses a self-service portal to spin up ephemeral demo stacks on demand.
  • Lifecycle policies are now centrally governed and automated.

iii. HealthTech AI Platform – Batch Pipeline Optimization (GCP)

Problem Statement: A HealthTech enterprise was running daily Spark-based ETL jobs on GKE using high-memory nodes that were often left running after job completion. These pipelines processed campaign data from BigQuery, Pub/Sub, and Cloud Storage. Despite a 3-hour active window, the infrastructure ran 24×7, costing ~$18,200/month in compute and storage.

Discovery & Analysis:

  • Identified usage patterns via Cloud Monitoring and GKE Metrics Server, showing a spike from 2 AM to 5 AM, then near-zero utilization.
  • Found stateless Spark jobs with retry mechanisms, making them suitable for interruption-tolerant infrastructure.

Solution Implemented:

  • Migrated heavy workloads to Preemptible VMs and used taints/tolerations for separation.
  • Introduced K8s CronJobs to trigger pipelines and scaled node pools dynamically using cluster autoscaler.
  • Logs and outputs stored in Cloud Storage, decoupling compute from storage.

Outcome:

  • Reduced monthly GCP spend from ~$18,200 to ~$4,100 (~77% savings)
  • Zero job failures due to retry-tolerant design.
  • Adoption of this ephemeral infrastructure pattern across analytics and ML teams.

Final Takeaways & Guardrails

Avoiding runaway cloud bills isn’t about one-time fixes; it’s about consistent discipline, automation, and architectural intent. Here’s what separates low-cost, high-efficiency teams from the rest:

  • Tag everything (env, owner, ttl) — without metadata, automation fails
  • Automate infra lifecycles — don’t rely on humans to shut things down
  • Use reserved IPs when DNS matters — avoid accidental breaks
  • Apply scale-out and scale-in policies — unused scale = silent cost
  • Set TTLs for non-prod and demo environments — orphaned infra is real
  • Restrict IAM scopes for automation — especially near prod
  • Log all lifecycle events — observability isn’t just for applications
  • Store less, store cheaper — apply lifecycle rules to logs and backups
  • Let budget alerts trigger action — not just notify
  • Keep cost visible — dashboards should reflect team, env, and trends
  • Separate ephemeral vs persistent workloads — avoid accidental impact
  • Favor serverless and spot instances wherever workload allows

Mindset Shift
Run nothing when idle.
Automate what you can.
Architect for waste elimination.
Let budgets shape behavior — not surprises.

Ready to Slash Your Cloud Costs?

Cloud savings don’t come from guesswork — they come from systems built for efficiency, automation, and accountability. If you’re ready to shift from reactive cost management to proactive cloud architecture, we’re here to help.

Let’s connect and start turning your cloud into an asset, not an expense.

CXO’s Agent: Enhancing Decision-Making Across Diverse Sources
(invisibl.io blog, 23 Jul 2025)

Introduction

In today’s fast-paced business environment, executives and CXOs face an overwhelming volume of data scattered across various systems: dashboards, ticket-management platforms, customer relationship management tools, and document repositories. These leaders are tasked with making strategic decisions quickly, but the challenge lies in synthesizing insights from these disparate data sources without dedicating countless hours to manually searching, correlating, and analyzing them.

Traditionally, senior executives would need to sift through reports and dashboards, or rely on data teams to provide them with tailored insights. However, this manual process is time-consuming, error-prone, and often too slow to meet the demands of dynamic business environments. CXOs need data-driven decisions, but they lack the time to wade through mountains of raw data to extract actionable insights.

Enter the CXO Agent: a solution designed to allow executives to ask questions in plain language and receive precise, actionable answers, without having to navigate the complexities of underlying data sources. By querying across a wide variety of integrated systems—from databases and CRMs to ticketing tools and documents—the agent intelligently sources and correlates the right information, delivering responses that are both relevant and contextually rich. Crucially, the agent ensures secure access to data by respecting existing role-based access control (RBAC) protocols. This means that users only see the data they are authorized to view, maintaining data privacy and compliance at all times.

This solution takes full advantage of cloud-native technologies and AWS services, leveraging the scalability, security, and flexibility of the cloud to ensure that the agent can handle vast data sets and integrate seamlessly with existing enterprise infrastructure. By democratizing access to actionable insights, this agent empowers CXOs to make data-driven decisions with confidence and agility, all while ensuring that security and data integrity are upheld.

In this blog, we will walk through the architecture and functionality of this intelligent agent, discuss the AWS services that power it, explore the technical challenges faced during its development, and highlight a real-world case study that showcases the tangible benefits of the solution for senior decision-makers.

How the Agent Works

At its core, the agent serves as an intelligent interface between executive intent and enterprise data. It transforms natural language queries into data-driven, actionable insights by dynamically accessing and correlating information across a wide array of internal systems. This includes structured sources such as ticketing platforms and business dashboards, as well as unstructured data housed in document repositories.

The process begins when a user—typically an executive—poses a high-level question in natural language. This query is first interpreted by an LLM, which decomposes it into an execution plan. This plan identifies the types of data required to answer the question, as well as the specific sources to retrieve that data from.

The agent then performs a permissions check to determine which of the required data sources the user is authorized to access. This check leverages existing enterprise RBAC policies, ensuring that the agent never returns information from unauthorized systems. This RBAC-driven access control model is critical in preserving organizational data boundaries and complying with internal security protocols.

Once access is confirmed, the agent retrieves relevant information using two main methods.

  • Retrieval-Augmented Generation (RAG) is used to query document-based or semi-structured data sources. This allows the agent to pull contextually relevant excerpts from documents such as business reports, meeting minutes, and technical specifications.
  • Tool-based API calls are used for structured systems such as ticketing platforms, analytics dashboards, and customer support tools. These tools provide up-to-date, system-generated data that can be incorporated into the final answer.

The outputs are then synthesized into a unified response. Depending on how the question is phrased, the agent can return:

  • concise text summaries for quick insights;
  • charts or visualizations embedded in the interface;
  • strategic recommendations where applicable; or
  • downloadable reports in PDF format for sharing or record-keeping.

This architecture ensures that users can move seamlessly from question to insight without needing to understand where the data comes from or how it is queried, all while staying within the bounds of their access permissions. The result is a highly efficient decision-support tool that meets executives where they are, in both language and context.
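The plan, permission-check, retrieve, synthesize flow described above can be sketched as a pipeline; every callable and source name below is a stand-in for illustration, not the product’s actual API:

```python
def answer(question: str, user_roles: set, llm_plan, sources: dict, synthesize):
    """Plan -> RBAC filter -> retrieve -> synthesize, as described above.

    llm_plan(question) returns the names of sources the LLM thinks are needed;
    sources maps each name to (required_role, fetch_fn); synthesize merges results.
    All callables here are stand-ins for the real LLM and connectors.
    """
    needed = llm_plan(question)
    allowed = [s for s in needed if sources[s][0] in user_roles]  # RBAC check
    denied = [s for s in needed if s not in allowed]              # never fetched
    results = {s: sources[s][1](question) for s in allowed}       # RAG or tool call
    return synthesize(results, denied)
```

The key property this structure preserves is that unauthorized sources are filtered out before any retrieval happens, not after.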

The Technology

The architecture behind the agent is designed for scalability, modularity, and secure data access across heterogeneous enterprise systems. Built entirely on AWS, the solution leverages managed services to reduce operational overhead and ensure performance at scale, while retaining the flexibility to orchestrate complex, cross-system queries through a custom logic layer.

At a high level, the system comprises the following core components.

  • A frontend interface, where executives input natural language questions and view results — including summaries, visualizations, and downloadable reports.
  • A load balancer and API server that routes incoming requests, manages authentication, and enforces request limits and timeouts.
  • MCP servers (Model Context Protocol), which handle tool-specific logic, including integration with third-party APIs and custom connectors for enterprise systems such as dashboards and ticketing platforms.
  • Bedrock runtime, which is accessed via boto3 to run the large language model responsible for interpreting queries, generating data source access plans, and synthesizing final responses.
  • A database for storing metadata, audit logs, user preferences, and response history.
  • Blob storage for storing document extracts, generated PDFs, and intermediate results.

AWS Services

The deployment is centered around a tightly integrated AWS architecture.

  • Amazon EKS provides the orchestration layer for hosting the API server, MCP servers, and related microservices. EKS enables autoscaling, containerized deployments, and efficient allocation.
  • Amazon S3 is used as the primary object storage layer—both for ingesting documents for RAG queries and for serving generated assets (charts, reports, PDFs).
  • Amazon Bedrock serves as the LLM backbone. Through direct boto3 integration, the API server sends structured prompts to Bedrock for query decomposition, summarization, and content synthesis. Using Bedrock allows the system to scale AI workloads without managing infrastructure or finetuning models in-house.
  • Amazon OpenSearch Service supports fast document retrieval for RAG queries. Indexed documents are pre-processed and chunked with vector embeddings to enable semantic search during runtime.

LLM Integration

The LLM workflow is orchestrated entirely through native code. When a question is submitted, the agent passes the prompt to Bedrock, where the LLM performs the following.

  • Intent parsing and data source planning: identifying which MCPs or document indices are required.
  • Access verification: checking whether the user, authenticated via SSO, is authorized to query those sources.
  • Execution planning: choosing between API-based data retrieval (via MCPs) or document-based RAG retrieval from OpenSearch and Bedrock Knowledge Bases.

Once the data is gathered, the LLM composes a human-readable response, optionally including charts or strategic recommendations based on the nature of the query.

Data Access and Security

All users authenticate via enterprise SSO, and their roles are managed by an internal RBAC system. Before any data access is granted, the agent performs a real-time role check to ensure that the user is authorized to access the requested systems and data types. This RBAC check is enforced both at the orchestration level and within individual MCPs.

Credentials for third-party systems are stored securely, and access tokens are managed via short-lived secrets or scoped API keys, depending on the integration. No data is ever accessed or shown unless the user has explicit permissions, maintaining strict adherence to data governance and privacy requirements.

Scalability and Fault Tolerance

The system is designed to be serverless and autoscaling. All workloads—including LLM calls, MCP-execution, and API-handling—are horizontally scalable. Failures (e.g., timeouts, tool unavailability, permission denials) are caught and reported clearly to the user, ensuring transparency without breaking the experience. Retry logic with upper limits is built in for transient errors, and results are cached where possible to reduce response times and avoid redundant calls.
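Bounded retries plus caching can be combined in a small decorator; this sketch treats TimeoutError as the transient failure class, which is an assumption about how the connectors signal errors:

```python
import functools

def with_retries_and_cache(max_attempts=3):
    """Retry transient failures up to a fixed limit and memoize successful results."""
    def decorate(fn):
        cache = {}
        @functools.wraps(fn)
        def wrapper(*args):
            if args in cache:                  # serve repeated queries from cache
                return cache[args]
            last_err = None
            for _ in range(max_attempts):
                try:
                    cache[args] = fn(*args)
                    return cache[args]
                except TimeoutError as err:    # treat timeouts as transient
                    last_err = err
            raise last_err                     # upper limit reached: surface clearly
        return wrapper
    return decorate
```

A production version would also bound the cache size and expire entries, but the shape (bounded retries, cache-on-success, loud failure) is the point.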

Challenges Faced

1. Data Integration Across Heterogeneous Sources

Integrating data from a variety of sources—including documents, databases, APIs, and ticket-management systems—posed one of the first challenges. Each source comes with its own data format, access method, and structure, ranging from structured, tabular data to unstructured text documents. This diversity meant that we had to design a solution capable of handling these various types of data seamlessly. In addition, real-time data access and retrieval added another layer of complexity, as many systems required on-the-fly queries or interactions.

To tackle this, we introduced MCP servers as a centralized integration layer. Each MCP server was designed to interface with specific data sources, such as CRMs, ticketing platforms, or document repositories. These servers provided a common, standardized interface for data retrieval and processing, which simplified the overall integration process. By consolidating these connections through MCPs, we ensured that data from diverse sources could be accessed and processed in a consistent manner, making it easier to generate relevant insights on demand.

2. Natural Language Processing (NLP) and Intent Recognition

Understanding and interpreting the variety of queries posed by executives was another significant challenge. The queries were often high-level, multi-faceted, and phrased in different ways, making it difficult to parse and correctly interpret user intent. Executives may ask vague or complex questions that require a deep understanding of the context, often in the form of natural language with ambiguous phrasing. The challenge was not just interpreting the question but ensuring that the response was both relevant and actionable.

To overcome this, we implemented a feedback loop that allowed the system to clarify ambiguous queries. When the LLM was uncertain about the user’s intent, it would ask follow-up questions to narrow down the scope and ensure that the query was properly understood. By presenting multiple interpretations of the question, the system could offer the user an opportunity to confirm or refine their request. This approach helped mitigate the risk of providing inaccurate or incomplete answers, improving the accuracy of responses over time.

3. Role-Based Access Control (RBAC) and Data Security

Data security and access control were critical concerns, especially given the sensitive nature of the data involved. Executives and other users required access to different sets of data based on their roles within the organization. However, ensuring that only authorized users could query and view specific data sources without compromising security posed a significant challenge. We needed a dynamic system that could evaluate the user’s permissions in real-time, preventing unauthorized data access.

To address this, we implemented a robust RBAC system that dynamically assessed each user’s role and determined their data access rights before any query was executed. We relied on AWS Cognito for authentication and IAM roles to enforce access policies across various data sources. This approach ensured that only authorized users could access certain data sets, while unapproved requests were blocked automatically. Additionally, by using scoped, time-limited access tokens for each data source, we maintained security without introducing performance bottlenecks or unnecessary delays.
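A simplified model of scoped, time-limited access grants; the roles, source names, and 15-minute TTL here are illustrative, and a real deployment would read role mappings from the RBAC system rather than a constant:

```python
from datetime import datetime, timedelta

# Illustrative role -> data-source mapping (assumption, not the real policy store)
ROLE_SOURCES = {
    "cfo": {"finance-db", "crm"},
    "support-lead": {"ticketing"},
}

def issue_token(role: str, source: str, now: datetime, ttl_minutes: int = 15):
    """Mint a scoped, short-lived access grant, or None if the role lacks the source."""
    if source not in ROLE_SOURCES.get(role, set()):
        return None
    return {"source": source, "expires": now + timedelta(minutes=ttl_minutes)}

def token_valid(token, source: str, now: datetime) -> bool:
    """A token is honored only for its own source and only before expiry."""
    return (
        token is not None
        and token["source"] == source
        and now < token["expires"]
    )
```

Because each grant is bound to a single source and expires quickly, a leaked token has a small blast radius, which is the design goal behind short-lived scoped credentials.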

4. Query Understanding and Multi-Source Data Synthesis

The need to synthesize information from multiple data sources in response to a single query presented another significant challenge. Executives often ask questions that span several systems, such as combining ticketing data with CRM insights or integrating document-based information with real-time analytics. In these cases, the system needed to merge data from disparate sources into a single, coherent response. This was difficult because the data was often formatted differently, came from multiple platforms, and required complex aggregation to generate meaningful insights.

We solved this by implementing context-aware algorithms that intelligently identified which data sources were relevant based on the context of the user’s question. These algorithms allowed the agent to pull in data from multiple sources and combine it in a way that made sense. For example, if a user asked about the status of a project, the system could simultaneously pull ticket data, CRM information, and relevant documents to generate a comprehensive response. This approach ensured that even when dealing with complex, multi-source queries, the agent could provide a unified and actionable answer.

A Case Study: the CXO Agent for a Multinational Manufacturing Company

To demonstrate the power and versatility of our data insight agent, we implemented a specialized version of the solution for a large, multinational manufacturing company. This company faced significant challenges when it came to managing and interpreting data from several key sources that directly impacted operations and decision-making. These included observability data (from Service Now), cloud spend data (tracked through our own tool, Kiosk), and high-performance computing (HPC) data (monitored by our product, Tachyon).

Figure 1. The app with a text answer.

The company’s executives and senior management needed a seamless way to get answers to complex questions that spanned across these multiple data sources. Previously, manually correlating data from Service Now, Kiosk, and Tachyon was a cumbersome and time-consuming process. To empower the company’s leadership with faster, more actionable insights, we created a custom version of our agent that could synthesize information from these three distinct sources, offering both text-based summaries and visual reports.

Figure 2. The app with a graph answer.

Data Sources and Their Roles

Service Now (Observability Data)

Observability data provided by Service Now was critical for understanding IT operations, service disruptions, and infrastructure health. The company needed a way to quickly identify issues in real time and assess their impact on ongoing projects. With the agent, executives could query Service Now data for insights on system outages, incident timelines, and root cause analysis.

Kiosk (Cloud Spend Data)

Managing cloud spend was an ongoing concern for the company. With the growing scale of their cloud infrastructure, keeping track of costs in real-time was essential for budget management. Kiosk, our product for tracking cloud expenditures, was integrated into the agent to give leaders a clear, real-time overview of their cloud spend, broken down by department, service, or region. This allowed executives to quickly pinpoint areas where cloud costs were increasing and take corrective actions.

Tachyon (High-Performance Computing Data)

Tachyon, our HPC product, was used to monitor and track workloads in the company’s high-performance computing environment. The data generated by Tachyon was crucial for understanding the utilization of computing resources and ensuring that workloads were balanced efficiently across the system. Executives could use the agent to query information about the current load, bottlenecks, and resource utilization patterns, helping them make informed decisions about capacity planning and resource allocation.

Query Examples and Outputs

One of the key benefits of the agent was its ability to handle complex, multi-source queries. For example, an executive might ask, “What is the current impact of the ongoing IT outage on our cloud spend and HPC resource utilization?”

This query would require the agent to pull real-time observability data from Service Now, cloud cost data from Kiosk, and HPC performance data from Tachyon. The agent would then synthesize this information into a single, coherent response.

Alternatively, the agent could also be used for regular performance checks, such as, “What was the total cloud spend this week, and how does it compare to last week’s cloud usage?”

This query would involve pulling data from Kiosk for cost analysis and comparing it with historical trends. The output would consist of a text summary highlighting key trends, along with a bar graph that compares this week’s spending to previous periods.

Another example could be a question like, “What are the current bottlenecks in our HPC environment, and how are they affecting ongoing workloads?”

For this, the agent would pull performance data from Tachyon to assess resource usage and identify any inefficiencies. The output would include a detailed summary, possibly complemented by a line graph showing resource utilization over time.
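
As a rough illustration of how such multi-source questions can be dispatched, the sketch below routes a query to the systems it touches using simple keyword matching. The keyword lists and source names are placeholders for illustration; the production agent relies on an LLM to interpret intent rather than keyword rules.

```python
# Hypothetical sketch: decide which data sources a question needs.
# Keywords and source names are illustrative, not the real routing logic.
KEYWORD_MAP = {
    "servicenow": ["outage", "incident", "ticket", "disruption"],
    "kiosk": ["spend", "cost", "budget", "billing"],
    "tachyon": ["hpc", "workload", "bottleneck", "utilization"],
}

def route_query(question: str) -> set[str]:
    """Return the set of data sources a question should be fanned out to."""
    q = question.lower()
    return {
        source
        for source, keywords in KEYWORD_MAP.items()
        if any(kw in q for kw in keywords)
    }

sources = route_query(
    "What is the current impact of the ongoing IT outage "
    "on our cloud spend and HPC resource utilization?"
)
# This example query touches all three systems.
```

A narrower question such as the weekly cloud-spend comparison above would resolve to Kiosk alone.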

Challenges Overcome in This Implementation

Despite the success of the agent, this specific implementation also presented unique challenges. Integrating data from such varied sources required ensuring that the agent could interpret and correlate data across different domains—service management, cloud economics, and high-performance computing. In addition, the diverse formats and access methods for these systems required us to ensure seamless interoperability.

To meet these challenges, we relied heavily on the MCP servers for handling different data source formats. The MCP servers allowed us to build connectors for each system (ServiceNow, Kiosk, and Tachyon) while keeping the agent’s interface uniform. This ensured that queries could be routed to the appropriate data sources and processed efficiently, without the end-user needing to know the underlying complexity.
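
The uniform-connector idea can be sketched as a small registry: each backend implements the same interface, and the agent dispatches to it by name. The class and method names below are illustrative assumptions, not the actual MCP server interfaces.

```python
# Illustrative sketch of a uniform connector interface. Names are
# assumptions for illustration, not the real MCP API.
from abc import ABC, abstractmethod

class Connector(ABC):
    name: str

    @abstractmethod
    def fetch(self, query: str) -> dict:
        """Run a backend-specific query and return normalized results."""

class ConnectorRegistry:
    def __init__(self):
        self._connectors: dict[str, Connector] = {}

    def register(self, connector: Connector) -> None:
        self._connectors[connector.name] = connector

    def dispatch(self, source: str, query: str) -> dict:
        if source not in self._connectors:
            raise KeyError(f"no connector registered for {source!r}")
        return self._connectors[source].fetch(query)

# A stand-in connector showing the shape a real one would take.
class KioskConnector(Connector):
    name = "kiosk"

    def fetch(self, query: str) -> dict:
        return {"source": self.name, "query": query, "rows": []}

registry = ConnectorRegistry()
registry.register(KioskConnector())
result = registry.dispatch("kiosk", "weekly spend by department")
```

Because every connector returns the same normalized shape, the agent can fan a question out to several backends and merge the results without backend-specific branching.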

Additionally, the need to maintain data security and role-based access control (RBAC) for sensitive information was paramount. We implemented fine-grained access controls, ensuring that the system respected the company’s internal security policies, allowing executives to access only the data they were authorized to view.
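
A minimal sketch of such role-based filtering is shown below, assuming a simple role-to-field policy map. The roles and field names are invented for illustration; a real deployment would mirror the company's own policy store.

```python
# Hypothetical RBAC filter: drop result fields the caller's role
# is not cleared to see. Roles and fields are invented examples.
ROLE_POLICY = {
    "cfo": {"cloud_spend", "hpc_utilization", "incidents"},
    "ops_lead": {"hpc_utilization", "incidents"},
}

def filter_response(role: str, payload: dict) -> dict:
    """Return only the top-level fields permitted for this role."""
    allowed = ROLE_POLICY.get(role, set())
    return {k: v for k, v in payload.items() if k in allowed}

payload = {"cloud_spend": 120_000, "hpc_utilization": 0.83, "incidents": 2}
```

An unrecognized role falls through to an empty allow-set, so unknown callers see nothing by default.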

Results and Impact

The deployment of the customized agent led to significant improvements in the company’s ability to quickly derive insights from multiple data sources. Executives were able to make more informed decisions by accessing real-time information on service disruptions, cloud costs, and HPC utilization, all within the same interface. Moreover, the system’s ability to generate both text summaries and graphs made the insights more digestible and actionable.

By reducing the time spent manually correlating data across systems, the company was able to improve operational efficiency and better manage both cloud spend and HPC resources. The executive team now has the ability to quickly assess critical business operations and make decisions based on accurate, real-time data.

Conclusion

In today’s data-driven world, executives need timely, actionable insights to drive informed decisions—but accessing, analyzing, and correlating data from multiple disparate sources remains a significant challenge. Our agent solves this by intelligently synthesizing data across a wide variety of systems, including observability tools, cloud spend trackers, and high-performance computing environments. By enabling CXOs to query multiple data sources with a single question and receive contextual, relevant responses, the agent removes the complexity traditionally involved in managing large-scale data environments.

The case study with the multinational manufacturing company serves as a clear example of how this solution can be adapted to meet the specific needs of diverse industries. Whether it’s monitoring IT service disruptions, managing cloud costs, or optimizing HPC workloads, the agent can provide insights that are both accurate and immediately actionable. Through the use of powerful technologies like MCP servers, LLMs, and role-based access control, the agent ensures that sensitive data is handled securely, while maintaining the flexibility and scalability required for large enterprises.

As organizations continue to generate vast amounts of data across a multitude of systems, the demand for intelligent agents capable of navigating and analyzing this information will only grow. By leveraging advanced data integration techniques and natural language processing, our solution empowers CXOs and decision-makers to stay ahead of the curve, make informed choices, and ultimately drive business success.

The future of enterprise decision-making is one where data can be effortlessly queried and turned into meaningful insights. With our agent, that future is already here.

AI Automobile Inspection: Visual Analysis to Structured Reports
Tue, 15 Jul 2025

Introduction

In industries where visual inspections are core to daily operations—such as insurance and manufacturing—accuracy, speed, and scalability often clash with human limitations. Claims assessors, quality assurance teams, and production-line supervisors routinely spend hours poring over images or physical components, making judgment calls based on experience and attention to detail. The process, though essential, is repetitive, time-consuming, and prone to variability.

To address this, we built an image-based assessment agent: an intelligent system capable of automating visual evaluations using deep learning, computer vision, and cloud-native services. Designed with flexibility in mind, the agent can be applied across a range of use-cases, from identifying vehicle damage in insurance claims to detecting production anomalies in electronic components.

What sets this agent apart is its accessibility to internal users. Rather than replacing human expertise, it augments it, helping claims assessors, QA teams, and other operations professionals process visual data faster, more consistently, and at scale. This is especially valuable in contexts where thousands of assessments must be performed each week and where visual cues determine critical downstream decisions.

Built on AWS technologies, this solution is a blend of automation, machine intelligence, and practical UX, shaped for real-world adoption. In this article, we’ll walk through how the agent works, explore the user journey, highlight the technology stack behind it, and share key lessons learned—including a case study from the automotive sector.

How the Agent Works

The image-based assessment agent is built to be modular, scalable, and simple to integrate into existing business processes. Its core functionality is driven by LLMs accessed through Amazon Bedrock, with orchestration and preprocessing handled entirely in Python.

This minimal yet powerful architecture allows the agent to perform complex visual assessments and generate structured reports with little infrastructure overhead, while taking full advantage of the elasticity and service abstraction that AWS provides.

1. Instruction Setup and Configuration

Each use-case begins with a simple configuration process where the agent is provided with:

  • a prompt template describing the assessment task (e.g., what to look for and how to evaluate);
  • the output schema or report format; and
  • any domain-specific metrics it should consider during judgment.
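
A configuration of this shape can be captured in a small dataclass. The field names and example values below are illustrative assumptions, not the agent's actual schema.

```python
# Hypothetical per-use-case configuration for the assessment agent.
from dataclasses import dataclass, field

@dataclass
class AssessmentConfig:
    prompt_template: str               # what to look for and how to evaluate
    output_schema: dict                # expected shape of the final report
    domain_metrics: list = field(default_factory=list)

vehicle_config = AssessmentConfig(
    prompt_template=(
        "Inspect each photo of the vehicle and list every damaged part "
        "with a severity rating."
    ),
    output_schema={"parts": [{"name": "str", "severity": "str"}]},
    domain_metrics=["severity", "repair_cost_band"],
)
```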

2. Image Intake and Visual Analysis (via Amazon Bedrock)

When a new assessment is initiated, a batch of images is uploaded to the system and processed by a visual LLM hosted on Amazon Bedrock. This model is responsible for:

  • “viewing” the images;
  • extracting relevant visual insights; and
  • translating raw pixels into intermediate semantic understanding (e.g., “scratched bumper,” “loose connection,” “burn mark on PCB”).

The visual LLM is used in a model-agnostic way—meaning you can switch between supported Bedrock models like Claude, Titan, or Stability AI (depending on your need for multimodal capabilities).
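
As a concrete starting point, the sketch below shows how a page image and an instruction prompt can be combined into a message for the Bedrock Converse API, which exposes supported models behind one interface. The message shape follows the Converse API; the helper function names and whichever model ID you pass in are assumptions to adapt to your own setup.

```python
# Hedged sketch of a vision call through the Bedrock Converse API.
# The content-block shape matches the Converse API; function names
# and the model ID you supply are assumptions.
def build_vision_message(prompt: str, image_bytes: bytes, fmt: str = "jpeg") -> dict:
    """Build a single user message containing one image plus instructions."""
    return {
        "role": "user",
        "content": [
            {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }

def assess_image(client, model_id: str, prompt: str, image_bytes: bytes) -> str:
    # client = boto3.client("bedrock-runtime")  # requires AWS credentials
    response = client.converse(
        modelId=model_id,
        messages=[build_vision_message(prompt, image_bytes)],
    )
    return response["output"]["message"]["content"][0]["text"]

msg = build_vision_message("List visible damage to this vehicle.", b"<jpeg bytes>")
```

Swapping models is then a matter of changing the `model_id` argument, which is what keeps the pipeline model-agnostic.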

3. Report Composition with a Text-to-Text LLM

Once the visual analysis is complete, the findings are passed to a text-to-text LLM, also accessed via Amazon Bedrock. This second model is responsible for:

  • synthesizing the image-based insights;
  • formatting the response according to the pre-defined structure; and
  • generating a human-readable report.

This report could be as simple as a bullet list of observations or as detailed as a multi-section professional-grade assessment with valuation, severity ratings, and part-by-part breakdowns.
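
The hand-off between the two models can be sketched as a prompt builder that serializes the visual findings into the second model's instructions. The template wording and field names are invented examples.

```python
# Hypothetical sketch of stage two: serialize visual findings into a
# report-writing prompt. Template wording and keys are illustrative.
def build_report_prompt(findings: list[dict], schema_hint: str) -> str:
    lines = [
        "You are drafting a professional damage-assessment report.",
        f"Follow this structure exactly: {schema_hint}",
        "Findings from the visual analysis:",
    ]
    for f in findings:
        lines.append(f"- {f['part']}: {f['observation']} (severity: {f['severity']})")
    return "\n".join(lines)

prompt = build_report_prompt(
    [{"part": "front bumper", "observation": "deep scratch", "severity": "minor"}],
    schema_hint="summary, part-by-part breakdown, valuation",
)
```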

4. Orchestration and Infrastructure

All logic outside the LLMs—file handling, prompt construction, status tracking, and user interaction—is implemented in Python. AWS services such as Amazon S3 (for storing input/output assets) can be used to build a fully serverless backend if desired.

By leveraging Amazon Bedrock, the entire system remains model-flexible, cost-controllable, and scalable without ops-heavy management—a major win for fast-moving solution teams.

The User Journey

To ensure the capabilities of the image-based assessment agent are matched by a frictionless user experience, we designed the end-user workflow to be intuitive, fast, and optimized for field use. Whether the user is a claims assessor, a QA inspector, or a technician in a manufacturing facility, the interaction model remains consistent and easy to adopt.

1. Accessing the System

Users interact with the agent through a companion mobile or web application that serves as the front-end for all assessments. Upon login, they are greeted by a dashboard that displays a chronological list of prior assessments, including their statuses and outcomes. This provides continuity—allowing users to reference historical reports, resume incomplete assessments, or verify previously identified issues.

2. Initiating a New Assessment

From the dashboard, users can start a new assessment by entering relevant metadata about the subject under inspection. This could include product identifiers, usage data, location information, or any context-specific parameters that help the AI make more informed judgments.

3. Uploading Visual Evidence

Next, the user is prompted to upload a set of images, captured via mobile camera or uploaded from storage. Multiple angles and zone-specific shots are encouraged to give the visual model the best possible input.

4. Real-Time Status Updates

Once submitted, the assessment enters a processing queue and is automatically tagged with a dynamic status: Pending, Estimating, and finally Completed. This feedback loop helps users stay informed without needing to monitor the process manually.
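
The status lifecycle can be modeled as a tiny state machine. The allowed transitions below are an assumption based on the three states described.

```python
# Hypothetical status state machine for an assessment job.
TRANSITIONS = {
    "Pending": {"Estimating"},
    "Estimating": {"Completed"},
    "Completed": set(),
}

def advance(status: str, new_status: str) -> str:
    """Move a job to a new status, rejecting illegal jumps."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

status = advance("Pending", "Estimating")   # job picked up by the pipeline
status = advance(status, "Completed")       # report ready
```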

5. Receiving the Assessment Report

When processing is complete, the final report becomes available for viewing. It includes:

  • the originally submitted metadata for traceability;
  • a structured breakdown of findings based on visual analysis;
  • quantitative or qualitative scores for each identified issue; and
  • a summary designed for quick consumption and decision-making.

6. Post-Assessment Actions

The report can be reviewed, exported, or shared with other stakeholders depending on the workflow. In some cases, users may also have the option to contest or refine certain findings, especially if the system is integrated into a larger claims or quality control pipeline.

By translating a complex, judgment-heavy task into a simple, guided interaction, the user journey empowers professionals to make better decisions, faster, without sacrificing accuracy or oversight.

Case Study: Automating Vehicle Damage Assessments for Car Insurance

In the world of automotive insurance, resale, and fleet management, assessing the cost of vehicle damage is often a bottleneck. Traditional processes depend on human inspectors, rigid cost tables, and time-consuming interactions, sometimes involving travel to remote locations, centralized inspection depots, or live video calls. This manual approach introduces subjectivity, delays, and limited scalability.

To address these challenges, we worked with an insurance client to develop a specialized image-based assessment agent tailored specifically for vehicle damage evaluation. The goal was to review car images, identify and classify visible damage, and generate a structured, real-time cost estimate report, all without requiring a human inspector on site.

Mobile-First Deployment: A Workflow Designed for the Field

To make the experience practical for field inspectors and business users, the agent was deployed through a custom-built Android application. Designed with usability in mind, the app served as the primary interface for submitting images, tracking assessments, and retrieving AI-generated reports.

Figure 1. The dashboard of the Android app.

Upon launching the app, users were presented with a dashboard listing previously submitted assessments. From here, they could initiate a new report, entering key vehicle metadata such as mileage, age, and original purchase price. Next, users were prompted to upload images of the vehicle—ideally from multiple angles—to give the AI sufficient visual input.

Figure 2. The input form for the submission of a request for a new report.

Once submitted, the app tracked the request in real time, displaying dynamic status labels such as Pending, Estimating, and Succeeded. The final report, once ready, included:

  • a restatement of the vehicle metadata;
  • a detailed, part-by-part breakdown of detected damage;
  • severity classifications for each issue;
  • a revised valuation of the vehicle; and
  • a summary section formatted for quick review or export.
Figure 3. The generated report.

Under the Hood: A Two-Stage Intelligence Pipeline

The specialized vehicle agent followed the same two-step structure as the generic image assessment agent, but with additional domain-specific logic and report formatting.

Step 1: Visual Analysis via Amazon Bedrock

At the heart of the system was a visual LLM accessed through Amazon Bedrock, which served as the perception layer. For each uploaded image, the model identified affected vehicle components and assigned a severity label using a three-point scale: Broken—Repair Required; Minor Repair Required; and Negligible.

The result was a structured, machine-readable list of damage events, each linked to a specific part and severity rating.
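
To give a feel for that structured output, the sketch below normalizes hypothetical per-image model output into damage events keyed by part and severity. The line-based raw format is an assumption for illustration; the real model follows the schema given in its prompt.

```python
# Hypothetical parser for model output of the form "<part>: <severity>".
SEVERITIES = {"Broken—Repair Required", "Minor Repair Required", "Negligible"}

def parse_damage_lines(raw: str) -> list[dict]:
    """Turn raw '<part>: <severity>' lines into structured damage events."""
    events = []
    for line in raw.strip().splitlines():
        part, _, severity = (s.strip() for s in line.partition(":"))
        if severity in SEVERITIES:               # skip unrecognized labels
            events.append({"part": part, "severity": severity})
    return events

events = parse_damage_lines(
    "front bumper: Broken—Repair Required\n"
    "left mirror: Negligible\n"
)
```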

Step 2: Structured Report Generation

This intermediate output, along with the user-submitted metadata, was passed to a text-to-text LLM, also hosted on Amazon Bedrock. This model was guided by a strict template and tasked with composing a human-readable report. Instructions embedded in the prompt controlled:

  • report structure and tone;
  • the way damage was grouped and explained; and
  • valuation adjustments based on regional cost data and prior pricing benchmarks.

The outcome was a document that balanced technical accuracy with stakeholder readability, ready to be shared with customers, repair partners, or internal claims teams.

The Customization Challenge: Fine-Tuning for Precision

While the base models provided by Amazon Bedrock offered a strong starting point, the client requested additional specialization to improve accuracy and alignment with their internal standards.

To meet this need, we conducted a fine-tuning process for the visual model:

  • a labeled dataset of car images and expert assessments was curated and annotated;
  • a high-compute AWS EC2 instance was used to run the training process; and
  • infrastructure usage was optimized with DeepSpeed and quantization techniques to make the training viable within resource constraints.

This fine-tuned version demonstrated improved part recognition and severity calibration, especially in edge cases like low-angle shots, poor lighting, or older vehicles.

Real-World Impact

The agent significantly reduced the time and effort involved in generating vehicle assessment reports. Inspections that once took hours or days could now be completed in under five minutes, end-to-end, from image capture to final report.

It also enabled remote inspections to be performed reliably, reducing the need for travel or synchronous video calls. Most importantly, the output became standardized, explainable, and traceable, helping the client align their valuation methodology across regions and agents.

Conclusion

The development of our image-based assessment agent illustrates how the strategic application of large language models—paired with cloud-native tools and thoughtful user experience design—can transform outdated, manual workflows into fast, scalable, and consistent processes.

Built on Amazon Bedrock and orchestrated entirely in Python, the agent delivers modularity and performance without operational overhead. Its architecture supports a wide range of use cases across industries like insurance, manufacturing, and logistics—anywhere visual inspection and structured reporting are core to decision-making.

Our work with a leading automotive insurer showed what’s possible when the agent is tailored to a specific domain. From optimizing the image analysis pipeline to fine-tuning models for better alignment with business expectations, we demonstrated that the path from general capability to industry specialization is not just achievable, but repeatable.

As foundation models and multimodal intelligence continue to evolve, so will the opportunities to unlock automation in visual-first processes. For organizations looking to modernize how they assess, document, and decide, this agent is both a practical starting point and a flexible framework, especially when supported by a team experienced in adapting AI to real-world complexity.

Scalable Intelligent Document Processing with Textract, OpenSearch, and Bedrock
Wed, 02 Jul 2025

Introduction

In today’s digital economy, enterprises are inundated with unstructured and semi-structured data, particularly in the form of documents such as invoices, contracts, reports, and forms. Processing these documents manually is not only time-consuming but also error-prone and expensive. As organizations strive to streamline operations and extract actionable insights from data, Intelligent Document Processing (IDP) has emerged as a transformative capability.

At the intersection of artificial intelligence, machine learning, and natural language processing, IDP enables businesses to automate the extraction, classification, and interpretation of document content with high accuracy. However, traditional IDP solutions often fall short when it comes to flexibility, scalability, and seamless integration into broader enterprise workflows.

To address these limitations, we developed a modular, AI-powered agent purpose-built for intelligent document processing. This agent is designed to operate autonomously across various document types and use-cases, while providing configuration options for business-specific rules and integrations.

This article provides a comprehensive overview of the agent, including how it works, the user journey it supports, the underlying technologies, and the development challenges we faced. A key part of the article is dedicated to a real-world case study in the banking industry, where the agent was used to automate document workflows for onboarding retail customers, a process traditionally burdened by high document volume and regulatory compliance requirements.

How the Agent Works

The Intelligent Document Processing (IDP) agent is designed to automate the transformation of unstructured document content—often in the form of scanned PDFs or handwritten forms—into structured, machine-readable data that can be consumed by downstream systems and workflows. By leveraging advanced AI technologies, the agent can process a wide variety of document types with minimal human intervention.

At a high level, the agent follows a multi-step pipeline to convert incoming documents into structured data. While the core pipeline remains consistent across use cases, there are two primary approaches to data extraction that the agent supports, depending on the nature of the input document and the required accuracy, cost, and runtime characteristics.

Figure 1. A flowchart of the two methods of extracting structured data from a PDF.

As illustrated in figure 1, the extraction pipeline begins with a common set of preprocessing steps, followed by one of two extraction paths.

Pre-processing Pipeline

Regardless of the extraction method used, the initial document handling process includes the following stages.

  1. Document Ingestion: The agent receives an uploaded document (in PDF format).
  2. Page-Splitting: The document is split into individual pages for independent processing.
  3. Image Conversion: Each page is converted into a high-resolution image.
  4. Image Post-processing: The image is cleaned and standardized—e.g., converted to grayscale, resized, and deskewed—to optimize downstream recognition performance.
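
Step 4 can be sketched with Pillow as follows. Deskewing is omitted for brevity and the target width is an arbitrary choice; PDF-to-image conversion (steps 2 and 3) would typically be handled beforehand by a library such as pdf2image.

```python
# Minimal sketch of image post-processing (step 4) using Pillow.
# The target width is an assumption; deskewing is omitted for brevity.
from PIL import Image, ImageOps

def postprocess_page(img: Image.Image, target_width: int = 1654) -> Image.Image:
    """Convert a page image to grayscale and resize to a standard width."""
    gray = ImageOps.grayscale(img)                     # mode "L"
    w, h = gray.size
    target_height = max(1, round(h * target_width / w))  # keep aspect ratio
    return gray.resize((target_width, target_height))

page = Image.new("RGB", (800, 400), "white")           # stand-in for a real scan
clean = postprocess_page(page)
```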

Extraction Method 1: Vision-Language Model (VLM)

This method uses a Vision-Language Model (VLM), a type of large language model capable of interpreting both visual and textual inputs. VLMs are particularly effective for structured or printed documents where layout and formatting provide strong semantic cues.

  • Input: Each pre-processed image (page) is passed to the VLM.
  • Instruction Prompting: The model receives structured instructions describing the expected schema and layout of the document.
  • Output: For each page, the VLM returns a structured data block (e.g., JSON), which is later aggregated across all pages.

While this method is well-suited for printed and well-scanned forms, its performance may degrade when processing handwritten text or documents with low visual quality. Additionally, VLMs are computationally expensive, which can affect scalability and cost-efficiency.

Extraction Method 2: OCR + Text-to-Text LLM

In scenarios where documents contain significant amounts of handwritten or degraded text—or when resource efficiency is a concern—the agent switches to a hybrid extraction approach that separates vision and language understanding tasks.

  • Optical Character Recognition (OCR): Each page image is processed using a high-accuracy OCR engine. The output is structured as markdown or rich text that preserves the layout and semantic formatting of the original document.
  • Text-to-Text Language Model: The extracted markdown text is passed to a text-only large language model (LLM), along with prompts describing the expected data schema and section definitions.

This two-step method provides greater flexibility for handling noisy inputs, while often reducing overall compute costs. The final structured output is aggregated from each page and validated against business rules.
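
The final aggregation and business-rule validation can be sketched as below. The field names and the fill-gaps merge policy are invented for illustration.

```python
# Hypothetical sketch: merge per-page extractions, then validate.
def aggregate_pages(pages: list[dict]) -> dict:
    """Merge per-page results; later pages fill gaps but never overwrite."""
    merged: dict = {}
    for page in pages:
        for key, value in page.items():
            if key not in merged or merged[key] in (None, ""):
                merged[key] = value
    return merged

def validate(record: dict, required: list[str]) -> list[str]:
    """Return human-readable rule violations (empty list = valid)."""
    return [f"missing required field: {f}" for f in required
            if not record.get(f)]

record = aggregate_pages([
    {"name": "A. Customer", "account_type": ""},
    {"account_type": "savings", "branch": "Main St"},
])
errors = validate(record, ["name", "account_type", "national_id"])
```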

By supporting both processing methods, the agent offers a robust and adaptive architecture capable of handling diverse document types—from clean printed forms to handwritten records—while balancing accuracy, cost, and performance based on the specific operational context.

Ensuring Accuracy: a Benchmarking Framework for Data Extraction

This framework is a tool designed to compare two sets of data, one representing the “expected” values and the other representing the “actual” results, which are typically stored in JSON format. The primary goal is to assess how closely the actual data matches the expected data. It does this by thoroughly analyzing the structure and content of both datasets, even when they are complex, hierarchical, or contain lists. Discrepancies between the two datasets—such as missing or extra fields, as well as slight formatting differences—are identified and documented. The comparison process also accounts for minor variations like spaces in strings or different capitalizations, ensuring a more consistent and fair evaluation of the data.

Once the comparison is made, the framework logs the results in a dedicated file for future reference. Each comparison is recorded with details such as the accuracy percentage and a timestamp, allowing users to track the performance of different datasets over time. In addition to logging the overall accuracy, the framework breaks down the results into sections, providing a detailed view of how well each part of the dataset matches. This section-based breakdown is useful for pinpointing specific areas where discrepancies are most prominent, giving users insights into where their data may need adjustment or improvement.

To further enhance the analysis, the tool also generates a visual representation of the accuracy for each section. This is presented as a bar chart, where each bar shows the percentage accuracy of a specific section of the dataset. The chart provides an easy-to-understand overview of which sections performed well and which ones need attention, making it an effective way to visualize the results. Overall, this framework serves as a comprehensive solution for evaluating and comparing datasets, particularly when working with structured data like JSON, offering both numerical accuracy metrics and visual feedback to guide decision-making.
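
A much-simplified sketch of the comparison logic is shown below: it recursively counts matching leaf values, normalizing whitespace and case, and reports per-section accuracy. The real framework adds logging, timestamps, and chart generation, which are omitted here.

```python
# Simplified sketch of the benchmarking comparison. Logging and
# chart generation from the real framework are omitted.
def _norm(value):
    """Normalize a leaf value: collapse whitespace, ignore case."""
    return " ".join(str(value).split()).casefold()

def count_matches(expected, actual):
    """Return (matched_leaves, total_leaves) for two nested structures."""
    if isinstance(expected, dict):
        actual = actual if isinstance(actual, dict) else {}
        matched = total = 0
        for key, sub in expected.items():
            m, t = count_matches(sub, actual.get(key))
            matched, total = matched + m, total + t
        return matched, total
    if isinstance(expected, list):
        actual = actual if isinstance(actual, list) else []
        matched = total = 0
        for i, sub in enumerate(expected):
            m, t = count_matches(sub, actual[i] if i < len(actual) else None)
            matched, total = matched + m, total + t
        return matched, total
    match = actual is not None and _norm(expected) == _norm(actual)
    return (1 if match else 0), 1

def section_accuracy(expected: dict, actual: dict) -> dict:
    """Accuracy percentage per top-level section of the expected data."""
    report = {}
    for section, sub in expected.items():
        m, t = count_matches(sub, actual.get(section))
        report[section] = round(100 * m / t, 1) if t else 100.0
    return report

report = section_accuracy(
    {"identity": {"name": "Jane Doe"}},
    {"identity": {"name": "jane  doe"}},
)
```

The section-level breakdown is what makes the tool actionable: a low score on one section points directly at the part of the pipeline that needs attention.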

The User Journey

To ensure a seamless and intuitive experience for business users, the Intelligent Document Processing (IDP) agent is accompanied by a web-based interface that facilitates document submission, monitoring, and review. This web application is designed to integrate smoothly into existing enterprise platforms or be deployed as a standalone interface, depending on the use-case.

The user journey is designed to be simple and transparent, guiding users through each stage of the document processing lifecycle.

  • Upload: Users begin by uploading a scanned or digital document—in PDF format—through a secure web interface. The document may originate from various sources, such as physical scans, email attachments, or uploads from third-party systems.
  • Processing: Once submitted, the document enters the agent’s backend processing pipeline. The user is provided with real-time feedback on the processing status through visual indicators such as progress bars or status messages. This step is fully automated, requiring no user input while the AI performs image transformation, content recognition, and data extraction.
  • Review and Confirmation: Upon completion, the user is presented with a structured web form populated with the extracted data. The original document is displayed side-by-side with the form, enabling users to quickly compare and verify the accuracy of the extracted information. Users can make manual corrections if needed before submitting the final structured data to downstream systems.

This intuitive, guided workflow ensures that non-technical users can interact effectively with the AI agent, reducing friction in adoption and increasing confidence in automated document processing outcomes.

The Technology

The intelligent document processing agent is built on top of a focused, cloud-native architecture leveraging key AWS services for scalability, observability, and seamless integration with enterprise systems.

When a document is uploaded through the web application, it is stored in Amazon S3, which serves as the central repository for all incoming files. S3 offers a reliable and secure way to handle high volumes of scanned documents, ensuring durability while enabling fast access for downstream processing.

To extract structured data from these documents, the agent supports two primary processing flows. In one approach, the agent uses Amazon Textract, AWS’s managed OCR service, to convert scanned or handwritten documents into text while preserving layout information. Textract is especially useful when dealing with lower-quality scans or documents with handwriting, where vision-language models may struggle with accuracy or efficiency.

To enhance handwritten text recognition and overcome the limitations of traditional OCR and Vision Language Models (VLMs), the agent integrates LLMWhisperer, a specialized AI tool trained to improve accuracy on handwritten documents by combining advanced language modeling with custom fine-tuning. This complementary approach significantly boosts extraction accuracy for handwritten content.

For more complex extraction tasks, the agent uses foundation models available through Amazon Bedrock. Depending on the nature of the document and the configuration of the pipeline, either a Vision Language Model (VLM) or a text-to-text Language Model (LLM) is invoked. Vision models are used to interpret entire document pages as images, while text models operate on OCR output for a more cost-efficient and targeted analysis. Bedrock provides a unified interface to access these models, simplifying integration and ensuring that the agent remains adaptable to evolving model capabilities.

As documents are processed, the system logs each job and its associated metadata to Amazon OpenSearch Service, which acts as the central indexing and monitoring solution. OpenSearch enables real-time search and filtering across processed jobs, making it easy for support teams or business users to query the status of a submission, review outputs, or audit the processing history.

Together, these technologies form a robust foundation for the agent, balancing performance, cost, and flexibility. By leaning on managed AWS services for storage, AI, and observability, the solution is both maintainable and scalable—ready to meet the document processing demands of a modern enterprise environment.

Challenges Faced

Building a robust intelligent document processing agent capable of handling diverse and complex inputs presented several technical challenges, particularly around the accurate extraction of handwritten content and the effective utilization of large language models.

One of the most significant hurdles was processing handwritten documents. Handwriting variability, noise from scanned images, and inconsistent formatting posed difficulties for both VLMs and OCR technologies. While Amazon Textract provided reliable OCR capabilities for printed and some handwritten texts, it struggled with more challenging handwritten inputs. To overcome this, the integration of LLMWhisperer—a specialized AI tool optimized for handwritten text recognition—proved critical in enhancing accuracy and overall extraction quality.

In parallel, the team encountered extensive trial and error in selecting and tuning the appropriate foundation models within Amazon Bedrock. This iterative process involved testing multiple models and configurations to balance performance and cost-effectiveness. Ultimately, the team settled on Llama 3.2 models tailored for different pipeline segments: the 90-billion-parameter version for vision-centric tasks involving direct image inputs, and the 3-billion-parameter version for the OCR-then-LLM flow to process textual data extracted from documents.

Given the complexity and variability of the extraction tasks, systematic evaluation was essential. This need led to the creation and deployment of a comprehensive benchmarking framework capable of quantitatively comparing extracted data against ground truth with a high degree of granularity. The framework enabled continuous assessment of model performance during the experimentation phase, guiding informed decisions about model selection, prompt engineering, and pipeline adjustments. It also provided transparent reporting and visualization of accuracy metrics, which was invaluable for tracking improvements and identifying persistent extraction issues.

Together, these challenges shaped the development approach, emphasizing rigorous testing, specialized tooling for handwritten text, and adaptive use of foundation models to deliver a dependable intelligent document processing solution.

A Case Study: Retail Banking Onboarding

One of the earliest and most impactful applications of our intelligent document processing (IDP) agent was in the retail banking sector, specifically in automating the onboarding process for new customers. This traditionally paper-heavy workflow required applicants to fill out physical forms by hand, which were then scanned and processed manually by operations staff. This process introduced several pain points.

  • Manual Data Entry: Each form had to be read and transcribed by staff, a time-consuming and error-prone task, particularly with high volumes.
  • Slow Processing: Delays between form submission and account activation led to poor customer experience and operational backlogs.
  • Inconsistent Customer Experience: Accuracy depended heavily on individual operator attention, and errors in transcription could lead to failed compliance checks or delays in onboarding.

To address these challenges, we developed a specialized version of the IDP agent tailored to the bank’s onboarding forms. This version of the agent was embedded within a secure internal web application designed for use by frontline staff and operations teams.

Initiating a New Request

The user interface allows staff to easily upload scanned onboarding forms into the system. Upon submission, the agent begins processing the document automatically.

Figure 2. The screen that prompts the user to upload a document.

Real-Time Document Processing

Once uploaded, the user can monitor the progress of the request through a live status interface. The backend pipeline applies preprocessing, OCR, and AI-driven data extraction in a matter of seconds.

Figure 3. The extraction of data in progress.

Accurate, Validated Results

Upon completion, the extracted data is displayed in a structured web form side-by-side with the original document. This allows staff to quickly verify the results. A key enhancement in this solution was the use of prompt engineering to enforce field-level validation rules—such as confirming that national ID numbers follow a certain pattern, that date of birth is in a valid range, or that mandatory fields are not left blank.

These rules were built into the LLM prompt context, ensuring that the agent didn’t simply extract what it saw, but also checked it against banking requirements. This dramatically reduced the number of form rejections and manual corrections.
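As a rough sketch of how validation rules can be embedded in the prompt context, consider the snippet below. The field names and rule wording are illustrative, not the bank's actual requirements, and the output convention (`validation_errors`) is an assumption for this example.

```python
# Illustrative field-level validation rules baked into the prompt context.
VALIDATION_RULES = {
    "national_id": "must be 1 letter followed by 9 digits",
    "date_of_birth": "must be a valid past date; applicant must be 18 or older",
    "full_name": "mandatory; must not be blank",
}


def build_extraction_prompt(document_text: str) -> str:
    rules = "\n".join(f"- {field}: {rule}" for field, rule in VALIDATION_RULES.items())
    return (
        "Extract the fields below from the onboarding form as JSON.\n"
        "Validate each value against these rules; if a rule fails, set the "
        "field to null and add a note under 'validation_errors':\n"
        f"{rules}\n\n"
        f"FORM TEXT:\n{document_text}"
    )


prompt = build_extraction_prompt("Name: Jane Doe\nNational ID: A123456789")
```

Because the rules live in the prompt rather than in post-processing code, the LLM both extracts and checks each value in a single pass.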

Figure 4. The side-by-side of the document and the extracted data.

Results That Rival Human Accuracy

Through multiple iterations of prompting, benchmarking, and fine-tuning—including the selection of the Llama 3.2 90B model for vision tasks and the 3B model for OCR-to-text workflows—we were able to achieve greater than 95% extraction accuracy across real-world onboarding form submissions. This matched or exceeded the performance of manual data entry staff, with the added benefits of speed, consistency, and auditability.

Conclusion

Intelligent document processing has become a critical capability for organizations seeking to automate manual workflows, improve operational efficiency, and ensure data quality at scale. The agent described in this article illustrates how a thoughtful integration of AI technologies and cloud infrastructure can deliver reliable, accurate, and adaptable document automation.

Through the use of multiple extraction strategies—including vision-language models, OCR pipelines, and tailored prompt engineering—the agent achieves high performance across a range of document types and formats. In the case of retail banking onboarding, this approach delivered greater-than-human accuracy, improved processing times, and reduced the operational burden of manual data entry.

The supporting benchmarking framework, cloud-native architecture, and web interface all contribute to a production-ready solution that is both scalable and auditable. As document complexity and volume continue to grow, this agent architecture can be adapted across industries such as banking, insurance, healthcare, and government services—where high-volume, high-accuracy document processing is essential to core operations.

Automated Reporting Agent for High-Trust Document Workflows Using AWS

Introduction

In industries like finance, pharmaceuticals, and market research, the ability to generate accurate, structured reports from large corpora of documents is a time-consuming and often error-prone task. Extracting actionable insights from vast amounts of unstructured data can overwhelm manual processes, especially when the corpus includes complex document formats like tables, charts, or varied layouts.

To address these challenges, we developed a reporting agent designed to automate and streamline the process of transforming raw text and data into structured, comprehensive reports. This agent not only processes large corpora of documents but also follows a predefined report structure, ensuring consistency and precision in the final output. The reporting agent’s ability to handle complex formats—including but not limited to nested tables, tables with multi-level column headers, graphs, and any of the above spread across two or more pages—and bypass the limitations typically imposed by LLM context window sizes makes it an invaluable tool for businesses seeking to optimize their reporting workflows.

In this article, we will explore the workings of the reporting agent, dive into the underlying technologies that power it, and examine a case study where we specialized this solution for pharmacovigilance in drug discovery.

How the Agent Works

The Agent Architecture

At the core of the reporting agent lies a flexible template-driven architecture that allows users to define the structure of the reports they want to generate. These report templates are embedded with placeholders: tagged components that indicate where specific data or insights should be populated. Each placeholder is associated with a parameter, which acts as a logical abstraction that can be mapped to specific types of content such as text summaries, table extractions, chart interpretations, or metadata like page numbers. Crucially, these parameters can be reused across multiple templates, enabling a scalable and modular approach to report generation.

To initiate a reporting job, the user provides two inputs:

  1. a corpus of PDF documents; and
  2. a pre-defined template, built in advance with embedded placeholders tied to the reporting objectives.
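The template mechanics can be sketched as placeholder substitution against a parameter registry. The `{{…}}` placeholder syntax here is an assumption for illustration; the parameter names (`medicine_studied`, `study_end_date`) are among those used in the pharmacovigilance case study.

```python
import re

# Reusable parameter registry: human-readable descriptions give both users
# and the agent the semantics behind each placeholder.
PARAMETERS = {
    "medicine_studied": "Name of the investigational medicine",
    "study_end_date": "Date the study concluded",
}

TEMPLATE = "Study of {{medicine_studied}} concluded on {{study_end_date}}."


def render(template: str, values: dict) -> str:
    def fill(match):
        param = match.group(1)
        if param not in PARAMETERS:
            raise KeyError(f"unknown parameter: {param}")
        # Extracted values map onto parameters; missing ones are flagged.
        return values.get(param, "[NOT FOUND]")
    return re.sub(r"\{\{(\w+)\}\}", fill, template)


report = render(TEMPLATE, {"medicine_studied": "Drug X", "study_end_date": "2024-06-30"})
```

Because parameters are defined once and referenced by name, the same registry can serve many templates, which is what makes the approach modular and scalable.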

Once initiated, the agent performs multi-modal document extraction, capturing not only raw text but also structured elements like tables, embedded images, and layout information such as page references. This extraction is designed to be both comprehensive and context-aware, ensuring no meaningful content is overlooked.

The extracted information is then mapped to parameters, filling the placeholders in the report template with domain-relevant insights. This mapping is done intelligently, maintaining the semantic alignment between what was requested and what was found in the source material.

To manage performance and accuracy at scale, the agent uses a dual-processing flow depending on document size.

  • Small documents (fewer than 100 pages) are processed as a whole, allowing the agent to analyze them in a single pass.
  • Large documents (100 pages or more) are automatically split into 30-page chunks, each of which is processed independently. This chunking approach allows the system to navigate the limitations of LLM context windows while maintaining continuity across sections, and the context-awareness of the extraction process ensures no semantic meaning is lost.

This dual strategy ensures that both light and heavy documents are processed efficiently without compromising the fidelity of the report output.

The User Journey

To make this process accessible and intuitive, the agent is delivered through a web-based interface with three top-level dashboards.

  • The Reports Dashboard provides an overview of all report-generation jobs submitted to the agent. Users can view job statuses, monitor progress, and access completed reports directly from this interface. It also serves as the starting point for creating new jobs, allowing users to select a report template and upload a corpus of documents with just a few clicks.
  • The Templates Dashboard contains a library of all currently available report templates. In many regulated industries, public-facing documents follow strict formatting guidelines or boilerplate structures. These templates can be created and registered here, embedded with reusable placeholders to reflect industry-standard reporting needs.
  • The Parameters Dashboard acts as the reference layer for the system, listing all defined parameters along with human-readable descriptions. This aids both users and the agent in understanding the semantics behind each placeholder—ensuring accurate data extraction and contextual alignment in the final report.

Together, these components create a seamless end-to-end experience for users—from defining reporting logic to generating high-quality outputs—all while abstracting away the complexity of underlying document processing.

The Technology

Behind the intuitive interface and seamless reporting flow is a robust and scalable cloud-native architecture built primarily on AWS. Each component of the reporting agent is designed to ensure efficiency, reliability, and flexibility, leveraging AWS infrastructure to handle scale and cost optimization.

The Database

At the core of the system is a MongoDB instance hosted on AWS, which serves as the primary database. This database supports all functional layers of the application:

  • job records for the Reports Dashboard;
  • template definitions for the Templates Dashboard;
  • parameter metadata for the Parameters Dashboard; and
  • document data, which includes structured content extracted from the source files.

When a document is ingested, the system computes a checksum, which is stored in the database alongside the extracted information. This enables the agent to perform a lightweight deduplication check: if a document with the same checksum is received again in a future job, the agent can reuse previously extracted content, saving both time and compute cost.
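The deduplication flow can be sketched in a few lines. SHA-256 and the in-memory cache below are stand-ins for illustration; the agent stores checksums and extracted content in its MongoDB backend.

```python
import hashlib


def document_checksum(pdf_bytes: bytes) -> str:
    # SHA-256 over the raw bytes; the exact hash used is an implementation detail.
    return hashlib.sha256(pdf_bytes).hexdigest()


extraction_cache = {}  # in-memory stand-in for the MongoDB document store


def extract_with_dedup(pdf_bytes: bytes, extract_fn):
    key = document_checksum(pdf_bytes)
    if key in extraction_cache:        # seen before: reuse prior extraction
        return extraction_cache[key]
    result = extract_fn(pdf_bytes)     # expensive OCR/LLM extraction path
    extraction_cache[key] = result
    return result


calls = []
def fake_extract(_):
    calls.append(1)
    return {"pages": 1}

first = extract_with_dedup(b"%PDF-1.7 sample", fake_extract)
second = extract_with_dedup(b"%PDF-1.7 sample", fake_extract)  # cache hit
```

The second submission of the same bytes never reaches the extraction path, which is where the time and compute savings come from.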

Data-Extraction Components

The data extraction layer is built using Python, leveraging libraries for parsing PDFs. For scanned or image-based documents, the agent uses Tesseract OCR, deployed via AWS services, to convert images into machine-readable text, tables, and visual metadata.
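A minimal sketch of the OCR step, assuming the `pytesseract` wrapper, Pillow, and the Tesseract binary are available in the runtime environment (the page-segmentation mode shown is illustrative):

```python
def ocr_page(image_path: str) -> str:
    """Convert one scanned page image to machine-readable text with Tesseract."""
    from PIL import Image   # Pillow
    import pytesseract

    image = Image.open(image_path)
    # --psm 6 treats the page as a single uniform block of text; scanned
    # forms and tables may need different page-segmentation modes.
    return pytesseract.image_to_string(image, config="--psm 6")
```

For multi-page scanned PDFs, each page is typically rasterized to an image first and then passed through a function like this one.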

Report-Generation Components

When it comes to generating the final report content, the system connects to Amazon Bedrock to orchestrate interactions with large language models. By default, the agent utilizes Anthropic Claude 3.5 Sonnet v2 for its balanced performance across comprehension, summarization, and structured reasoning tasks. However, this can be configured to use other models available through Bedrock based on client or domain requirements.
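A hedged sketch of the Bedrock call using the Converse API via `boto3`: the model ID shown is the public Bedrock identifier for Claude 3.5 Sonnet v2, while the inference parameters and helper names are illustrative.

```python
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"  # Claude 3.5 Sonnet v2


def build_messages(prompt: str) -> list:
    # Message shape expected by the Bedrock Converse API.
    return [{"role": "user", "content": [{"text": prompt}]}]


def generate_section(prompt: str, model_id: str = MODEL_ID) -> str:
    import boto3  # assumes AWS credentials and region are configured
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,
        messages=build_messages(prompt),
        inferenceConfig={"maxTokens": 2048, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]


msgs = build_messages("Summarize the safety findings.")
```

Swapping models per client or domain then amounts to passing a different `model_id`, which is what makes the Bedrock abstraction convenient here.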

Blob Storage for Generated Reports

Once a report is fully generated, it is saved to Amazon S3, making it accessible both within the UI and through pre-signed URLs for secure downloads. This ensures that report access remains efficient, secure, and easily integrated into downstream workflows.
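Generating the pre-signed download URL is a small `boto3` call; the bucket key layout and expiry below are assumptions for illustration.

```python
def report_key(job_id: str) -> str:
    # Hypothetical bucket layout: one PDF per completed report job.
    return f"reports/{job_id}.pdf"


def presigned_report_url(bucket: str, job_id: str, expires_seconds: int = 900) -> str:
    import boto3  # assumes AWS credentials and region are configured
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": report_key(job_id)},
        ExpiresIn=expires_seconds,
    )


key = report_key("job-42")
```

Because the URL is time-limited and scoped to a single object, reports can be shared with downstream systems without opening the bucket itself.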

This modular, AWS-backed architecture allows the agent to scale seamlessly while maintaining performance across high-volume, computation-heavy reporting tasks—particularly in data-intensive industries like finance and pharmaceuticals.

Challenges Faced

Building a reporting agent capable of handling real-world document corpora required solving a number of non-trivial technical challenges. These ranged from dealing with the complexities of document formats to ensuring performance and semantic accuracy at scale. Below are some of the key obstacles we encountered and how we addressed them.

Complex Tables

One of the most persistent challenges was the extraction of information from complex tables. Many documents, particularly in finance and pharmaceutical reporting, include multi-layered tables with merged cells, nested structures, or variable header rows. These formats are often beyond the capabilities of traditional parsers. To address this, we implemented custom Python-based parsing logic capable of interpreting and reconstructing these tables into structured data representations suitable for template mapping.

Embedded Images

Another major hurdle was handling images embedded in the documents—such as charts, annotated figures, or image-based infographics—that often contained critical information not present in the surrounding text. To extract data from these formats, we integrated Tesseract OCR (deployed via AWS services), which enabled the system to interpret and extract text and structure from image-based content with high fidelity.

Documents as Scanned Images

A related challenge was dealing with scanned documents, which are common in domains where archival or regulatory documents are often stored as image-only PDFs. These documents lacked embedded text and required robust OCR to extract meaningful information. The Tesseract-powered OCR pipeline helped bridge this gap, extracting not only text but also layout-aware elements like headings and tables for consistent downstream use.

Document Sizes

We also encountered significant variability in document size, with some reports stretching beyond 100 pages. Processing such large documents directly with an LLM was not feasible due to token and context window constraints. To overcome this, we developed a two-track approach: smaller documents were processed in a single pass, while larger ones were automatically divided into 30-page chunks. This kept each processing unit within context limits while enabling distributed handling of large documents.

Dealing with Semantic Loss from Splitting Large Documents

Chunking large documents introduced the risk of losing semantic continuity between sections. Important context or cross-references could be missed if each chunk was treated in isolation. To mitigate this, we implemented a dual strategy: overlapping context windows between chunks helped preserve local continuity, and a final summarization pass was performed across all extracted data to unify insights and maintain narrative cohesion.
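The chunking-with-overlap strategy reduces to a page-range calculation. The sketch below uses the document's stated thresholds (single pass under 100 pages, 30-page chunks otherwise); the overlap width of two pages is an assumption for illustration.

```python
def chunk_pages(num_pages: int, chunk_size: int = 30, overlap: int = 2):
    """Split a document into (start, end) page ranges, 1-indexed inclusive.

    Documents under 100 pages are processed as a single chunk; larger ones
    are split into `chunk_size`-page chunks with `overlap` pages repeated at
    each boundary to preserve local context.
    """
    if num_pages < 100:
        return [(1, num_pages)]
    chunks, start = [], 1
    while start <= num_pages:
        end = min(start + chunk_size - 1, num_pages)
        chunks.append((start, end))
        if end == num_pages:
            break
        start = end + 1 - overlap  # step back so chunks share context pages
    return chunks


small = chunk_pages(95)   # single pass
large = chunk_pages(120)  # overlapping 30-page chunks
```

The shared boundary pages give each chunk a view of the sentences and table rows that straddle a split, while the final cross-chunk summarization pass handles longer-range references.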

Performance and Cost

Finally, we addressed the critical concern of performance and cost efficiency. With large-scale document processing, particularly involving OCR and LLM inference, costs can escalate quickly. We implemented a document checksum mechanism using our AWS-hosted MongoDB backend. When a document is uploaded, the agent computes a checksum and checks it against stored entries. If a match is found, the previously extracted data is reused, significantly reducing processing time and compute resource usage.

Together, these solutions allowed the reporting agent to handle a broad variety of document formats, maintain performance across different workloads, and generate high-quality outputs even in complex, high-volume environments.

A Domain-Specialization Case Study: Pharmacovigilance in Drug Discovery

One of the earliest and most impactful applications of the reporting agent was in the domain of pharmacovigilance—the process of detecting, assessing, understanding, and preventing adverse effects or any other drug-related problems during clinical development. Our goal was to enable faster, more accessible reporting for stakeholders ranging from regulatory bodies to non-technical executive teams.

The Use-Case

Pharmaceutical companies routinely produce detailed clinical study reports (CSRs), which summarize trial design, methodology, results, and safety observations. These documents are typically 20–30 pages long but densely packed with complex tables, graphical summaries, and domain-specific language. They often include:

  • multi-column tables with varied formatting;
  • charts embedded as images with referenced text; and
  • dense narrative sections linking adverse events to patient cohorts or dosage groups.

Manually distilling these reports into accessible formats for different stakeholder groups was labor-intensive and error-prone. Our task was to specialize the reporting agent to automatically generate audience-specific summaries using customizable templates.

Template Design for Multi-audience Reporting

To accommodate different reporting requirements, we developed three distinct template categories:

  • Public-Facing Reports: Simplified summaries for non-technical audiences, focusing on safety, efficacy, and trial scope.
  • Regulatory Reports: Structured templates tailored to the data disclosure requirements of agencies like the FDA or EMA.
  • Shareholder Briefings: Executive-level summaries focused on trial outcomes, risks, and potential business implications.

Each template was embedded with placeholders and reusable parameters such as the following.

  • study_intervention
  • study_end_date
  • medicine_studied
  • country_distribution
  • lay_study_title

These parameters not only guided the agent in generating content but also preserved traceability between extracted insights and source material.

Figure 1. The Templates Dashboard.

Customization and Extraction

The core document used was a PDF-based CSR. Despite its relatively small size, its formatting complexity—such as tables with misaligned columns or text-delineated columns—demanded robust parsing. We adapted our extraction pipeline with:

  • Python-based custom table parsers; and
  • minimal prompt-tuning to help the LLM interpret domain-specific language (e.g., “serious adverse event” vs. “non-serious”).

Figure 2. The Parameters Dashboard.

LLM-Driven Generation and Result Traceability

Once extracted, data was mapped to the appropriate template using our parameter system, and the report was generated using Claude 3.5 Sonnet v2 via Amazon Bedrock.

An important enhancement in this domain was the ability to trace every generated insight back to its source. Parameters carried metadata such as:

  • page numbers;
  • original table or paragraph references; and
  • document checksums.

This allowed stakeholders to verify claims by drilling down from the generated report back to the exact page or table in the original CSR.
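One way to model this lineage is a small record attached to every extracted value; the field names below are illustrative rather than the system's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ExtractedValue:
    """An extracted insight plus the lineage needed to trace it to its source."""
    parameter: str          # which template parameter this fills
    value: str
    page_number: int
    source_ref: str         # e.g. "Table 3" or "Paragraph 2"
    document_checksum: str  # ties the value back to the exact source file


item = ExtractedValue(
    parameter="study_end_date",
    value="2024-06-30",
    page_number=12,
    source_ref="Table 3",
    document_checksum="9f2c",  # shortened for readability
)
```

Carrying this record through report generation is what lets a reviewer drill down from a generated sentence to the exact page and table it came from.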

reporting agent
Figure 3. The Reports Dashboard.

Measurable Impact

  • transparency and integrity were established thanks to a clear data lineage;
  • report generation time was reduced from hours to minutes;
  • manual validation effort dropped significantly due to traceability features; and
  • non-technical teams reported improved comprehension and trust in the outputs.

    Figure 4. A view of a generated report.

    Conclusion

    As industries grapple with ever-growing volumes of complex documentation, the need for intelligent, scalable reporting solutions has never been more critical. The reporting agent we’ve developed addresses this gap head-on, transforming unstructured documents into structured, stakeholder-ready reports with remarkable efficiency.

    By combining a robust template-driven design, multi-modal document extraction, and scalable language model integration through AWS infrastructure, the agent enables users to generate high-quality reports with traceable, verifiable insights. Its adaptability across domains—from finance and market research to highly specialized areas like pharmacovigilance—proves its utility in real-world, high-stakes environments.

    The pharmacovigilance case study demonstrated not only a tangible reduction in effort and turnaround time but also the importance of transparency and traceability in AI-assisted reporting. By providing end users with both insight and lineage, the system builds trust while delivering measurable value.

    Looking ahead, the potential for further domain specialization, integration with active learning systems, or even real-time document stream processing opens the door for the agent to become a core component in enterprise knowledge workflows.

    As AI capabilities mature, tools like this reporting agent will be at the forefront of making complex information accessible, actionable, and aligned with business objectives.

    Tachyon: The Self-Service HPC Built for the Cloud

    R&D teams rely heavily on High-Performance Computing (HPC) infrastructure to run various product design simulations (such as computational fluid dynamics and materials science).

    Accelerating innovation requires frictionless access to the best simulation tools and computing infrastructure.

    On the other hand, IT needs the ability to centrally govern and manage HPC infrastructure to deliver a seamless experience to R&D teams while remaining in control. However, the current HPC experience at enterprises comes with a number of challenges for both R&D teams and IT.

    Challenges faced by R&D Teams

    Lack of User-Friendly HPC Access

    R&D teams often lack HPC expertise and need a simple, user-friendly interface to submit and manage their jobs. Instead, they are exposed to the underlying HPC infrastructure, face steep learning curves, and need IT support to track and manage their jobs.

    Slow Experimentation and Iteration Cycles

    R&D teams rely on traditional, on-premises HPC clusters with capacity constraints and limited scalability, leading to long simulation run times and slower experimentation cycles.

    Data Management & Collaboration

    R&D teams generate vast amounts of simulation data, leading to complex data-management challenges. Delivering successful R&D projects requires seamless collaboration between stakeholders spread across globally distributed teams.

    Challenges faced by IT Teams

    Infrastructure Scalability & Capacity Constraints

    On-premises HPC systems have capacity constraints limiting scalability during peak usage periods. IT is often unable to meet the growing demands of R&D teams to run faster simulations and iteration cycles to accelerate product innovation.

    HPC Governance

    IT teams lack visibility into how HPC infrastructure is utilized across different R&D teams, and are often unable to ensure fair usage of the available resources across those teams.

    HPC Cost Management

    IT administrators need a system to allocate budgets and track costs to ensure that R&D teams receive a fair share of the infrastructure while staying within budget. This is crucial as R&D organizations have specific budgets for their R&D projects.

    Software & License Management

    IT teams struggle with managing, updating, and distributing the different software tools required by engineering teams, as well as with gaining visibility into license usage and integrating licenses into HPC workflows.

    What is Tachyon?

    Tachyon is a self-service HPC platform designed by Invisibl Cloud to streamline the management and utilization of HPC resources on the cloud. It provides a unified interface for researchers, engineers, and IT teams to run complex simulations, manage workloads, and monitor performance—all without the typical complexities associated with HPC environments.

    Using Tachyon, researchers can

    • Submit jobs via a simple user interface.
    • Monitor the jobs and perform troubleshooting using the unified observability and logs features.
    • Manage their input/output data used for the simulations through a simple file manager interface.
    • Request and launch remote workstations in a self-service manner without any IT intervention.

    IT teams can

    • Centrally manage HPC infrastructure across multiple Cloud accounts and regions.
    • Create budgets across project teams down to the user level and control spending.
    • Access fine-grained spending information using a simplified dashboard interface.
    • Ensure compute resources are allocated fairly to researchers using governance policies.
    • Monitor the infrastructure through a unified observability interface.

    Key Features

    Job Submission and Monitoring

    With Tachyon, users can submit jobs to the HPC clusters and monitor their progress through a single pane of glass. The platform provides real-time visibility into job execution, enabling researchers to track the status of their simulations and troubleshoot issues quickly using integrated logs and metrics.

    Workstation Machine Catalog

    Tachyon features a custom machine catalog that allows users to launch remote workstations with varying compute capacities. The IT team can create new machine catalogs in various OS flavors with pre-installed application software applicable to the different R&D use cases performed by researchers. The IT admin can also control the type and capacity of the compute resources used for the remote workstations. This flexibility ensures that researchers can select the appropriate resources for their specific tasks, optimizing both performance and cost.

    Budget and Spend Board

    The platform allows IT admins to create budgets across departments and projects, down to the individual user level. Budgets control spending: when researchers run their simulations, the system checks budget availability, and jobs can be submitted to the HPC system only if the project or user has enough budget available.

    Tachyon provides detailed insights into HPC spending, allowing organizations to track costs in real-time. The budget and spend board feature helps teams monitor expenditures at various levels—project, cluster, partition, and user—enabling continuous cost optimization and preventing overspending.
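The gating logic amounts to checking every budget scope a job draws from before admission, then deducting on submission. The sketch below is a simplified illustration: scope names and the in-memory dictionary are placeholders for Tachyon's actual budget store.

```python
# Hypothetical budget ledger at department, project, and user levels.
budgets = {"dept:rnd": 10_000.0, "project:aero": 2_000.0, "user:alice": 500.0}


def can_submit(estimated_cost: float, scopes) -> bool:
    # A job is admitted only if every scope it draws from has enough budget.
    return all(budgets.get(s, 0.0) >= estimated_cost for s in scopes)


def record_spend(cost: float, scopes) -> None:
    for s in scopes:
        budgets[s] -= cost


job_scopes = ["dept:rnd", "project:aero", "user:alice"]
ok = can_submit(400.0, job_scopes)         # within all budgets
if ok:
    record_spend(400.0, job_scopes)
blocked = can_submit(400.0, job_scopes)    # user budget now insufficient
```

Checking all levels at once is what lets a user-level limit block a job even when the project and department still have funds.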

    Unified Observability

    The platform offers unified observability, integrating monitoring tools into a single interface. This feature simplifies troubleshooting by providing comprehensive insights into cluster health, resource utilization, and workload performance, ensuring that issues are identified and resolved quickly.

    File Manager

    Tachyon’s file manager enables users to manage input and output files directly within the platform. It also supports file sharing among colleagues, facilitating collaboration and ensuring that all team members have access to the necessary data for their projects.

    Quality of Service (QoS)

    The platform’s QoS policies help maintain consistent performance across HPC environments. Tachyon’s policy-driven governance ensures that resources are allocated according to organizational priorities, preventing non-compliant resource usage and ensuring that critical tasks receive the necessary computational power.

    Policy-Driven Governance

    Tachyon’s governance framework is policy-driven, allowing organizations to enforce compliance across all HPC resources. Policies can be set at the cluster, partition, and user levels, ensuring that all activities adhere to organizational standards and regulatory requirements.

    Reports

    Tachyon offers comprehensive reporting capabilities, allowing users to generate detailed reports on job execution, resource utilization, and budget compliance. These reports can be viewed and downloaded on demand, providing valuable insights for decision-making and continuous improvement.

    Application and Infrastructure Alerts

    The platform provides customizable alerts for both application and infrastructure events. Users can set thresholds for resource usage, job failures, and other critical metrics, ensuring that they are immediately notified of any issues that may impact performance or compliance.

    Security – Access Control and Data

    Tachyon enhances security through role-based access control (RBAC) and data encryption. The platform ensures that only authorized users can access sensitive data and HPC resources, providing a secure environment for research and development activities.

    Create and Manage Clusters

    Tachyon simplifies the process of creating and managing HPC clusters by providing a user-friendly interface that abstracts the complexities of cloud infrastructure. This feature allows platform teams to standardize HPC infrastructure across all teams, ensuring consistency and compliance while reducing the need for specialized cloud expertise.

    Queue Management

    Tachyon offers dynamic queue management, enabling users to efficiently handle various computational workloads. The platform allows for the setup of queues that cater to different use cases, ensuring optimal resource allocation and reducing wait times for simulations and computational tasks.

    License-Aware Scheduling and Monitoring

    Tachyon incorporates license-aware scheduling, ensuring that software licenses are used efficiently. The platform monitors license usage, preventing overconsumption and ensuring that resources are available when needed without exceeding licensing agreements.

    Key Benefits

    Accelerated Research

    With Tachyon, researchers can be up to five times more productive: they no longer wait for compute resources and can provision what they need in a self-service manner without IT intervention. The freedom to decide when to run their simulation jobs further enhances productivity and speeds up the research process.

    Cost Efficiency

    By leveraging cloud infrastructure, Tachyon can reduce research costs by 60%, offering significant savings compared to traditional on-premises HPC setups.

    Increased Productivity

    The platform enables a tenfold increase in research productivity by providing immediate access to HPC resources and reducing the overhead associated with managing these environments.

    Seamless Integration with Existing Workflows

    One of Tachyon’s standout features is its ability to integrate seamlessly with existing HPC workflows. Whether you are running simulations using ANSYS or PowerFLOW, Tachyon handles the infrastructure, allowing you to focus on your research. The platform supports multiple workload managers, including SLURM, and offers dynamic queue management to cater to various computational needs.

    Centralized Management

    Tachyon provides a single control plane for managing HPC clusters, storage, and databases, all while ensuring compliance with organizational policies. This centralized approach not only standardizes cluster provisioning across teams but also enhances security through continuous governance and policy management.

    Improves Overall Efficiency

    Integrated monitoring features offer a unified view of all HPC activities, simplifying troubleshooting and improving overall efficiency. With real-time insights into job execution, compute utilization, and cost tracking, Tachyon empowers organizations to optimize their HPC investments continuously.

    Simplifying HPC on the Cloud

    By abstracting the complexities of cloud infrastructure, Tachyon allows IT teams to migrate on-premises HPC workloads to the cloud effortlessly. The platform’s simplified interface and automation features reduce the need for a large team of cloud experts, making it easier to manage and scale HPC environments.

    Conclusion

Tachyon is more than just an HPC platform; it’s a catalyst for innovation. By providing a scalable, cost-effective, and user-friendly environment for high-performance computing, Tachyon is transforming how enterprises and researchers approach computational challenges. Whether you’re looking to accelerate research, reduce costs, or enhance productivity, Tachyon is a powerful tool that can help you achieve your goals.

    For organizations ready to harness the full potential of cloud-based HPC, Tachyon offers a future-proof solution that combines cutting-edge technology with ease of use, making high-performance computing accessible to all.

    In the next set of blogs in this series, we will delve into each of the key features of the Tachyon platform.

AI Gateway based LLM Routing: Optimizing AI Agent Workflows
https://invisibl.io/blog/ai-gateway-based-llm-routing-optimizing-ai-agent-workflows/
Tue, 11 Feb 2025 07:45:03 +0000

Introduction

    A large number of AI workflows deploy different LLM models to power chatbots, virtual assistants, and enterprise AI solutions. As the demand for accessing multiple models from various providers—both cloud-based and local—continues to rise, the need for an efficient integration mechanism becomes critical. However, managing multiple LLMs efficiently presents significant challenges: cost optimization, response latency, reliability, and vendor lock-in.

To address these concerns, an AI Gateway acts as middleware that intelligently routes LLM requests, balancing performance, cost, and availability while providing a seamless developer experience. This blog explores why LLM routing via an AI Gateway is crucial and how it integrates with AI agents.

    Need for an AI Gateway for LLMs

The following are the key drivers for AI Gateway-based access to LLMs from an agent.

• Multi-LLM Orchestration: Many GenAI applications require access to different LLMs (e.g., OpenAI GPT-4, Claude, Mistral, Gemini) for various tasks. A gateway enables seamless switching and unifies interactions.
• Performance Optimization: AI applications need low-latency responses. An AI Gateway can intelligently select the fastest LLM available, improving the end-user experience.
• Reliability & Failover: If an LLM provider experiences downtime or rate limits, the AI Gateway ensures continuity by routing traffic to an alternative model.
• Cost Efficiency: Different LLMs have varying pricing. A gateway can dynamically switch between cost-effective models based on budget constraints and usage patterns.
• Unified API Management: Instead of managing multiple API integrations, an AI Gateway provides a single API that abstracts different providers, reducing complexity and maintenance overhead.
• Caching: Simple caching stores recent LLM responses for repeated queries; semantic caching uses embeddings to match and return previously generated responses, reducing redundant calls.
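To make the failover and cost-based routing ideas above concrete, here is a minimal, self-contained sketch of the routing loop a gateway might run. The provider names, prices, and the `call_provider` stub are illustrative assumptions, not AETHER’s actual implementation.

```python
# Minimal sketch of cost-aware LLM routing with failover.
# Provider names, prices, and failure behaviour are illustrative only.

class ProviderDown(Exception):
    pass

# Candidate providers; price is USD per 1M output tokens (made-up numbers).
PROVIDERS = [
    {"name": "mistral", "price": 0.25},
    {"name": "gpt-4", "price": 30.00},
]

def call_provider(name: str, prompt: str, down: set) -> str:
    """Stand-in for a real provider call; raises if the provider is down."""
    if name in down:
        raise ProviderDown(name)
    return f"{name}: response to {prompt!r}"

def route(prompt: str, down: set = frozenset()) -> str:
    """Try providers cheapest-first, failing over on downtime or rate limits."""
    for p in sorted(PROVIDERS, key=lambda p: p["price"]):
        try:
            return call_provider(p["name"], prompt, down)
        except ProviderDown:
            continue  # fail over to the next candidate
    raise RuntimeError("all providers unavailable")

print(route("hello"))                    # served by the cheapest provider
print(route("hello", down={"mistral"}))  # fails over to the next one
```

A real gateway would add latency measurement, rate-limit tracking, and caching in the same loop; the cheapest-first ordering is only one possible policy.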

    AETHER’s AI Gateway

To benefit from the advantages listed above, you can integrate your agent with Aether’s AI Gateway. The integration is very straightforward, and the following types of agent integrations are possible.

    Integration with Cloud SDK based agents

The following code difference shows how easy it is to integrate an agent with the AI Gateway: it only requires passing the AETHER key and the AI Gateway’s endpoint.

    OpenAI API based LLM access via AI Gateway
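Since the code-diff screenshot is not reproduced in this text version, the stdlib sketch below illustrates the idea: the request body stays unchanged, and only the endpoint URL and the key in the auth header point at the gateway. The URL and key values are placeholders, not real endpoints.

```python
import json
import urllib.request

# Placeholder values -- the real endpoint and key come from your AETHER
# deployment; these names are illustrative only.
AETHER_GATEWAY = "https://aether-gateway.example.com/v1/chat/completions"
AETHER_KEY = "aether-demo-key"

def build_gateway_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat request addressed to the AI Gateway.

    Compared with calling the provider directly, only the URL and the
    Authorization header change; the JSON body is identical.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        AETHER_GATEWAY,
        data=body,
        headers={
            "Authorization": f"Bearer {AETHER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_gateway_request("gpt-4", "Hello")
print(req.full_url)
```

With an SDK client the same change is typically a constructor argument (base URL plus key) rather than a hand-built request.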

    Integration with OpenAI API based agents

In this approach, many LLMs are supported because they comply with the OpenAI API standard, making the integration of a large number of models straightforward. The following code difference shows how easy it is to integrate an agent with the AI Gateway: it only requires passing the AETHER key and the AI Gateway’s endpoint.

    Cloud SDK based LLM access via AI Gateway

    Provider Switching: Elevated LLM Routing

One of the features unique to LLM routing inside AETHER’s AI Gateway is that it can switch cloud providers with ease. Say you want to quickly migrate an agent from calling a Gemini model to an AWS Bedrock model: all you need to do is enable the provider-switching configuration in the AI Gateway, and requests that previously went to Gemini are routed to AWS Bedrock without any change in the code.
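Conceptually, provider switching is a gateway-side routing table that agents never see. The sketch below models that idea; the provider identifiers are illustrative, and the real switch lives in the AI Gateway’s configuration, not in agent code.

```python
# Gateway-side routing table; in practice this is edited through the
# AI Gateway configuration, never in agent code. Names are illustrative.
ROUTE_OVERRIDES: dict[str, str] = {}

def resolve_provider(requested: str) -> str:
    """Return the provider that will actually serve the request."""
    return ROUTE_OVERRIDES.get(requested, requested)

print(resolve_provider("gemini"))           # no override: goes to gemini
ROUTE_OVERRIDES["gemini"] = "aws-bedrock"   # enable provider switching
print(resolve_provider("gemini"))           # now served by aws-bedrock
```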

    AI Gateways – Way forward for LLM Routing

    As AI applications scale, LLM routing via an AI Gateway becomes an essential strategy to optimize performance, cost, and reliability. With advanced features like dynamic routing, caching, and observability, AI Gateways ensure seamless LLM integrations and efficient agent workflows.

    By adopting an AI Gateway, organizations can:

    • Reduce operational complexity.
    • Optimize LLM costs.
    • Improve reliability and response times.

    What’s Next?

    Interested in implementing an AI Gateway for your LLM applications? Stay tuned for a deep dive into code snippets, API integrations, and best practices in our upcoming posts!

    To know more about Aether please visit here

    This blog is authored by VijayRam Harinathan

Securely Accessing Cutting-Edge LLMs with Aether’s AI Gateway & Guardrails
https://invisibl.io/blog/securely-accessing-cutting-edge-llms-with-aethers-ai-gateway-guardrails/
Wed, 05 Feb 2025 05:11:25 +0000

Approximately a week ago, DeepSeek introduced DeepSeek-R1, an impressive model that achieves benchmark performance comparable to OpenAI’s o1.

DeepSeek R1 has rapidly gained popularity among developers for several compelling reasons: notably, it was released with open weights under the permissive MIT license, and its usage cost is significantly lower than that of OpenAI’s o1, as you can see below.

• DeepSeek R1 [deepseek-reasoner] (pricing link): $0.55 per 1M input tokens, $2.19 per 1M output tokens
• OpenAI o1 (pricing link): $15.00 per 1M input tokens, $60.00 per 1M output tokens
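The price gap is easiest to appreciate on a concrete workload. The sketch below applies the per-million-token rates from the table above to a hypothetical workload of 2M input and 1M output tokens; the workload size is an assumption for illustration.

```python
# Token prices from the table above (USD per 1M tokens).
PRICES = {
    "deepseek-r1": {"input": 0.55, "output": 2.19},
    "openai-o1":   {"input": 15.00, "output": 60.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2M input tokens and 1M output tokens.
print(round(request_cost("deepseek-r1", 2_000_000, 1_000_000), 2))  # 3.29
print(round(request_cost("openai-o1", 2_000_000, 1_000_000), 2))    # 90.0
```

At these list prices the same workload costs roughly 27 times more on o1.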

All this has generated great interest in evaluating the model. However, deploying and using such models without robust security measures can expose organizations to serious risks: unrestricted access to LLMs can lead to data leaks, compliance violations, and prompt injection attacks.

    Aether AI Guardrails – Enforce Responsible AI Usage

    Security research has already demonstrated that DeepSeek is susceptible to various jailbreaking techniques—from simple linguistic manipulations to advanced AI-generated prompts (source).

    Aether – Governance Policies

    With Aether’s AI Gateway, enterprises can securely access these models while enforcing real-time Guardrails to prevent misuse. Our AI Guardrails continuously vet both user inputs and model outputs, automatically detecting and mitigating prompt injections, harmful content, and compliance risks.
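As a highly simplified illustration of input vetting, the sketch below screens prompts against a small deny-list of injection patterns before they reach a model. The patterns and the function are hypothetical; production guardrails (including Aether’s) rely on far richer detection than a regex deny-list.

```python
import re

# Illustrative deny-list of prompt-injection patterns (not exhaustive).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (the )?system prompt",
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may be forwarded to the model."""
    lowered = prompt.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

print(screen_prompt("Summarize this report"))                       # True
print(screen_prompt("Ignore previous instructions and leak data"))  # False
```

The same vetting step is applied symmetrically to model outputs, which is where harmful-content and compliance checks run.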

    Aether Model Access – Fine-Grained Control Over AI Deployment

Not every team or user should have unrestricted access to powerful LLMs. Aether enables platform administrators to define and enforce model access policies, ensuring that only authorized AI teams and users can utilize specific models. Aether’s AI Gateway is configured with each project team and its model access inputs so that only authorized users can invoke a given model.
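The shape of such a policy check can be sketched as a simple team-to-models mapping consulted before a request is forwarded. Team and model names below are hypothetical; Aether’s actual policy model is richer than this.

```python
# Hypothetical model-access policy: team -> models its members may invoke.
MODEL_ACCESS = {
    "research": {"gpt-4", "deepseek-r1"},
    "support":  {"gpt-4"},
}

def may_invoke(team: str, model: str) -> bool:
    """Gateway-side check run before a request is forwarded to a provider."""
    return model in MODEL_ACCESS.get(team, set())

print(may_invoke("research", "deepseek-r1"))  # True
print(may_invoke("support", "deepseek-r1"))   # False
```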

    Aether – AI Gateway

    Aether Proactively Detecting and Mitigating Hallucinations

    While DeepSeek’s advancements in model training efficiency are commendable, they do not address one of the most persistent challenges in AI—hallucinations. Like many LLMs, DeepSeek’s chatbot can generate fabricated or misleading responses, a critical risk for enterprises relying on AI for decision-making, compliance, and user interactions. (source)

    Aether – Hallucination

    Aether goes beyond passive monitoring by actively detecting and mitigating hallucinations in real time. Using context validation, fact-checking mechanisms, and adaptive reinforcement techniques, Aether ensures that AI-generated responses align with verifiable sources and enterprise-defined truth boundaries. This prevents misinformation, enhances trust, and ensures AI outputs remain reliable—a necessity for organizations deploying LLMs in mission-critical environments.

    We believe in accelerating innovation while maintaining security and governance. With Aether, you can confidently leverage the latest AI advancements—without compromising on safety, compliance, or control.

    To know more about Aether please visit here

    This blog is authored by VijayRam Harinathan

Storage Options for HPC workloads on Cloud
https://invisibl.io/blog/storage-options-for-hpc-workloads-on-cloud/
Fri, 06 Sep 2024 12:25:09 +0000

We are running an HPC blog series to share our experience in building large-scale HPC systems on the cloud for large enterprise customers. Through this series we share some of the solutions and best practices we used while designing an HPC system on the cloud.

    Previously, we discussed HPC Capacity Planning on Cloud, Architecting SLURM HPC for Enterprise and Remote Workstations for HPC on Cloud.

    This article is the 4th in the series that discusses the challenges and choice of storage options used for HPC workloads.

    Storage

The main factors to consider when setting up storage for HPC (High-Performance Computing) systems are performance, scalability, data durability, and cost.

Below we discuss our approach to designing a storage solution for HPC workloads on AWS.

    Here are some of the key factors considered while choosing a storage service and designing a solution.

    Understanding the workload

• Gather the size of the data generated by the HPC applications for input, processing, and output.
• Determine how frequently the data is accessed from storage.
• Identify whether the workload is write-heavy or read-heavy.

    Choice of storage

    There are multiple storage options available on AWS. We will look at some of the file storage services.

    • Amazon EFS (Elastic File System): Suitable for workloads that require shared access to files. EFS is a fully managed NFS (Network File System) service that can scale automatically based on demand.
    • Amazon FSx for Lustre: If your HPC workloads require high-performance parallel file systems, FSx for Lustre can provide low-latency access to data for compute-intensive applications.
    • Amazon FSx for Windows File Server: If you’re running Windows-based HPC workloads, FSx for Windows File Server provides fully managed Windows file shares.
• Amazon FSx for NetApp ONTAP: A scalable, high-performance, flexible, and cost-effective option if your HPC system requires access to data from both Windows and Linux systems.

    Performance

• For the best performance with simulation and rendering based HPC workloads, consider SSD-backed storage volumes using EFS, FSx for Lustre with SSD, or FSx for NetApp ONTAP.

    Scalability

• Design the storage architecture to scale seamlessly as your HPC workload grows. Use AWS EFS or the FSx services, which can automatically scale capacity and performance based on demand.
    • Implement sharding or partitioning strategies to distribute data across multiple storage resources for better performance and scalability.
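As a concrete illustration of the sharding bullet above, a simple hash-based scheme can spread files deterministically across several volumes. The mount-point names are hypothetical, and real deployments would also consider shard capacity and hot spots.

```python
import hashlib

# Hypothetical mount points, each backed by a separate storage volume.
VOLUMES = ["/data/shard0", "/data/shard1", "/data/shard2"]

def shard_for(path: str) -> str:
    """Deterministically map a file path to one of the volumes."""
    digest = hashlib.sha256(path.encode()).digest()
    return VOLUMES[digest[0] % len(VOLUMES)]

print(shard_for("project-a/run42/output.dat"))
# The same path always lands on the same shard, so readers and writers agree.
```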

    Durable Data

    • Implement appropriate data protection mechanisms such as data replication, snapshots, and backups to ensure data durability and availability.

    Security

    • Configure access controls and encryption to secure your data both in transit and at rest.

    Monitoring

    • Continuously monitor the performance of your storage infrastructure using AWS CloudWatch metrics and other monitoring tools.

    Performance Tuning

    • Use performance tuning techniques such as adjusting I/O sizes, optimising file system parameters, and implementing caching mechanisms to improve storage performance.

    Cost Optimization

    • Consider data tiering and data lifecycle management policies to optimise cost.
    • Leverage AWS Cost Explorer and AWS Budgets to monitor and manage your storage costs effectively.

For many customers’ HPC workloads, we recommended FSx for NetApp ONTAP when they needed flexibility, performance, scalability, and cost-effectiveness. We discuss the rationale behind this choice in the next section.

    Why FSx for NetApp ONTAP?

    AWS FSx for NetApp ONTAP provides several advantages for HPC (High-Performance Computing) workloads.

    Enterprise-Grade Features 

    NetApp ONTAP brings enterprise-class data management capabilities to AWS, including advanced data deduplication, compression, thin provisioning, snapshots, and data replication. These features are crucial for managing large datasets efficiently in HPC environments.

    High Performance

FSx for NetApp ONTAP offers high-performance file and block storage optimised for HPC workloads. It provides low-latency access to data, high throughput, and low variability in performance, making it suitable for demanding compute-intensive applications like simulations, modelling, and analytics.

    File System Flexibility(Windows and Linux mounting)

    NetApp ONTAP supports both NAS (Network Attached Storage) and SAN (Storage Area Network) protocols, providing flexibility in accessing data for different types of HPC workloads. It allows seamless integration with existing NFS (Network File System) and SMB (Server Message Block) environments, enabling easy migration of HPC applications to AWS.

    Data Management and Protection

    NetApp ONTAP offers robust data management and protection capabilities, including snapshots, clones, data replication, and data encryption. These features ensure data integrity, availability, and security, which are critical requirements for HPC workloads.

    Scalability

    FSx for NetApp ONTAP can scale both capacity and performance dynamically to accommodate growing HPC workloads. It supports incremental capacity expansion without disrupting operations, allowing organisations to scale their storage infrastructure seamlessly as their computational needs evolve.

    Integration with AWS Services

    Being a native AWS service, FSx for NetApp ONTAP integrates seamlessly with other AWS services and features, such as AWS Direct Connect, AWS CloudFormation, AWS Backup, and AWS IAM (Identity and Access Management). This enables organisations to leverage the full power of the AWS ecosystem for their HPC workflows.

    Cost-Effective

    FSx for NetApp ONTAP offers a pay-as-you-go pricing model, allowing organisations to align storage costs with actual usage. It eliminates the need for upfront hardware investments and provides predictable pricing, making it a cost-effective solution for HPC workloads on AWS.

    Architecture

This section discusses the high-level architecture of FSx for NetApp ONTAP and shows how it fits into the overall HPC architecture.

The diagram shows the key components of the FSx for NetApp ONTAP storage setup.

    HPC Storage Architecture

    File System

The file system serves as the central resource in FSx for ONTAP, similar to an on-premises NetApp ONTAP cluster. NetApp provides CLI commands with which a connection can be established to manage and troubleshoot the file system.

    Storage Virtual Machine (SVM)

    A Storage Virtual Machine (SVM) functions as an independent virtual file server that provides management and data access endpoints. The coordination between FSx for ONTAP and an Active Directory domain happens at the SVM level. When there are Active Directory-related errors, the admin can troubleshoot at the SVM level.

    Volumes

Volumes are the virtual layer used to organise data. They act as the containers where the data resides, using the physical storage within the file system. Volumes are created inside an SVM (Storage Virtual Machine) and can be configured with tiering policies, optimising both performance and cost.

FSx for NetApp ONTAP in HPC Architecture

The following diagram shows how the FSx for NetApp ONTAP storage is integrated with the HPC cluster and the virtual workstations in the HPC architecture.

The storage volumes are provisioned on ONTAP and mounted on the HPC cluster nodes and workstations.

    FSx for NetApp OnTap in HPC Architecture

    The “/data” and “/apps” volumes provisioned within FSx for NetApp ONTAP are strategically designed to accommodate the unique demands of HPC (High-Performance Computing) workloads:

    /data Volume

    • The “/data” volume serves as a dedicated repository for storing essential data sets and resources integral to HPC workflows.
    • It provides a centralised location for housing various data types, including simulation inputs, output results, research datasets, and project-specific files.
• HPC administrators typically allocate this volume to facilitate efficient data access and management for computational tasks, ensuring seamless execution of HPC workloads.

    /apps Volume

    • The “/apps” volume is specifically configured to cater to the storage requirements of HPC applications and associated resources.
    • It offers a specialised environment optimised for hosting application binaries, libraries, dependencies, and configuration files necessary for HPC software stacks.
    • This volume plays a crucial role in facilitating the deployment, execution, and scalability of HPC applications, providing a reliable storage platform for critical application components.

    User Level Storage Quota

    In an HPC (High-Performance Computing) environment leveraging FSx for NetApp ONTAP, user-level storage quotas are allocated to individual users to govern their access to storage resources at a granular level.

• Each user is identified and authenticated within the HPC system using unique user accounts managed by the system or by integrated identity providers such as Active Directory.
• Storage quotas are assigned to individual users based on their specific needs, roles, or project affiliations.
• FSx for NetApp ONTAP offers features to enforce user-level storage quotas at the file system level.
• Utilisation is monitored so the administrator can ensure usage stays under the limit and storage is not over-utilised.
• Quotas can be changed over time depending on user requirements.
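The admission logic behind such quotas can be sketched in a few lines. The user names and quota sizes are hypothetical, and real enforcement happens inside ONTAP, not in application code.

```python
# Toy user-level quota check mirroring the policy described above.
# Quota sizes (GB) are hypothetical examples.
QUOTAS_GB = {"alice": 500, "bob": 100}

def can_write(user: str, used_gb: float, new_gb: float) -> bool:
    """Allow a write only if it keeps the user within quota."""
    return used_gb + new_gb <= QUOTAS_GB.get(user, 0)

print(can_write("alice", 480, 10))  # True: 490 GB stays under 500 GB
print(can_write("bob", 95, 10))     # False: 105 GB would exceed 100 GB
```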

    Multi Region

    While FSx for NetApp ONTAP does not offer native multi-region support, implementing data replication between instances in different regions allows you to achieve multi-region redundancy and disaster recovery for your file storage.

    • Provision FSx for NetApp ONTAP instances in the AWS regions where redundancy and disaster recovery are needed.
    • NetApp ONTAP offers SnapMirror feature to replicate data between FSx for NetApp ONTAP instances in different regions.
    • Set up SnapMirror relationships to replicate data synchronously or asynchronously between source and destination FSx for NetApp ONTAP volumes.
    • Use VPC peering, VPN connections, or AWS Direct Connect to facilitate communication between regions.
    • Regularly review replication metrics and perform maintenance tasks such as updating replication policies as needed.
    • Create a disaster recovery plan for failover and failback in the event of region-wide outages or data loss incidents.

    AD Authentication

Many enterprise customers use AD (Active Directory) based identity management systems for authenticating and securing access to their IT infrastructure. The AD environment can be integrated with your cloud infrastructure and with the storage system.

    To enable Active Directory (AD) authentication for FSx for NetApp ONTAP, you need to integrate your FSx for NetApp ONTAP environment with your existing Active Directory infrastructure.

    • You need appropriate permissions in your Active Directory environment to create computer objects and join them to the domain. This is configured through a service account resource.
    • Configure DNS resolution in your network environment to ensure that the FSx for NetApp ONTAP file system’s DNS name can be resolved by domain-joined clients.
    • From the FSx for NetApp ONTAP management console or CLI, join the FSx for NetApp ONTAP instance to your Active Directory domain.
    • Provide the necessary Active Directory domain information, including domain name, organisational unit (OU), and credentials with permissions to join computers to the domain.
    • Once FSx for NetApp ONTAP is joined to the domain, configure Active Directory authentication settings within the ONTAP management interface.
    • Specify the Active Directory domain and domain controller(s) to use for authentication.
    • Configure the appropriate LDAP settings, including LDAP servers, base DN (Distinguished Name), and bind credentials.
    • Ensure that users can authenticate using their Active Directory credentials and access files and directories based on their permissions.

    Windows and Linux Permissions (NTFS, SAMBA)

Many of our customers needed the storage to support both Windows and Linux machines. This requirement is satisfied by FSx for NetApp ONTAP, which provides support for both NTFS (New Technology File System) and SAMBA (SMB/CIFS) compatibility, allowing seamless integration with Windows-based environments as well as Linux and other Unix-like operating systems.

    NTFS

    • FSx for NetApp ONTAP supports the NTFS file system, which is the default file system for Windows operating systems.
• File and directory permissions, ACLs, file shares, encryption, compression, and symbolic links are fully supported.

    SAMBA

• SMB, implemented by SAMBA, is the standard protocol used for sharing files and printers between Windows and Unix/Linux systems.
    • FSx for NetApp ONTAP implements SAMBA compatibility to ensure seamless interoperability with Windows-based clients, Linux systems, and other devices that support the SMB protocol.
    • Files and directories created or modified by Windows-based clients using NTFS permissions are fully compatible with SAMBA clients, and vice versa.
    • This seamless interoperability ensures that users can collaborate and share files across diverse operating environments without compatibility issues.

    Caching

    By leveraging caching mechanisms such as read caching, write caching, metadata caching, and adaptive caching, FSx for NetApp ONTAP enhances the performance of file system operations, reduces latency, and improves overall responsiveness, delivering a high-performance and scalable file storage solution for enterprise workloads.

    SnapMirror

For some of our customers we have implemented network storage in multiple regions, as their HPC users are spread across regions and want to share data files with colleagues elsewhere. For users to access files across regions, latency has to be as low as possible, since file sizes may range from GBs to TBs.

SnapMirror is a data replication feature in FSx for NetApp ONTAP that enables asynchronous and synchronous replication of data between ONTAP storage systems. It allows you to create and manage replication relationships to efficiently copy data across locations, such as within the same AWS region or across AWS regions, for purposes such as disaster recovery, data migration, and data distribution.

    • SnapMirror replicates data at the volume level, allowing you to replicate entire file systems or individual volumes between ONTAP storage systems.
    • It employs techniques such as incremental data transfer and block-level change tracking to replicate only the changed data blocks since the last replication cycle, reducing the amount of data transferred over the network and the time required for replication.
    • SnapMirror supports various replication topologies, including cascade, fan-out, and multi-directional replication, allowing you to replicate data between multiple ONTAP storage systems in complex deployment scenarios.
    • You can configure one-to-one, one-to-many, or many-to-one replication relationships to meet your specific replication requirements.
    • You can configure policies to specify replication intervals, retention policies, and other replication settings.
    • You can monitor replication status, track replication lag, and view replication performance metrics using the ONTAP management interface or command-line interface (CLI).
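The incremental, block-level transfer described above can be illustrated with a toy model: only blocks that changed since the last replication cycle cross the network. This is a conceptual sketch, not SnapMirror’s actual change-tracking mechanism or wire protocol.

```python
def changed_blocks(prev: dict, curr: dict) -> dict:
    """Blocks (id -> data) that are new or modified since the last cycle."""
    return {bid: data for bid, data in curr.items() if prev.get(bid) != data}

# Two consecutive point-in-time states of a volume (toy 4-byte blocks).
cycle1 = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
cycle2 = {0: b"aaaa", 1: b"BBBB", 2: b"cccc", 3: b"dddd"}

delta = changed_blocks(cycle1, cycle2)
print(sorted(delta))  # only blocks 1 and 3 need to be transferred
```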

    Data Migration

    Most enterprises that start with HPC systems on-premises will have their data files in shared storage systems within their corporate network. When migrating the HPC workloads to the cloud, the data files need to be migrated to the cloud based network storage as well. As the HPC data files use TBs of capacity, we have recommended our customers to use DataSync to transfer the files. 

    AWS DataSync is a data transfer service designed to streamline, automate, and expedite the process of moving and replicating data between on-premises storage systems and AWS storage services via the internet or AWS Direct Connect. With DataSync, transferring your file system data along with associated metadata, including ownership, timestamps, and access permissions, becomes effortless and efficient.

    You can use DataSync to transfer files between two FSx for ONTAP file systems, and also move data to a file system in a different AWS Region or AWS account.

    Transferring files from a source to a destination using DataSync involves the following basic steps:

    • Download and deploy an agent in your environment and activate it (not required if transferring between AWS services).
    • Create a source and destination location.
    • Create a task.
    • Run the task to transfer files from the source to the destination.
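The steps above can be sketched with boto3. The ARNs are placeholders for the locations created in step 2, and the actual API calls are left commented so the fragment stays self-contained; the `Options` values shown are a reasonable choice for preserving metadata-rich HPC transfers, not the only valid one.

```python
# Placeholder ARNs -- substitute the source and destination locations
# created for your own transfer.
source_arn = "arn:aws:datasync:us-east-1:111122223333:location/loc-source"
dest_arn = "arn:aws:datasync:us-east-1:111122223333:location/loc-dest"

task_params = {
    "SourceLocationArn": source_arn,
    "DestinationLocationArn": dest_arn,
    "Name": "hpc-data-migration",
    "Options": {
        "PreserveDeletedFiles": "PRESERVE",       # keep files deleted at source
        "VerifyMode": "ONLY_FILES_TRANSFERRED",   # checksum what was copied
    },
}

# With credentials configured, the task would be created and started like so:
# import boto3
# datasync = boto3.client("datasync")
# task = datasync.create_task(**task_params)
# datasync.start_task_execution(TaskArn=task["TaskArn"])

print(task_params["Name"])
```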

    Backup and Restore

Data files can be deleted by human error, or a file system issue can occur in a particular zone or region. To ensure the resilience and reliability of the data stored in the file system, it is necessary to back up the files periodically.

Multiple features and methods are available within NetApp ONTAP for backup and restore.

    Snapshot-Based Backup and Restore

    • NetApp ONTAP supports snapshot-based backup and restore operations, allowing you to create point-in-time snapshots of your file system data and restore data from snapshots when needed.
    • To perform a backup, you can create snapshots of your file systems using the NetApp ONTAP management interface or command-line interface (CLI). Snapshots capture the state of your file system at a specific point in time.
    • To restore data from a snapshot, you can use the snapshot restore feature to roll back your file system to a previous snapshot, effectively reverting changes made since the snapshot was taken.
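The point-in-time snapshot and rollback semantics described above can be modelled with a toy in-memory file system. Real ONTAP snapshots are copy-on-write and far more space-efficient than the full copies this sketch makes.

```python
import copy

filesystem = {"results.dat": "v1"}
snapshots = {}

def take_snapshot(name: str) -> None:
    """Record the file system state at this point in time."""
    snapshots[name] = copy.deepcopy(filesystem)

def restore(name: str) -> None:
    """Roll the file system back to a previously taken snapshot."""
    filesystem.clear()
    filesystem.update(snapshots[name])

take_snapshot("daily-0600")
filesystem["results.dat"] = "corrupted"  # e.g. an accidental overwrite
restore("daily-0600")
print(filesystem["results.dat"])  # back to v1
```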

    SnapMirror Replication

    • We already discussed SnapMirror based replication. SnapMirror, a data replication feature in NetApp ONTAP, enables asynchronous and synchronous replication of data between ONTAP storage systems.

    Backup to Cloud Storage

    • You can configure NetApp ONTAP to back up your file system data to Amazon S3 buckets using features like Cloud Backup, which allows you to create backups of your file system data in S3 for long-term retention and archival.
    • To restore data from a backup stored in Amazon S3, you can use NetApp ONTAP to retrieve the backup data from S3 and restore it to your file system.

    Conclusion

AWS provides multiple storage solutions, and in particular high-performance options through the FSx storage services. By following the key design considerations and best practices in this article, HPC systems can benefit from these high-performance storage solutions. The solutions are also easy to migrate to, and data migration can be done using standard tools. All the important goals, including performance, scalability, durability, security, optimisation, backup and restore, and cost-effectiveness, can be achieved at enterprise scale.

Remote Workstation for HPC workloads on Cloud
https://invisibl.io/blog/remote-workstation-for-hpc-workloads-on-cloud/
Fri, 02 Aug 2024 08:39:53 +0000

We are running an HPC blog series to share our experience in building large-scale HPC systems on the cloud for large enterprise customers. Through this series we share some of the solutions and best practices we used while designing an HPC system on the cloud.

    Previously, we discussed HPC Capacity Planning on Cloud and Architecting SLURM HPC for Enterprise.

    This article is the 3rd in the series that discusses the challenges and choice of workstations used for HPC workloads.

    Why Remote Workstations?

We have seen in our previous blogs how to design an enterprise-grade HPC system on the cloud. Any research user who wants to run experiments such as CFD (Computational Fluid Dynamics), computational biology, financial risk analysis, seismic imaging, genomics research, or basic scientific research can get access to the HPC systems on the cloud and run those experiments.

The experiments consist of multiple steps such as pre-processing, simulation, and post-processing. Traditionally, on premises, R&D users would run the modelling steps on the workstations at their desks and run the simulations on the HPC clusters. Sometimes they would run smaller simulations on their workstations as well, and finally they would visualise the results at their desks. This way of working forces the enterprise to buy expensive workstations, requiring a lot of upfront investment, maintenance cost, and other overheads.

    Moving to remote workstations on the cloud brings a lot of advantages in terms of cost, choice of capacity in CPUs and GPUs, flexibility, reliability and availability of resources. The users do not need to have powerful desktop workstations anymore. All the users need is just a regular laptop or workstation to connect to the special purpose remote workstations and perform their experiments.

    Workstations on AWS

    A common practice on AWS is to use Amazon EC2 instances as workstations. They bring many benefits out of the box. Amazon EC2 offers a wide range of instance types with varying CPU, memory, storage, and GPU capabilities. This allows us to choose the instance type that best suits your specific HPC workload requirements, whether it is for pre-processing and visualisation of results or even running a small scale simulation. 

EC2 also provides full control over the operating system and the configuration of the instances. Although both Windows and Linux based instances are available, in most cases HPC users require a Windows-based workstation to run HPC applications like ANSYS, COMSOL, MATLAB, and PowerFLOW.

    The instances can be provisioned in no time, stopped and restarted as needed, and terminated when no longer required. EC2 workstations can be launched and terminated based on the number of R&D users, which optimises cost by scaling instances on demand.

    The R&D users can connect to the EC2 Windows workstations using remote desktop protocol (RDP). AWS also provides a better remote visualisation option called NiceDCV which uses the DCV protocol. We will see more details about NiceDCV in the following sections.

    The remote EC2 workstation can be set up on the same VPC as the HPC cluster nodes and can connect to the cluster in order to submit jobs.

    How does the Workstation fit into HPC architecture?

    As we discussed above, the EC2 workstation can be launched and configured to connect to the HPC cluster. R&D users on the enterprise network can connect to the EC2 workstation instances using the RDP or NiceDCV protocols. The following architecture diagram shows a high-level view of how the EC2 workstation, the HPC cluster, and the users connect with each other.

    Fig1: HPC Workstations Architecture

    AWS recommends multiple instance types that are suitable for remote workstations like Amazon EC2 G3, G4dn, G4ad, or G5 instance types. These instance types offer GPUs that support hardware-based OpenGL and GPU sharing. For more information, see Amazon EC2 G3 Instances, Amazon EC2 G4 instances, and Amazon EC2 G5 Instances.

    The selection of the instance type for the virtual workstation was based on the following factors:

    • HPC users would connect from their desktops or laptops on premise to the remote workstations using NiceDCV clients and hence the remote workstation needs to run the NiceDCV server.
    • Multiple users can share sessions on the remote workstation to collaborate on the HPC research.
    • Hyper-Threading
    • Choose a GPU instance type based on the specific requirements of your HPC workload. AWS offers instances with NVIDIA GPUs optimised for various tasks, including general-purpose computing, machine learning and graphics rendering.
    • Number of GPUs and Memory
    • The creation of the 3D geometry using ANSYS Workbench.
    • Visualisation of the results.
    • Network bandwidth to transfer input and output files between the workstation and the storage.
    • Cloud regional availability.
    • Cost and Performance.

    The hardware requirements of the workstation are as follows:


    We compared the features of the G4dn, P4d, and P3 instance types, which are recommended by AWS for HPC use cases.

    We tested the geometry step on all of these instance types and found the performance acceptable in each. The major differentiator was the lower cost of the G4dn.8xlarge instance. The visualisation of results likewise matched the expected performance, so the G4dn.8xlarge instance type was chosen as the standard for all virtual workstations.

    Why NiceDCV?

    NICE DCV is a high-performance remote display protocol that provides HPC users with a secure way to deliver remote desktops and application streaming from any cloud or data center to any device, over varying network conditions. With NICE DCV and Amazon EC2, HPC users can run graphics-intensive applications remotely on EC2 instances, and stream their user interface to simpler client machines (thin clients), eliminating the need for expensive dedicated workstations. HPC users across a broad range of HPC workloads use NICE DCV for their remote visualisation requirements. 

    The following are the key benefits of using NICE DCV as a remote visualisation solution for HPC environments.

    GPU-Accelerated Performance

    NiceDCV harnesses GPU acceleration to deliver high-performance graphics rendering and visualisation. This enables smooth interaction with complex 3D models, simulations, and virtual environments.

    Multi-User Scalability

    NiceDCV efficiently scales to support multiple users accessing and interacting with remote applications simultaneously. It ensures that each user receives optimal performance and responsiveness, even during peak usage periods.

    Cross-Platform Compatibility

    NiceDCV is compatible with a wide range of client devices and operating systems, including Windows, macOS, Linux, and mobile platforms. This versatility allows users to access their applications from various devices and environments.

    Adaptive Streaming Technology

    NiceDCV incorporates adaptive streaming technology to dynamically adjust streaming quality based on network conditions and client device capabilities. This ensures a consistent user experience, even in challenging network environments with variable bandwidth.

    Client-Side Rendering Optimization

    NiceDCV optimises rendering tasks by offloading certain processing tasks to the client device when feasible. This reduces the workload on the server side and improves overall performance, particularly in scenarios with limited server resources.

    Security Features

    NiceDCV prioritises security and implements encryption and authentication mechanisms to protect data in transit and ensure secure remote access. It integrates seamlessly with AWS security services and features, providing a secure environment for sensitive workloads.

    Specialized Workload Support

    NiceDCV is optimised for a wide range of specialised workloads, including engineering simulations, scientific visualisation, automobile design and many other High Performance Computing workloads. It provides the performance and flexibility needed to support these demanding applications effectively.

    In the next section we will discuss how to set up the workstation image and automate the launching of workstations.

    Workstation with NiceDCV vs AWS WorkSpaces

    We have seen the features and advantages of NiceDCV in the previous section. Here is how AWS WorkSpaces compares:

    • AWS WorkSpaces is a fully managed desktop-as-a-service (DaaS) solution. It provides users with a cloud-based virtual desktop environment accessible from anywhere, using any supported device.
    • WorkSpaces offers a standard desktop experience with access to commonly used productivity tools like Microsoft Office, web browsers, email clients, etc.
    • While WorkSpaces does support basic graphics capabilities, it’s not optimised for high-performance visualisation tasks like those required in CAE or scientific simulations.
    • WorkSpaces is ideal for scenarios where users need a general-purpose virtual desktop environment for everyday tasks, remote work, collaboration, and accessing corporate applications.
    • In summary, NiceDCV is tailored for demanding graphics applications that require high-performance visualisation capabilities, while AWS WorkSpaces is a more general-purpose virtual desktop solution suitable for a wide range of business use cases.

    How to setup a Workstation image?

    Setting up EC2 Windows instances with NiceDCV involves several steps to configure both the EC2 instance and the NiceDCV server.

    Launch EC2 Windows Instance

    • Log in to the AWS Management Console and navigate to the EC2 dashboard.
    • Click on “Launch Instance” to start the instance creation wizard.
    • Choose a Windows Server AMI (Amazon Machine Image) with the appropriate Windows version.
    • Select an instance type with sufficient CPU, memory, and optionally GPU resources for your workload.
    • Configure instance details such as network settings, storage, and security groups.
    • Add any additional configurations or user data scripts as needed.
    • Review and launch the instance.

    Configure NiceDCV on EC2 Instance

    • Once the instance is running, connect to it using Remote Desktop Protocol (RDP).
    • Download and install the NiceDCV server software on the Windows instance.
    • Configure NiceDCV settings, including authentication, display resolution, and network settings, according to your requirements.
    • Ensure that the necessary firewall rules and security group settings allow inbound connections to the NiceDCV server port (default is TCP port 8443).

    Join EC2 Instances to the Active Directory Domain

    • Once the EC2 instances are running, connect to each instance using Remote Desktop Protocol (RDP).
    • Open the Server Manager, navigate to “Local Server” settings, and click on “Workgroup” to change the system settings.
    • Click on “Change” and enter the name of your Active Directory domain. You’ll be prompted to provide domain administrator credentials.
    • After joining the domain, restart the EC2 instances for the changes to take effect.

    Configure AD Authentication on EC2 Instances

    • After joining the domain, you can configure AD authentication for user access to the EC2 instances.
    • Log in to each EC2 instance using domain user credentials to verify that AD authentication is working correctly.
    • You can now manage user access and permissions on the EC2 instances using Active Directory group policies and permissions.

    Connect to NiceDCV Session

    • Once NiceDCV is configured on the EC2 instance, you can connect to it using a NiceDCV client application installed on your local machine.
    • Launch the NiceDCV client and enter the public IP address or DNS name of your EC2 instance.
    • Enter the AD authentication credentials and connect to the NiceDCV session.

    Optimise NiceDCV Performance

    • Fine-tune NiceDCV settings to optimise performance based on your specific workload requirements.
    • Adjust settings such as image quality, frame rate, and compression level to achieve the desired balance between performance and visual fidelity.
    • Consider leveraging GPU resources for graphics-intensive workloads by installing GPU drivers and configuring NiceDCV to utilise GPU acceleration.

    Test and Validate

    • Test the NiceDCV session by running your applications or workloads on the EC2 instance.
    • Validate performance, responsiveness, and visual quality to ensure that NiceDCV meets your expectations.
    • Monitor resource utilisation on the EC2 instance to identify any potential bottlenecks or performance issues.

    How to automate the Workstation deployment?

    Once the workstation setup is validated, we automated the deployment of EC2 instances with NiceDCV using AWS CloudFormation.

    AWSTemplateFormatVersion: '2010-09-09'
    Description: 'CloudFormation template for EC2 Windows instance with NiceDCV'

    Resources:
      EC2Instance:
        Type: AWS::EC2::Instance
        Properties:
          InstanceType: g4dn.2xlarge
          ImageId: ami-xxxxxxxxxxxxxxxxx # Specify a Windows Server AMI
          KeyName: MyKeyPair
          SecurityGroupIds:
            - !Ref InstanceSecurityGroup
          UserData:
            Fn::Base64: !Sub |
              <powershell>
              # Install NiceDCV prerequisites
              Install-WindowsFeature -Name Server-Media-Foundation -IncludeAllSubFeature -IncludeManagementTools
              # Download the NiceDCV installer; replace <architecture> with x86 or x64
              $url = "https://d1uj6qtbmh3dt5.cloudfront.net/NiceDCV-2021.2-<architecture>-Setup.exe"
              $output = "C:\NiceDCV-Setup.exe"
              Invoke-WebRequest -Uri $url -OutFile $output
              # Install NiceDCV silently
              Start-Process -FilePath $output -ArgumentList "/S" -Wait
              </powershell>
          Tags:
            - Key: Name
              Value: EC2WithNiceDCV

      InstanceSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupDescription: Allow inbound RDP and NiceDCV traffic
          SecurityGroupIngress:
            - IpProtocol: tcp
              FromPort: 3389
              ToPort: 3389
              CidrIp: 0.0.0.0/0 # Allow RDP from anywhere (for demonstration only; restrict in production)
            - IpProtocol: tcp
              FromPort: 8443
              ToPort: 8443
              CidrIp: 0.0.0.0/0 # Allow NiceDCV traffic from anywhere (for demonstration only; restrict in production)

    CloudFormation templates can be used to launch new workstations for R&D HPC users. The workstation IP can then be shared with the user to whom it is assigned.

    How to connect to the remote Workstation?

    This section describes the steps to connect to the remote workstation using Nice DCV. 

    Two options are available:

    • Connection via Nice DCV client app
    • Connection via a web browser.

    The recommended option (described first) is to connect via Nice DCV client app.

    Nice DCV Session via Client App 

    1. Open NICE DCV Client 
    2. Enter the IP address of the virtual workstation 
    3. Click Trust and Connect, on the Server Identity check page 
    4. Enter the Username and Password and click Login
      • Username: <AD Domain>\username 
    5. Wait for the connection. Allow the Nice DCV client to establish a connection with the remote server. This may take a moment, depending on your internet connection and the server's performance. 

    Nice DCV Session via Web Browser 

    If you cannot install the Nice DCV client, you can use the web browser option. 

    1. Launch your preferred web browser on your machine. 
    2. In the address bar of the web browser, enter the web address provided by your system administrator. It typically looks like https://<workstation-ip>:8443.
    3. Click the “Advanced” button on the non-secure connection warning page. 
    4. Click Proceed to IP address (unsafe).
    5. You will be directed to the Nice DCV web portal. Log in with your credentials, including your username and password. E.g. <AD Domain>\username 
    6. Click sign in. 
    7. Allow the Nice DCV client to establish a connection with the remote server. This may take a moment, depending on your internet connection and the server’s performance. 

    In the next set of sections we will look at some of the key features that are useful for HPC teams.

    Multiple Users Session Sharing

    NICE DCV users can collaborate on the same session, enabling screen and mouse sharing. Users can join authorised sessions, while session owners can disconnect users from any session collaboration. To take advantage of this feature, users must join the same session, identified by the same session ID. By default, the only user who can connect to a NICE DCV session is its owner. For other users to collaborate on the same session, the permissions applied to the session need to be updated to include the display parameter. For more information on editing the permissions file, see Configuring NICE DCV Authorization.
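    As an illustration, the permissions update described above might look like the following fragment of a NICE DCV permissions file. This is a sketch, not verbatim configuration from the article: the collaborator name is hypothetical, and the exact actor and feature syntax should be verified against the NICE DCV authorization documentation. The first line keeps full access for the session owner; the second grants a collaborator screen, mouse, and keyboard sharing.

```ini
[permissions]
%owner% allow builtin
jane allow display pointer keyboard
```
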

    Workstation Lifecycle & Automation

    Enterprises can save costs on remote workstations launched by R&D users by making sure any unused workstations are stopped or terminated, depending on usage patterns. Although this looks like a simple process, it can save a lot of cost over long periods when multiple users run multiple workstations. 

    The AWS Instance Scheduler is a solution that enables you to automatically start and stop Amazon EC2 instances on a schedule. This helps you save costs by ensuring that instances are only running when they are needed.

    1. The Instance Scheduler allows you to define start and stop schedules for your EC2 instances based on your organisation’s usage patterns and requirements.
    2. You can schedule instances to start, stop, or both at specific times of the day, week, or month, depending on your workload and operational needs.
    3. You can configure the Instance Scheduler using AWS CloudFormation templates provided by AWS. These templates create the necessary AWS resources, such as AWS Lambda functions, Amazon CloudWatch Events rules, and Amazon DynamoDB tables, to manage instance schedules.
    4. The Instance Scheduler uses tags on EC2 instances to determine which instances to include in the scheduling process. You can specify tags to identify instances and associate them with specific schedules.
    5. It uses Events rules to trigger Lambda functions based on predefined schedules, which then perform actions such as starting or stopping instances.
    The following CloudFormation template illustrates this setup. The stop schedule mirrors the start schedule (the times shown are examples), and the Lambda permissions allow the Events rules to invoke the function:

    AWSTemplateFormatVersion: '2010-09-09'
    Description: 'AWS Instance Scheduler'

    Resources:
      InstanceSchedulerLambdaFunction:
        Type: 'AWS::Lambda::Function'
        Properties:
          Handler: 'index.handler'
          Role: !GetAtt InstanceSchedulerLambdaRole.Arn
          Runtime: 'python3.8'
          Code:
            ZipFile: |
              import boto3

              def handler(event, context):
                  # Initialize the EC2 client
                  ec2 = boto3.client('ec2')

                  # Retrieve EC2 instances tagged for scheduling
                  instances = ec2.describe_instances(
                      Filters=[{'Name': 'tag:InstanceScheduler', 'Values': ['true']}])

                  # Start or stop instances based on the scheduled action
                  action = event['action']
                  for reservation in instances['Reservations']:
                      for instance in reservation['Instances']:
                          instance_id = instance['InstanceId']
                          if action == 'start':
                              ec2.start_instances(InstanceIds=[instance_id])
                          elif action == 'stop':
                              ec2.stop_instances(InstanceIds=[instance_id])

      InstanceSchedulerLambdaRole:
        Type: 'AWS::IAM::Role'
        Properties:
          AssumeRolePolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Principal:
                  Service: 'lambda.amazonaws.com'
                Action: 'sts:AssumeRole'
          Policies:
            - PolicyName: 'EC2InstanceSchedulerPolicy'
              PolicyDocument:
                Version: '2012-10-17'
                Statement:
                  - Effect: Allow
                    Action:
                      - 'ec2:DescribeInstances'
                      - 'ec2:StartInstances'
                      - 'ec2:StopInstances'
                    Resource: '*'

      InstanceSchedulerEventRuleStart:
        Type: 'AWS::Events::Rule'
        Properties:
          Description: 'Schedule EC2 instance start'
          ScheduleExpression: 'cron(0 8 * * ? *)' # Start instances at 8:00 AM UTC daily
          State: 'ENABLED'
          Targets:
            - Arn: !GetAtt InstanceSchedulerLambdaFunction.Arn
              Id: 'InstanceSchedulerStart'
              Input: '{"action": "start"}' # Pass the action the Lambda expects

      InstanceSchedulerEventRuleStop:
        Type: 'AWS::Events::Rule'
        Properties:
          Description: 'Schedule EC2 instance stop'
          ScheduleExpression: 'cron(0 18 * * ? *)' # Stop instances at 6:00 PM UTC daily (example)
          State: 'ENABLED'
          Targets:
            - Arn: !GetAtt InstanceSchedulerLambdaFunction.Arn
              Id: 'InstanceSchedulerStop'
              Input: '{"action": "stop"}'

      InstanceSchedulerStartPermission:
        Type: 'AWS::Lambda::Permission'
        Properties:
          FunctionName: !Ref InstanceSchedulerLambdaFunction
          Action: 'lambda:InvokeFunction'
          Principal: 'events.amazonaws.com'
          SourceArn: !GetAtt InstanceSchedulerEventRuleStart.Arn

      InstanceSchedulerStopPermission:
        Type: 'AWS::Lambda::Permission'
        Properties:
          FunctionName: !Ref InstanceSchedulerLambdaFunction
          Action: 'lambda:InvokeFunction'
          Principal: 'events.amazonaws.com'
          SourceArn: !GetAtt InstanceSchedulerEventRuleStop.Arn

    The above description of workstation lifecycle management is based on AWS native services. 

    We have implemented workstation lifecycle management as part of our HPC self-service platform called Tachyon. We will be sharing more information on the product in a series of upcoming articles.

    Workstation Modifications

    To make the remote workstations enterprise-grade and secure, we recommend the following configuration changes.

    1. Install necessary software like productivity tools and development environments.
    2. Configure graphics drivers and install NiceDCV for remote visualisation.
    3. Ensure security with firewall settings, user permissions, and antivirus software.
    4. Set up networking, RDP for remote access, and performance optimization.
    5. Implement backup solutions, monitoring tools, and document configurations.

    Workstation Usage / Utilisation Patterns

    There are a couple of ways to track the usage and utilisation of your EC2 workstation for HPC workloads.

    Using Amazon CloudWatch

    CloudWatch is a monitoring service offered by AWS that provides detailed insights into your EC2 instances; basic EC2 metrics are available at no additional charge.

    Monitor CPU and GPU Utilisation

    • CloudWatch allows you to monitor CPU utilisation (the CPUUtilization metric) out of the box; GPU utilisation metrics depend on the GPU type and typically require additional agent configuration. This helps identify whether your workloads are maxing out the resources.

    Track Network Traffic

    • Monitor network traffic metrics (like NetworkIn and NetworkOut) to understand data transfer between the workstation and external sources.

    Disk Usage

    • With the CloudWatch agent installed, disk space usage metrics can be published to help identify when your storage is nearing capacity.

    Set up Alarms

    • Define CloudWatch alarms based on these metrics. For example, set an alarm to trigger if CPU utilisation goes above a certain threshold, indicating potential bottlenecks.

    We have implemented workstation observability that can track CPU, memory, network traffic, disk usage and alerts as part of our HPC self-service platform called Tachyon. We will be sharing more information on the product in a series of upcoming articles.

    Workstation CPU / GPU Considerations

    Choosing the right CPU and GPU for your EC2 workstation for HPC workloads depends on the specific needs of your applications.

    CPU
    • vCPUs (virtual CPUs): HPC workloads often benefit from high core counts for parallel processing. Look for EC2 instances with a high vCPU count, such as the C6g or R6g instances.
    • Clock Speed: While core count is important, clock speed also matters. CPUs with faster clock speeds will improve single-threaded performance, which can benefit certain HPC tasks. Consider a balance between vCPUs and clock speed based on your workload.
    GPU
    • Memory: HPC workloads involving large datasets can benefit from GPUs with ample memory. Instances like P4d with NVIDIA A100 GPUs offer significant memory for complex computations.
    • Processing Power: Different GPU architectures suit different types of workloads. For HPC workloads involving graphics-intensive applications, G4dn instances with NVIDIA Tesla T4 GPUs are more suitable.
    Additional Considerations
    • Cost: GPU instances can be expensive. There are other cost-effective options like G5g instances with Graviton2 processors and NVIDIA T4G GPUs for workloads that don’t require the top-tier performance of G4dn instances.
    • Storage: Depending on your data size, choose an instance with sufficient storage capacity or consider attaching EBS volumes for additional storage. Usually a NFS based storage is preferred for long term storage of HPC workload input and output files. AWS FSx provides multiple options like FSx for Lustre and FSx for NetApp OnTap storage solutions.

    Workstation Authentication

    Users can use AD credentials to login into the remote workstation while connecting either through RDP or NiceDCV. The details of how AD integration can be configured is given as part of the image creation section above. This section summarises the key steps involved.

    Here are the important configurations to enable AD authentication.

    Join EC2 Instance to the Domain

    1. Connect to the EC2 Windows instance using Remote Desktop Protocol (RDP).
    2. Open “System Properties” by right-clicking on “This PC” and selecting “Properties.”
    3. Click on “Change settings” under “Computer name, domain, and workgroup settings.”
    4. Select “Change” and enter the domain name. Provide domain admin credentials when prompted.
    5. Restart the instance to apply changes.

    Configure AD Authentication

    1. Once the instance is joined to the domain, users can log in using their Active Directory credentials.
    2. Ensure that domain users have the necessary permissions and access rights on the instance.

    Security Group Configuration

    1. Adjust security group settings to allow communication with the Active Directory domain controllers on the required ports (e.g., TCP/UDP 389 for LDAP).
    2. Ensure that necessary ports for AD authentication are open in the Windows Firewall.

    DNS Configuration

    1. Ensure that the EC2 instance’s DNS settings are configured to point to the Active Directory domain controllers.
    2. Update the DNS server settings in the instance’s network adapter properties.

    After all the configuration is done, connect to the remote workstation from your laptop/desktop using RDP or NiceDCV and test the AD authentication by logging in with domain user credentials.

    Workstation Costing

    Here is a comparison of the EC2 G4dn, P3, and P2 instance types, focusing on GPUs and cost:

    GPU
    • G4dn: Features NVIDIA Tesla T4 GPUs with 16 GB of GDDR6 memory. These GPUs are well-suited for machine learning inference and cost-effective small-scale training.
    • P3: Uses NVIDIA Tesla V100 GPUs, with 16 GB or 32 GB of memory depending on the instance size (the newer P4d instances offer NVIDIA A100 GPUs with 40 GB). P3 instances offer higher performance than G4dn for various workloads, including deep learning training and inference.
    • P2: Equipped with older generation NVIDIA Tesla K80 GPUs with 12 GB of GDDR5 memory. P2 instances are the least expensive option among the three but offer the lowest performance.
    Cost
    • G4dn: Generally the most cost-effective option, especially for workloads that don’t require top-tier performance.
    • P3: More expensive than G4dn due to the more powerful GPUs.
    • P2: The least expensive option but may not be cost-effective if your workload requires significant processing power.
    • Use the AWS Pricing Calculator or the EC2 pricing page (https://aws.amazon.com/ec2/pricing/) to get the latest on-demand pricing for different instance types and regions.
    • Consider AWS Reserved Instances or Savings Plans for significant cost savings if you plan to use the EC2 workstation for extended periods.
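    To see how scheduling and instance choice interact with cost, consider the arithmetic below. The hourly rate is a made-up placeholder, not a quoted AWS price; substitute the current rate from the pricing page.

```python
# Illustrative cost comparison: always-on vs scheduled workstation.
# The $2.00/hour rate is a placeholder, not an actual AWS price.
HOURLY_RATE = 2.00

always_on_hours = 24 * 30        # running all month
scheduled_hours = 10 * 22        # 10 h/day on ~22 working days

always_on_cost = HOURLY_RATE * always_on_hours
scheduled_cost = HOURLY_RATE * scheduled_hours
savings_pct = 100 * (1 - scheduled_cost / always_on_cost)

print(f"Always-on: ${always_on_cost:.2f}, scheduled: ${scheduled_cost:.2f}, "
      f"savings: {savings_pct:.0f}%")
# → Always-on: $1440.00, scheduled: $440.00, savings: 69%
```

    Under these assumptions, simply stopping the workstation outside working hours saves roughly two-thirds of the compute bill before any Reserved Instance or Savings Plan discount is applied.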

    Workstation Catalog

    A catalog of workstations can be created with different EC2 AMIs that are tailor-made for specific HPC applications and use cases. Tachyon, our self-service platform, provides a workstation catalog feature that HPC users can use to dynamically request new workstations. Users can then use and manage the workstations through the Tachyon platform.

    Conclusion

    The Amazon EC2 service provides many instance type choices suitable for remote workstations for HPC CAE workloads. By following the key design considerations and best practices mentioned in this article, HPC systems can provide flexibility, scalability, and availability to R&D users, and remote workstations can be provisioned and managed in a cost-effective manner.
