Platform Engineering https://platformengineering.com/ Mon, 16 Mar 2026 15:12:05 +0000

To MCP or Not? That is the Question https://platformengineering.com/features/to-mcp-or-not-that-is-the-question/ Mon, 16 Mar 2026 11:49:45 +0000 https://platformengineering.com/?p=174906 In a first-of-its-kind co-authored op-ed, Alan “Shimmy” Shimel and Mitch “Shakespeare” Ashley stage an AI-enabled dialogue to debate whether Model Context Protocol (MCP) is the foundation of agent platforms or the CORBA of AI. Their back-and-forth cuts through hype to reveal what platform engineering leaders actually need to build governable, production-ready autonomous systems.

The post To MCP or Not? That is the Question appeared first on Platform Engineering.

The CORBA of AI, or the Plumbing We Haven’t Finished Yet?

A Dialogue Between Alan “Shimmy” Shimel and Mitch “Shakespeare” Ashley

There is a moment in every technology wave when the room splits into two camps.

One says: This is the future.
The other says: This is overengineered nonsense.

Right now, that moment is happening around the Model Context Protocol (MCP).

Instead of another solo op-ed, what follows is a conversation — not staged, not polite, but grounded in how platform leaders actually argue when the stakes are real.

Shimmy:

Mitch, I’m just going to say it:

MCP is starting to feel like the CORBA of AI.

Not useless. Not stupid. Just too much architecture before we even know what the building looks like.

Everywhere I turn, platform teams are telling me the same thing: Security risk, token overhead, operational pain, weird deployment models. Some vendors are already pivoting to simpler Agent APIs, direct integrations, even CLIs.

Reality is undefeated. And reality is pushing back.

Shakespeare:

Alan, I think that framing misses why MCP spread in the first place.

MCP didn’t go viral because Anthropic ran a great marketing campaign. It spread because every team building with LLMs hit the same wall independently:

How do you reliably connect models and agents to real systems?

Before MCP, everyone had duct tape:

  • Custom tool wrappers
  • Proprietary function-calling formats
  • One-off integration layers
  • Zero portability

MCP gave the ecosystem a shared language. Adoption was pull, not push. Engineers don’t voluntarily adopt new infrastructure unless it solves real pain.
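Shakespeare’s point can be made concrete. MCP’s core move is that every server describes its tools in the same JSON-RPC shape, so any client can discover them the same way. A minimal sketch, with field names following the spec’s `tools/list` exchange (treat the details as illustrative rather than a complete implementation):

```python
import json

# What a client sends to ask any MCP server what it can do:
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# What a conforming server answers with (tool shown is hypothetical):
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "query_database",
                "description": "Run a read-only SQL query",
                "inputSchema": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"],
                },
            }
        ]
    },
}

# A client that understands this one shape can talk to any conforming
# server -- the portability the pre-MCP duct tape never had.
tool_names = [t["name"] for t in list_response["result"]["tools"]]
print(json.dumps(tool_names))
```

That shared discovery shape, not any single vendor integration, is what teams were pulled toward.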

Shimmy:

Fair. The problem is real. Nobody disputes that. But is this the right solution? That is the question, Shakespeare :-)

Solving a problem once doesn’t mean the solution survives contact with production.

Platform teams aren’t rejecting the idea of agent-to-tool connectivity. They’re rejecting the operational cost of doing it this way.

Dynamic discovery sounds great until security shows up and asks who approved that tool invocation. Or finance shows up and asks why token spend doubled. Or SRE shows up at 3 a.m. because a persistent connection model doesn’t behave like the stateless infrastructure everything else uses.

That’s when elegance stops mattering.

Shakespeare:

Every AI plan is great until the token bill shows up! And that pushback is exactly why MCP shouldn’t be dismissed.

The gaps teams are discovering are not random flaws. They’re a roadmap of what agent infrastructure still lacks.

Security teams are right to worry. Formalizing autonomous tool access across trust boundaries without first solving identity, authorization and provenance absolutely creates a new attack surface.

Prompt injection becomes command execution. Tool poisoning becomes supply chain risk. Dynamic capability discovery becomes a vector if trust isn’t anchored.

FinOps is right to freak out when it comes time to pay the piper for the AI bill. Time to optimize, and not to let AI make those decisions by itself.

But those are solvable problems. They don’t invalidate the need for a standard.

Shimmy:

Solvable, sure. But not trivial.

Right now, most enterprises barely have identity and governance under control for humans. Now we’re introducing non-deterministic machine actors that can discover capabilities on the fly.

That’s not just a protocol problem. That’s an existential governance problem.

So platform teams are doing what they always do: routing around the risk.

LLM generates intent. Deterministic systems execute it. Guardrails everywhere. No surprise autonomy.

Boring architecture wins in production.
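The “boring” pattern Shimmy describes can be sketched in a few lines (the action names and policy fields are hypothetical): the model only proposes an intent, and a deterministic layer checks it against a curated allowlist before anything executes.

```python
# Capabilities the platform team has explicitly approved, with limits.
ALLOWED_ACTIONS = {
    "restart_service": {"max_targets": 1},
    "scale_deployment": {"max_targets": 3},
}

def execute_intent(intent: dict) -> str:
    """Treat the LLM's output as untrusted data, never as a command."""
    action = intent.get("action")
    targets = intent.get("targets", [])
    policy = ALLOWED_ACTIONS.get(action)
    if policy is None:
        return f"DENIED: '{action}' is not an approved capability"
    if len(targets) > policy["max_targets"]:
        return f"DENIED: '{action}' exceeds target limit"
    # Only here does anything touch real infrastructure -- and every path
    # to this point is deterministic, auditable code, not model output.
    return f"EXECUTED: {action} on {targets}"

print(execute_intent({"action": "restart_service", "targets": ["api-1"]}))
print(execute_intent({"action": "drop_database", "targets": ["prod"]}))
```

No surprise autonomy: an intent the platform never approved simply cannot run, no matter what the model generates.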

Shakespeare:

And yet proprietary fragmentation is worse long-term.

If every platform builds its own agent-to-tool interface, we recreate the pre-web era of incompatible systems. MCP’s value is interoperability — the idea that agents and services can talk without bespoke glue code for every pairing.

But for that to work, MCP has to mature in very specific ways.

It needs cryptographic attestation of tool servers so agents can verify provenance. It needs native observability of decision cycles so governance systems can reconstruct what happened and why. It needs authorization enforcement at the write boundary before actions execute, not logs after the fact.

Right now, most of that lives outside the protocol.
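The attestation idea is simple to sketch, even if the production version is not: a tool server’s manifest is signed under a key the platform trusts, and agents refuse any tool whose signature fails to verify. This toy uses HMAC to keep the example short; a real deployment would use asymmetric signatures and a key-distribution story.

```python
import hashlib
import hmac
import json

# Hypothetical shared key, provisioned out-of-band by the platform team.
PLATFORM_KEY = b"shared-secret-provisioned-out-of-band"

def sign_manifest(manifest: dict) -> str:
    """Canonicalize the manifest and sign it under the platform key."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(PLATFORM_KEY, payload, hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, signature: str) -> bool:
    """Constant-time check that the manifest is the one that was attested."""
    return hmac.compare_digest(sign_manifest(manifest), signature)

manifest = {"server": "billing-tools", "tools": ["create_invoice"]}
sig = sign_manifest(manifest)

assert verify_manifest(manifest, sig)        # provenance checks out
tampered = {"server": "billing-tools", "tools": ["delete_invoice"]}
assert not verify_manifest(tampered, sig)    # poisoned tool rejected
```

The point is where the check lives: before the agent ever discovers or invokes the tool, not in a log read after the fact.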

Shimmy:

Which brings us to the real issue.

Platform engineers don’t care where the capability lives. They care whether the system is governable.

Identity. Authorization. Auditability. Observability. Cost control.

If MCP doesn’t provide those, teams will bolt them on or bypass MCP entirely.

Either way, the protocol becomes plumbing, not architecture.

Shakespeare:

Exactly. And plumbing is where it belongs.

HTTP isn’t exciting. TCP/IP isn’t debated at conferences. They’re invisible because they work.

MCP is noisy because it isn’t production-grade yet.

The danger isn’t that MCP fails. The danger is that the ecosystem stops doing the hard work to make it boring.

And if MCP doesn’t evolve and fill its gaps, something else will quickly come along and take its place.

Shimmy:

So maybe the CORBA analogy is only half right.

CORBA failed not because distributed objects were useless, but because lighter approaches solved most problems with less operational pain.

If MCP becomes heavy middleware instead of lightweight plumbing, history repeats.

But if it evolves — adds identity primitives, telemetry, governance hooks — it could disappear into the stack the way successful protocols do.

Invisible but indispensable.

Shakespeare:

CORBA failed because it was too complex, and MQSeries and a succession of message bus technologies pushed past it.

For MCP, the real architecture battle isn’t even at the transport layer.

The decisive layer is above it — the control plane that keeps autonomous systems accountable.

That’s where machine identity lives. Where policy intercepts actions before execution. Where knowledge authority defines trusted data sources. Where evidence trails link agent intent to real-world impact.

That layer barely exists today. But it will come. It will come because AI must have it to reach production at scale.

Shimmy:

Now we’re getting to something platform leaders actually need to hear.

Whether MCP wins, loses or morphs, the winners won’t be the teams betting on a protocol. They’ll be the teams building governable agent platforms.

Identity-first machine actors. Deterministic execution boundaries. Curated capability surfaces instead of open discovery. Decision-level observability. Cost-aware design. Human override mechanisms.

In other words, platforms resilient to change.

Because the protocol will change.

Shakespeare:

And the next signal for MCP will come soon.

The Agentic AI Foundation under the Linux Foundation is hosting an MCP Dev Summit in New York this April. Open governance matters. It means the evolution of MCP isn’t tied to a single vendor roadmap.

What happens there will reveal whether the ecosystem is serious about closing the security, authorization and observability gaps — or content to celebrate adoption while enterprises quietly build alternatives.

Shimmy:

So let’s answer the question everyone keeps asking.

Is MCP the CORBA of AI?

Maybe. Maybe not.

What’s certain is this: Agents are coming whether MCP succeeds or not. Autonomous software interacting with real systems is no longer hypothetical.

The protocol layer matters. But it isn’t where platforms live or die.

Shakespeare:

The real question isn’t whether to MCP or not MCP.

It’s whether we can make autonomous systems governable at scale.

What’s the ultimate answer? The market always wins. Not the best technology.

Shimmy:

Now that’s a question worth arguing about.

Because in platform engineering, elegant ideas don’t win.

Operational reality does.

Build for that — and you’ll survive whatever protocol ends up carrying the traffic.

I come here today not to bury MCP, but to praise it as a good first try. But we need more.

Shakespeare:

Fare thee well, Shimmy; till next we speak.

Kubernetes is Becoming the AI Control Plane—But Only Platform Teams Can Make it Work https://platformengineering.com/features/kubernetes-is-becoming-the-ai-control-plane-but-only-platform-teams-can-make-it-work/ Mon, 16 Mar 2026 06:00:57 +0000 https://platformengineering.com/?p=174893 AI workloads are converging on Kubernetes, but clusters alone can’t handle the complexity. Platform engineering teams are building higher-level abstractions and identity-aware control layers to safely expose GPUs, pipelines, and inference while preserving governance, security, and operational control at scale.

The post Kubernetes is Becoming the AI Control Plane—But Only Platform Teams Can Make it Work appeared first on Platform Engineering.

AI workloads are pulling training, batch jobs, and real-time inference onto Kubernetes, turning clusters into shared control planes for some of the most expensive infrastructure in the enterprise.

However, Kubernetes alone isn’t enough. Platform engineering teams are increasingly building abstraction layers that allow data scientists, ML engineers and application developers to access GPUs, machine learning pipelines and inference services without needing to navigate the underlying infrastructure complexity, and without turning clusters into chaos.

These layers—often delivered through internal developer platforms or self-service portals—aim to balance flexibility for engineering teams with the guardrails required for safe, scalable AI operations.

According to Pavlo Baron, co-founder and CEO of Platform Engineering Labs, the goal is to shield developers from the low-level details of infrastructure while still enabling platform engineers to maintain full control over those systems.

“Tools that platform engineers are using need to be able to abstract,” Baron says.

Traditional infrastructure-as-code approaches often expose the same granular configuration details to everyone using the system, which can create unnecessary complexity for developers who simply need access to compute resources.

“In classic IaC, you cannot abstract—everybody ends up seeing and accessing the same low-level detail,” he says.

Instead, platform teams are increasingly creating higher-level abstractions that simplify how developers request infrastructure resources.

Rather than specifying detailed configuration parameters—such as memory allocation or storage capacity—developers interact with simplified options that map to preconfigured environments.

“I am talking about a ‘t-shirt size’ as an abstraction for a database, not its size in bytes,” Baron says.

Under this model, platform engineers maintain detailed control over infrastructure configurations while developers interact with standardized service tiers that match common application needs. Baron says configuration languages designed to support layered abstractions are becoming an important part of this approach.

“Configuration languages like Pkl allow you to layer details for the user and for the use cases,” he says. “That is how you win.”
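Baron’s “t-shirt size” abstraction can be sketched in a few lines (the tier values here are invented for illustration): developers pick a tier, and the platform team owns the mapping to low-level detail, which it can change without touching any consumer.

```python
# Platform-owned mapping from service tiers to concrete configuration.
# Developers never see these numbers; they only ever say "medium".
DATABASE_TIERS = {
    "small":  {"cpu": "1", "memory_gb": 4,  "storage_gb": 50,   "replicas": 1},
    "medium": {"cpu": "2", "memory_gb": 16, "storage_gb": 200,  "replicas": 2},
    "large":  {"cpu": "8", "memory_gb": 64, "storage_gb": 1000, "replicas": 3},
}

def provision_database(name: str, size: str) -> dict:
    """Resolve a developer-facing tier into a full provisioning spec."""
    if size not in DATABASE_TIERS:
        raise ValueError(f"unknown tier {size!r}; pick one of {sorted(DATABASE_TIERS)}")
    return {"name": name, **DATABASE_TIERS[size]}

spec = provision_database("orders-db", "medium")
print(spec["memory_gb"])  # the byte-level detail stays the platform's concern
```

Contrast this with classic IaC, where every consumer of the module sees, and can fiddle with, all of those low-level fields.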

Direct Interaction Best Practices

Allowing data scientists and machine learning engineers to interact directly with Kubernetes clusters can introduce significant operational and security risks if proper platform abstractions are not in place. The complexity of Kubernetes often creates unintended outcomes when users without deep infrastructure expertise attempt to manage workloads themselves.

“The most straightforward risk is simply that Kubernetes is complex enough that it’s easy for a data scientist or ML engineer to end up with behaviors they don’t want if they try to go it alone,” says Flynn, technical evangelist at Buoyant.

For example, poorly configured memory limits can cause training jobs or inference workloads to be repeatedly terminated.
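The memory-limit failure mode looks like this in a pod spec (names and values are hypothetical). If `limits.memory` is set below the job’s actual peak usage, the kubelet OOM-kills the container and the workload restarts in a loop:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  containers:
    - name: trainer
      image: registry.example.com/ml/trainer:latest
      resources:
        requests:
          memory: "8Gi"    # what the scheduler reserves
          cpu: "2"
        limits:
          memory: "16Gi"   # must cover peak usage, or the pod is OOM-killed
          cpu: "4"
```

A platform abstraction can set these defaults correctly for known workload types, which is exactly the kind of detail users going it alone tend to get wrong.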

Beyond operational disruptions, there are also security and compliance concerns. Flynn notes that engineers unfamiliar with Kubernetes security models could inadvertently violate regulatory requirements by misconfiguring data isolation controls or deploying workloads with excessive permissions.

“They could deploy an agent with far more permission than it should have, giving it the ability to shut down or break other, unrelated applications,” he says.

AI Infrastructure Challenges 

These issues highlight broader structural challenges within many organizations’ AI infrastructure environments.

Data scientists, platform engineers and application developers frequently operate in separate silos, each focused on different aspects of the stack. As AI systems grow more complex, those divisions can create gaps in governance and operational oversight.

At the same time, organizations are beginning to grapple with a new category of identity management challenges tied to autonomous agents and AI services.

Flynn says platforms are only beginning to address the distinction between the identity of a human operator and the identity of the automated agents acting on their behalf.

Looking ahead, Kubernetes-based AI platforms will likely evolve to address those challenges as organizations scale deployments across hybrid and multi-cluster environments.

While Kubernetes itself will continue improving support for large-scale training workloads and batch processing, Flynn expects the most significant advances to occur in identity and authorization frameworks for AI agents.

“As agents become more prevalent, platforms will need to offer more in terms of managing which agents are allowed to do what,” he says.

That shift will require stronger mechanisms for monitoring how agents interact with infrastructure and process data payloads—an area Flynn expects to become a central element of AI security strategies.

The Future of Platform Engineering: Where Developer Freedom Meets Organizational Control  https://platformengineering.com/features/the-future-of-platform-engineering-where-developer-freedom-meets-organizational-control/ Thu, 12 Mar 2026 12:54:45 +0000 https://platformengineering.com/?p=174902 Analysis of platform engineering best practices arguing for internal developer platforms with GitOps, golden paths, and vendor-neutral CNOE reference architectures (Argo CD, kro, ACK) to balance developer autonomy with security, compliance, and cost control.

The post The Future of Platform Engineering: Where Developer Freedom Meets Organizational Control  appeared first on Platform Engineering.

Platform engineering faces a fundamental tension: Developers need autonomy to move fast and innovate, while organizations need standards for security, compliance, and cost control. Traditional approaches force a false choice between complete decentralization, creating duplicated effort and inconsistent security, and complete centralization that can create bottlenecks that slow innovation. The sweet spot provides self-service capabilities with built-in guardrails, enabling both velocity and governance.

In 2026, this challenge has become urgent. According to a Gartner study, 80% of software engineering organizations have established dedicated platform teams. The question isn’t whether to build Internal Developer Platforms (IDPs), but how to build them in ways that serve both developer needs and organizational requirements.

The Cloud Native Operational Excellence (CNOE, pronounced “Kuh-noo”) initiative was launched in October 2023 by Adobe, AWS, Autodesk, Salesforce, and Twilio. CNOE provides a vendor-neutral, open-source community framework for solving this challenge collectively.

The Challenge: Balancing Speed with Control 

The CNCF landscape contains hundreds of tools across dozens of categories, creating paralysis of choice for platform teams. But the deeper challenge isn’t technical; it’s organizational. How do you give developers the freedom to innovate while ensuring security, compliance, and cost efficiency? 

CNOE addresses this by bringing together enterprises operating at scale to navigate operational technology decisions. The initiative provides production-ready reference architectures using proven CNCF technologies like Argo CD, cloud-specific service operators, and developer portal solutions. By standardizing on vendor-neutral technologies, organizations reduce technology decision complexity while maintaining infrastructure flexibility. More importantly, they learn how to implement opinionated “golden paths”: well-supported workflows that make the right way the easy way, with built-in guardrails and escape hatches for edge cases.

GitOps: The Foundation of Modern Platform Engineering 

GitOps has emerged as a foundational practice supporting this balance. Recent CNCF surveys reveal 77-91% of organizations have adopted or are planning to adopt GitOps practices, with 71% citing faster software delivery and 66% pointing to improved configuration management. Argo CD is a leading GitOps tool with 60% of Kubernetes clusters using it. 

GitOps establishes Git as the single source of truth, eliminating configuration drift and providing complete audit trails. It enables self-healing infrastructure through continuous reconciliation, dramatically reducing manual intervention and associated risks. Every change is tracked in version control and reviewed through pull requests, which can be quickly rolled back if needed, providing both developer velocity and organizational control. 
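In Argo CD terms, that reconciliation loop is declared in a single Application resource (the repo URL and paths below are placeholders): Git is the source of truth, and automated sync with self-healing continuously pulls the cluster back to what the repo declares.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-config.git
    targetRevision: main
    path: apps/payments
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the Git-declared state
```

Rolling back is then a Git operation, not a cluster operation: revert the commit, and the controller reconciles the environment to match.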

Platform-as-a-Product: Treating Developers as Customers 

The most successful platform teams treat developers as customers, working backwards from actual pain points. Success is measured by developer satisfaction and time-to-first-commit, not just uptime metrics.  

Golden paths aren’t mandates; they’re invitations that come pre-configured with security, CI/CD pipelines, and operational best practices. They become compelling through template-based service creation with sensible defaults, enabling developers to launch new services in minutes. Declarative configuration catches issues at authoring time rather than deployment. Built-in observability means teams don’t need monitoring expertise before shipping their first feature. 

Organizations implementing this approach report reducing service launch times from months to hours, achieving both developer autonomy and organizational compliance through thoughtful platform design. 

This balance creates cascading benefits: Developers experience faster onboarding and reduced cognitive load, platform teams see reduced support burden, security teams achieve compliance by default, and business leaders get faster time-to-market with improved cost efficiency. 

Tools That Operationalize the Balance 

The balance between autonomy and standards becomes real through concrete tools. For organizations using AWS, AWS Controllers for Kubernetes (ACK) enables developers to provision cloud resources using familiar Kubernetes APIs while platform teams encode organizational standards into configurations. Similar patterns exist for other cloud providers, demonstrating the multi-vendor nature of this approach. 

The key is providing autonomy within secure, compliant boundaries; not forcing developers to choose between speed and safety. 

What’s Next: Higher-Level Abstractions with kro 

While reference architectures provide excellent starting points, organizations need higher-level abstractions that further simplify developer experiences. Announced in 2024, Kube Resource Orchestrator (kro) enables platform teams to compose multiple resources into simplified Kubernetes APIs. kro was developed through collaboration between AWS, Microsoft, and Google, and was recently adopted by CNCF for governance under Kubernetes SIGs. 

Platform teams can create purpose-built resources with simple declarations using kro, while the underlying definition embeds security policies, compliance frameworks, and cost controls. Developers get self-service infrastructure provisioning in minutes for standard cases, while organizations maintain governance by default. 

Critically, the abstraction layer provides developer flexibility for edge cases, preventing the shadow IT that emerges when platforms become too restrictive. This multi-vendor collaboration is moving toward becoming a core Kubernetes feature. 
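A simplified sketch of what this looks like in kro, adapted from public examples (treat field details as illustrative): the platform team defines a new `WebApp` API with a ResourceGraphDefinition, and the guardrails live in the template rather than with the user.

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp            # the simplified API developers actually use
    spec:
      name: string
      replicas: integer | default=2
  resources:
    - id: deployment
      template:             # platform-owned detail, hidden from the caller
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: ${schema.spec.replicas}
```

A developer then creates a two-field `WebApp` object, and kro expands it into the full, policy-compliant set of underlying resources.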

Conclusion 

The balance between developer autonomy and organizational standards isn’t aspirational; it’s achievable. The cloud native ecosystem is entering a new maturity phase where automation, observability, and resilience drive competitive advantage. And cloud vendors are facilitating adoption through managed offerings that simplify the deployment of open source building blocks that work together, including kro for resource composition and Argo CD for GitOps delivery. 

The best platforms don’t force developers into rigid workflows or create bottlenecks. Instead, they provide self-service capabilities with embedded guardrails, making secure and compliant infrastructure the path of least resistance. CNOE provides the community with guidance and reference implementations to make this balance practical and proven. Organizations no longer need to rely on complex DIY solutions. Instead, they can leverage open source patterns shared by enterprises operating at scale that work with CNCF technologies and integrate seamlessly with cloud-native services. 

Learn More at KubeCon Europe 2026 

Join AWS at KubeCon + CloudNativeCon Europe 2026 Booth 700 in Amsterdam (March 23-26) to explore kro, CNOE reference implementations, and to connect with platform engineering experts to see how organizations are building developer platforms that balance autonomy with organizational standards. 


Pankaj Walke, who works on the Cloud Native Operational Excellence (CNOE) initiative, co-wrote this article.

Platform Engineering Gains Urgency as Companies Struggle to Scale AI  https://platformengineering.com/features/platform-engineering-gains-urgency-as-companies-struggle-to-scale-ai/ Wed, 11 Mar 2026 09:41:44 +0000 https://platformengineering.com/?p=174891 Enterprises with mature DevOps are leading a shift to hybrid DevOps–platform engineering models to tame AI complexity. By centralizing governance, traceability, and control, platform engineering turns AI from experimental acceleration into a scalable, resilient operating backbone.

The post Platform Engineering Gains Urgency as Companies Struggle to Scale AI  appeared first on Platform Engineering.

Enterprises with mature delivery models are nearly twice as likely to adopt hybrid DevOps and platform engineering approaches, reflecting a shift toward centralized platforms to manage AI complexity. 

The trend signals a structural transition from fragmented tooling toward unified control planes designed to improve governance, scalability and developer productivity. At the leadership level, there is significant excitement around AI’s ability to dramatically speed up time to market. 

Organizations that already embraced DevOps are more comfortable with automation and iterative delivery, because they understand how to break large problems into smaller increments and fail fast without derailing business momentum. 

In some cases, what used to take six months can potentially be done in two. However, deeper within organizations, there are many challenges.  

That said, speed is only effective if you maintain discipline regarding feedback and quality. DevOps-mature teams already have that muscle memory, so platform-driven AI adoption feels like a natural extension. 

Devipad Tripath, vice president at Xebia, says there is cultural resistance to applying platform engineering to help address AI scaling bottlenecks, as well as what he calls a “problem of plenty.”  

“There are too many models, vendors, and constant shifts in performance,” he says. “On top of that, traceability becomes a serious concern.” 

If AI generates code and something breaks in production, the organization needs to know exactly what was written and why. 

“Platform engineering brings structure to that chaos by standardizing how AI is used and keeps everything connected across the software development lifecycle,” Tripath says.  

Improving Governance, Operational Control  

AI can generate large volumes of output quickly, and if traceability is weak, production risk increases. 

“Security vulnerabilities are not new,” Tripath says. “We’ve always had them in human-written code. The difference now is scale and speed.”

He explains the bigger gap right now is post-production: Everyone talks about productivity gains, but fewer discuss the impact when an issue affects hundreds or even thousands of users downstream. 

“If no one understands how the AI-generated logic ties back to requirements, recovery becomes harder,” Tripath says. “Platform engineering enforces traceability and standardization, which protects production stability as AI usage grows.”
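The traceability Tripath describes implies a provenance record attached to every AI-generated change (the field names and model identifier below are hypothetical): what was written, by what, and against which requirement.

```python
import datetime
import hashlib

def provenance_record(diff: str, model: str, requirement_id: str) -> dict:
    """Stamp an AI-generated change with enough context to recover from it."""
    return {
        "change_hash": hashlib.sha256(diff.encode()).hexdigest()[:12],
        "generated_by": model,
        "requirement": requirement_id,
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "human_reviewed": False,  # flipped only by a reviewer, never by the AI
    }

record = provenance_record("def handler(): ...", "example-model-v9", "REQ-1042")
print(record["requirement"])
```

When an incident hits production, a record like this is the difference between tracing the faulty logic back to its requirement in minutes and reverse-engineering opaque generated code under pressure.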

From his perspective, AI shouldn’t just sit in the coding phase; it can influence requirements, architecture, QA, and modernization. 

“Applying AI in silos limits the value,” he says. “Internal platforms help stitch those phases together.” 

They create continuity, ensuring AI-generated artifacts remain connected from requirement to deployment, and importantly, humans remain in the loop. 

“AI can generate, but engineers validate, contextualize, and correct,” Tripath says. “That balance is critical for safe scaling.” 

Reshaping IT Operating Models 

Platform engineering has the potential to reshape enterprise IT operating models as AI becomes a core part of application development and infrastructure strategy, but Pavlo Baron, co-founder and CEO of Platform Engineering Labs, says platform engineering alone isn’t the solution to the problem. It also needs to be enabled to do its job in the age of AI. 

“The general challenge we are facing in the industry is that the speed and frequency of delivery are through the roof now, and it will only grow,” he explains. “Available tools, for example, in infrastructure management, haven’t been built for that.” 

He adds that these tools worked fine when “everybody was slow-ish”, but now production is massive and cheap, which requires infrastructure management to adapt to keep up with it. 

“We need to turn around the process, let them and their AIs produce at their speed and with their tools, and focus on keeping them in a tunnel and infrastructure under control, instead of trying to enforce a slow, rigid single workflow,” Baron says. 

Tripath adds that AI will significantly compress delivery timelines, but eventually, when all things are equal, speed will become table stakes. 

“The real differentiator will be governance, resilience, and the ability to manage AI-driven systems in production,” he says. 

Organizations that modernized early will move faster, while those running decades-old systems face a bigger shift. 

“Platform engineering becomes the operating backbone that makes AI scalable and sustainable, not just experimental,” Tripath says.  

The AI Industrial Revolution: Building for the Tech Singularity | The Platform Engineering Show Ep 11 https://platformengineering.com/the-platform-engineering-show/the-ai-industrial-revolution-building-for-the-tech-singularity-the-platform-engineering-show-ep-11/ Mon, 09 Mar 2026 12:37:26 +0000 https://platformengineering.com/uncategorized/the-ai-industrial-revolution-building-for-the-tech-singularity-the-platform-engineering-show-ep-11/ Alan and Luca Galante discuss the upcoming KubeCon in Atlanta, the growth of the “House of Kube” community event, and the evolving role of platform engineering in the cloud-native ecosystem. They explore how enterprise adoption of cloud-native technologies has matured, with events now drawing more senior, business-focused audiences. The conversation highlights the intersection of AI [...]

The post The AI Industrial Revolution: Building for the Tech Singularity | The Platform Engineering Show Ep 11 appeared first on Platform Engineering.


Alan and Luca Galante discuss the upcoming KubeCon in Atlanta, the growth of the “House of Kube” community event, and the evolving role of platform engineering in the cloud-native ecosystem. They explore how enterprise adoption of cloud-native technologies has matured, with events now drawing more senior, business-focused audiences. The conversation highlights the intersection of AI and platform engineering, framing it as the next industrial revolution that could accelerate enterprise innovation and reshape the workforce. Alan and Luca conclude by emphasizing the importance of embracing AI and automation while previewing community gatherings that will continue these discussions at KubeCon.

Why Platform Engineering is the New Bedrock for the Agentic | The Platform Engineering Show Ep 10 https://platformengineering.com/the-platform-engineering-show/why-platform-engineering-is-the-new-bedrock-for-the-agentic-the-platform-engineering-show-ep-10/ Mon, 09 Mar 2026 12:35:27 +0000 https://platformengineering.com/uncategorized/why-platform-engineering-is-the-new-bedrock-for-the-agentic-the-platform-engineering-show-ep-10/ Alan and Luca discuss the growing intersection between AI and platform engineering. Luca recaps insights from the recent Platform Compass event in Paris, where experts highlighted how AI is both transforming and depending on internal developer platforms. The two note Gartner’s prediction that most developer interactions will soon occur through AI agents, and they debate [...]

The post Why Platform Engineering is the New Bedrock for the Agentic | The Platform Engineering Show Ep 10 appeared first on Platform Engineering.


Alan and Luca discuss the growing intersection between AI and platform engineering. Luca recaps insights from the recent Platform Compass event in Paris, where experts highlighted how AI is both transforming and depending on internal developer platforms. The two note Gartner’s prediction that most developer interactions will soon occur through AI agents, and they debate whether enterprises need entirely new AI-native platforms or can enhance existing ones. They conclude that AI is accelerating—not replacing—platform engineering, requiring organizations to build more scalable, “superhighway” infrastructures to support the AI-enabled future.

Secure by Default or Secure by Nobody: How Platform Engineering Fixes Governance  https://platformengineering.com/features/secure-by-default-or-secure-by-nobody-how-platform-engineering-fixes-governance/ Fri, 06 Mar 2026 15:45:52 +0000 https://platformengineering.com/?p=174882 Close the security gap in cloud-native and AI-driven delivery by baking policy-as-code, secure templates, and automated guardrails into platforms so the fastest path to production is the compliant one.

The post Secure by Default or Secure by Nobody: How Platform Engineering Fixes Governance  appeared first on Platform Engineering.

]]>
Development cycles are now continuous and highly automated. Infrastructure, APIs and AI models move rapidly through CI/CD pipelines, meaning systems may already be deployed, modified or scaled several times before a retrospective security review even begins. 

In environments built on microservices, ephemeral infrastructure and constantly evolving software components, traditional governance approaches struggle to keep pace. 

The result is a widening gap between how quickly systems are built and how slowly they are often reviewed for risk. When security checks occur only after deployment, vulnerabilities, misconfigurations and policy violations can propagate across environments before they are detected. 

As software delivery accelerates—particularly with AI-assisted development—the window for catching issues after the fact continues to shrink. 

Platform engineering is emerging to close that gap by embedding governance directly into the platforms developers use every day. 

Rather than relying on manual audits or optional controls, mature platform teams encode security, compliance and policy requirements directly into infrastructure templates, service catalogs and deployment workflows. 

The goal is simple: Make the fastest path to production the compliant one—ensuring that security is enforced by default rather than applied after the fact. 

Security for AI-Driven Environments 

Pavlo Baron, co-founder and CEO of Platform Engineering Labs, says traditional, after-the-fact security reviews fail in modern cloud-native and AI-driven environments. 

That’s because they occur too late in the development cycle to prevent vulnerabilities from reaching production, a problem that has intensified as development and attack speeds accelerate. 

“To be honest, they always failed,” he adds. “Everything that is post-factum is late. The industry was just able to hide it better because everything was slow: Development, attacks, reviews.” 

He explains AI has eliminated many of the natural speed limits that once masked these weaknesses. 

“Now everybody with an AI assistant can produce something, be it application code or an attack. The speed barrier doesn’t exist anymore, and production is cheap.” 

In this environment, Baron argues, security must shift from reactive checks to built-in safeguards embedded directly into development workflows. 

“Now is the right time to build in mechanisms that prevent rather than catch afterwards—maybe,” he says. “It was critical before, and it is now critical more than ever.” 

Embedding Security into Deployment Pipelines  

Kausik Chaudhuri, chief innovation officer at Lemongrass, says platform teams are embedding security, compliance, and governance directly into development and deployment workflows by treating policies as code and integrating them into CI/CD pipelines and platform tooling. 

“Instead of relying on manual reviews, security controls and compliance checks are automatically enforced when code is committed, infrastructure is provisioned, or applications are deployed,” he explains.  

Through secure-by-default templates, reusable infrastructure modules, and automated guardrails, developers can move quickly while the platform continuously enforces standards for identity, access, encryption, and regulatory requirements throughout the software lifecycle. 
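As a rough illustration of the policy-as-code pattern Chaudhuri describes (not any specific vendor's tooling), a pipeline gate can be a function that evaluates a deployment manifest against organizational rules before anything ships. The rule names and manifest fields below are hypothetical:

```python
# Minimal policy-as-code sketch: evaluate a deployment manifest against
# organizational rules before the CI/CD pipeline allows it to proceed.
# The rule set and manifest schema are illustrative, not a real standard.

def evaluate_policies(manifest: dict) -> list[str]:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    if not manifest.get("encryption_at_rest", False):
        violations.append("encryption_at_rest must be enabled")
    if manifest.get("run_as_root", False):
        violations.append("containers must not run as root")
    allowed_registries = ("registry.internal/",)  # hypothetical approved registry
    image = manifest.get("image", "")
    if not image.startswith(allowed_registries):
        violations.append(f"image '{image}' is not from an approved registry")
    return violations

# A compliant manifest passes; a drifted one is blocked with reasons.
good = {"image": "registry.internal/app:1.2", "encryption_at_rest": True}
bad = {"image": "docker.io/app:latest", "run_as_root": True}
assert evaluate_policies(good) == []
assert len(evaluate_policies(bad)) == 3
```

The key design point is that the check runs automatically at commit or deploy time and returns machine-readable reasons, so developers get immediate feedback instead of waiting for a manual review.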

Standardized Templates, Golden Paths 

Flynn, technical evangelist at Buoyant, says standardized platform templates and golden paths are the simplest way to avoid configuration drift and security issues. 

“If your platform can provide golden paths that are obviously lower friction, faster, and safer than striking out on one’s own, application developers will follow them, and the workloads running in your cluster will be more uniform, more compliant, and more secure,” he explains. 

Chaudhuri says these templates embed best practices for security, compliance, networking, identity management, and observability directly into the infrastructure and deployment configurations. 

“Instead of each team designing its own approach, developers follow a pre-approved path aligned with organizational policies and regulatory requirements,” he explains. 

This not only reduces the risk of misconfiguration but also allows teams to move faster, because security and compliance are built into the platform from the beginning rather than added later. 
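The golden-path idea can be sketched as a template that accepts only a few developer-facing inputs and merges them over non-overridable platform defaults. All field names here are illustrative, under the assumption that the platform owns the security and observability settings:

```python
# Golden-path sketch: developers supply minimal inputs; the platform
# fills in pre-approved security, networking, and observability defaults.
# Field names and defaults are illustrative.

GOLDEN_PATH_DEFAULTS = {
    "tls": "required",
    "network_policy": "deny-all-ingress-except-mesh",
    "log_shipping": "enabled",
    "identity_provider": "corp-oidc",
}

def render_service(name: str, image: str, replicas: int = 2) -> dict:
    """Merge developer inputs under non-overridable platform defaults."""
    if replicas < 2:
        raise ValueError("golden path requires at least 2 replicas")
    config = {"name": name, "image": image, "replicas": replicas}
    config.update(GOLDEN_PATH_DEFAULTS)  # platform defaults always win
    return config

svc = render_service("payments", "registry.internal/payments:3.1")
assert svc["tls"] == "required"  # security baked in, not opted into
assert svc["replicas"] == 2
```

Because the defaults are applied last, a team cannot accidentally (or deliberately) render a service without TLS or log shipping; deviation requires leaving the golden path, which is visibly higher friction.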

Reshaping Compliance, Risk Management  

Flynn explains that in a perfect world, enterprises would be able to show compliance by structuring the platform to simply report any exceptions. 

As an example, Buoyant Enterprise for Linkerd can show compliance with FIPS on its dashboard: communication edges that don’t comply are flagged for attention. 

“Rather than forcing manual audits of everything, the platform can be designed to surface compliance as a visible artifact,” he says.  

Flynn notes that AI cuts both ways here: AI tools can maintain focus on far larger amounts of information than a human analyst can, but models are nondeterministic, and agents have an uncanny ability to expand attack surfaces in ways people are not yet used to considering.

“We should expect to see more AI used to assess safety, but we should also expect to see more risks and exploits involving AI until platforms have robust controls in place around AI workloads,” Flynn says.  
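The exception-reporting pattern Flynn describes can be sketched as scanning observed communication edges and surfacing only the ones that violate policy. The edge records and the FIPS flag below are stand-ins, not Linkerd's actual data model:

```python
# Exception-reporting sketch: instead of auditing everything, surface
# only the edges that violate policy. The edge records are illustrative;
# a real platform would derive them from mesh telemetry.

edges = [
    {"src": "web", "dst": "api", "cipher_fips_validated": True},
    {"src": "api", "dst": "legacy-db", "cipher_fips_validated": False},
    {"src": "api", "dst": "cache", "cipher_fips_validated": True},
]

def noncompliant_edges(observed: list[dict]) -> list[str]:
    """Return human-readable flags only for edges that need attention."""
    return [
        f"{e['src']} -> {e['dst']}: not FIPS-validated"
        for e in observed
        if not e["cipher_fips_validated"]
    ]

flags = noncompliant_edges(edges)
assert flags == ["api -> legacy-db: not FIPS-validated"]
```

An empty result is itself the compliance artifact: when nothing is flagged, the dashboard is demonstrating compliance continuously rather than at audit time.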

The post Secure by Default or Secure by Nobody: How Platform Engineering Fixes Governance  appeared first on Platform Engineering.

]]>
From AI Hype to Production Reality: What Platform Teams Are Actually Building Today  https://platformengineering.com/features/from-ai-hype-to-production-reality-what-platform-teams-are-actually-building-today/ Thu, 05 Mar 2026 12:37:42 +0000 https://platformengineering.com/?p=174880 How Intelligent Control Planes unify declarative state, policy, and intelligence to enable safe, auditable AI-assisted operations — real-world examples from Allianz and Millennium bcp.

The post From AI Hype to Production Reality: What Platform Teams Are Actually Building Today  appeared first on Platform Engineering.

]]>
AI is moving quickly from a development assistant to something far more ambitious: An operational actor. Agents can now generate code, propose changes, and even take action across infrastructure and applications. But while AI capabilities are accelerating, most platforms are still architected for humans to glue together fragmented systems. 

This gap is where the hype begins to fall apart. 

What’s missing isn’t more AI.

It’s platforms that can safely expose intent, context, policy, and operational state in a way that lets humans and machines speak the same language. This is the problem the Intelligent Control Plane is meant to solve. 

Rather than treating AI as an external tool bolted onto dashboards and runbooks, the Intelligent Control Plane evolves the control plane itself, unifying declarative state, actual state, policy, knowledge, and intelligence behind consistent APIs. 

That sounds abstract, but parts of this model are already running in production today. 

Control Planes as Contracts Between Teams 

One of the foundations of an Intelligent Control Plane is deterministic control, leveraging Kubernetes’ API and reconciliation model to create predictable execution paths through clear APIs and policy enforcement. 

At Allianz Technology, platform teams operate more than 1,000 Kubernetes control planes across a large organization. Their challenge wasn’t provisioning infrastructure; it was enabling development teams to consume platform capabilities safely, independently, and at scale. 

The solution was to treat Kubernetes APIs as explicit contracts between teams. 

Infrastructure teams expose capabilities through well-defined APIs. Development teams consume those APIs just like any other internal service, with clear ownership, versioning, documentation, and expectations. The platform enforces boundaries, while teams retain autonomy. 

This approach eliminates ambiguity: 

  • What does the platform provide? 
  • What is stable versus evolving? 
  • Who owns failures and changes? 

These API contracts form the deterministic layer of the control plane, the stable foundation required before any intelligence can be safely introduced. 
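Treating platform APIs as explicit contracts can be sketched as a registry where each capability declares its version, stability, and owning team, and consumers bind to an exact entry. The registry contents and field names here are hypothetical, not Allianz's actual system:

```python
# Contract-registry sketch: platform capabilities are published with an
# explicit version, stability level, and accountable owner, so consuming
# teams know exactly what they depend on. Entries are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    name: str
    version: str
    stability: str  # "stable" or "beta" -- what is stable versus evolving
    owner: str      # the team accountable for failures and changes

REGISTRY = {
    ("postgres-instance", "v2"): Contract("postgres-instance", "v2", "stable", "data-platform"),
    ("gpu-pool", "v1alpha1"): Contract("gpu-pool", "v1alpha1", "beta", "ml-infra"),
}

def resolve(name: str, version: str) -> Contract:
    """Consumers bind to an explicit (name, version) pair; no ambiguity."""
    try:
        return REGISTRY[(name, version)]
    except KeyError:
        raise LookupError(f"no contract {name}/{version}; check the catalog")

c = resolve("postgres-instance", "v2")
assert c.owner == "data-platform" and c.stability == "stable"
```

The point of the sketch is the failure mode: an undeclared dependency fails loudly at resolution time instead of silently coupling a team to an unversioned, unowned capability.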

From Alert Fatigue to Intelligent Control 

Where deterministic control provides safety, intelligence begins to reduce toil. 

At Millennium bcp, one of Portugal’s largest banks, platform teams faced escalating alert fatigue and long mean time to resolution across a regulated, multi-cloud environment. The goal wasn’t to hand control to an AI system, but to make operations more adaptive without sacrificing auditability. 

The result was an AI-enhanced control plane built on Kubernetes and Crossplane. 

Using LLM-powered composition functions, alerts are triaged automatically, and common remediation paths are executed within defined policy boundaries. Workload-aware algorithms assist with scaling decisions while every action remains observable, explainable, and compliant. 

This is not autonomous infrastructure in the abstract. 

It’s intelligent assistance layered onto a deterministic control plane, operating safely in production. 
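The pattern described above, where an AI component proposes remediations but only acts within defined policy boundaries, can be sketched as an allow-list plus an audit log. The proposal function below is a stub standing in for an LLM call, and the action names are hypothetical:

```python
# Policy-boundary sketch: an AI component may *propose* any remediation,
# but the control plane executes only actions on a pre-approved list,
# and records every decision for audit. The LLM call is stubbed out.

APPROVED_ACTIONS = {"restart_pod", "scale_up", "rotate_credentials"}
audit_log: list[dict] = []

def propose_remediation(alert: str) -> str:
    """Stand-in for an LLM triage call (nondeterministic in reality)."""
    return {"pod-crashloop": "restart_pod", "disk-full": "delete_volume"}.get(alert, "escalate")

def handle_alert(alert: str) -> str:
    proposal = propose_remediation(alert)
    executed = proposal in APPROVED_ACTIONS  # the deterministic boundary
    audit_log.append({"alert": alert, "proposal": proposal, "executed": executed})
    return proposal if executed else "escalated_to_human"

assert handle_alert("pod-crashloop") == "restart_pod"    # within the boundary
assert handle_alert("disk-full") == "escalated_to_human" # blocked: not approved
assert all("proposal" in entry for entry in audit_log)   # fully auditable
```

Because the boundary check and the log live outside the model, the system stays observable and explainable even when the model's proposals vary from run to run.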

Why the Control Plane is the Right Place for AI 

Kubernetes’ real innovation was never containers; it was the control loop: Desired state, actual state, and continuous reconciliation. 

That same model provides the substrate for intelligent control. 

AI does not replace reconciliation. It augments it. Intelligence can help determine what the desired state should be or how to respond when reality diverges, while the control plane ensures safe execution, policy enforcement, and recoverability. 
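That division of labor, where intelligence suggests desired state and the deterministic loop guarantees safe convergence, can be sketched in a few lines. The replica-count model is deliberately simplified and the heuristic is illustrative:

```python
# Reconciliation sketch: an intelligent component may *set* desired
# state, but only the deterministic control loop mutates actual state,
# one bounded step at a time. Deliberately simplified.

def suggest_desired_replicas(load: float) -> int:
    """Stand-in for an intelligent component recommending desired state."""
    return max(2, round(load / 100))  # illustrative heuristic, floor of 2

def reconcile(actual: int, desired: int) -> int:
    """One deterministic step toward desired state; never overshoots."""
    if actual < desired:
        return actual + 1
    if actual > desired:
        return actual - 1
    return actual

actual = 2
desired = suggest_desired_replicas(load=500.0)
while actual != desired:  # continuous reconciliation, always converges
    actual = reconcile(actual, desired)
assert actual == desired == 5
```

Even if the suggestion were wrong or adversarial, the blast radius is bounded: execution only ever moves one step at a time through a path the control plane fully controls, which is what keeps the system recoverable.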

This is why serious platform teams are embedding AI into control planes rather than bolting it onto external systems. Intelligence belongs where decisions already happen. 

Less Hype, More Control 

The Intelligent Control Plane is not about replacing engineers or skipping governance. It’s about evolving platforms so that both humans and agents can operate effectively, using the same APIs, policies, and sources of truth. 

When platforms unify state, policy, and knowledge, AI becomes practical. Without that foundation, it remains hype. 

Learn More at KubeCon + CloudNativeCon EU Amsterdam 

These ideas aren’t theoretical. They’re being applied today by teams operating at real scale, under real constraints. 

To see how Crossplane and the Intelligent Control Plane show up in production, look for the Crossplane and platform engineering sessions at KubeCon + CloudNativeCon EU 2026. 

If you’re looking past AI hype and toward platforms that actually work in production, these real-world stories are a good place to start. 

The post From AI Hype to Production Reality: What Platform Teams Are Actually Building Today  appeared first on Platform Engineering.

]]>
Why Sovereignty Demands Platform Engineering, Not Just New Cloud Providers https://platformengineering.com/features/why-sovereignty-demands-platform-engineering-not-just-new-cloud-providers/ Thu, 05 Mar 2026 12:21:12 +0000 https://platformengineering.com/?p=174869 True digital sovereignty comes from platform engineering, giving teams control of their stack across sovereign clouds and hyperscalers.

The post Why Sovereignty Demands Platform Engineering, Not Just New Cloud Providers appeared first on Platform Engineering.

]]>
For years, the conversation around digital sovereignty has been dominated by a single question: Where is my data? Under pressure from regulators and the need for local data residency, organizations have rushed toward “Sovereign Clouds” as a silver bullet. The sales pitch is seductive: move your workloads to a local provider, and you are suddenly in control of your destiny.

But here is the uncomfortable truth: You are likely just swapping one form of lock-in for another.

If our digital strategy is defined solely by a provider’s corporate agenda, no matter how local that provider claims to be, we haven’t achieved independence. We’ve simply achieved a change of address. Moving to a new provider without evolving our architectural approach is just moving from one walled garden to another. The walls might be closer to home, but they are still walls.

From Location to Liberation: Engineering Your Independence

True sovereignty is not a location; it is about technological freedom. It is the architectural agency to choose our tools, move our workloads, and evolve our stack without being held hostage by a single vendor, a proprietary API, or a provider’s sunset schedule. It is the power to say “no” to a provider because our business logic isn’t inextricably tangled in their proprietary services.

The good news: If we stop treating sovereignty as a procurement checkbox and start treating it as an engineering discipline, we can actually do something about it. We can put ourselves into a position of strength, not by cutting off access to the global giants, but by ensuring they are no longer our only option: a tactical choice rather than a strategic dependency.

Platform Engineering is the bridge that can help us turn raw infrastructure into a high-performance, sovereign environment. It is the layer that allows an organization to own its destiny by creating a consistent, internal standard that thrives on any soil—whether it’s a global hyperscaler or a specialized local provider.

Turning the Feature Gap into an Advantage

When comparing a local sovereign provider to a global hyperscaler, the first thing most people see is a much smaller set of services. “They have 200+; the others have 30.” In a traditional cloud mindset, this is seen as a weakness—an “incomplete” pantry. But in a mature Platform Engineering strategy, we recognize that sovereign providers are a vital part of the equation, and when we compare them through a strategic lens, that “gap” can actually be a very effective tool for driving architectural quality.

Curation, Configuration, and Integration

The “Paradox of Choice” is a silent killer of developer velocity. Being confronted with 50 different ways to run a container or a dozen ways to store a secret leads to cognitive overload, configuration drift, and security “bloat.” When everything is available, nothing is standardized.

This is where Platform Engineering takes over. Instead of passing the complexity of 200+ services directly to the developer, the platform acts as a filter. By focusing on a limited, high-quality set of services, platforms drastically reduce the cognitive load on our teams.

The value goes beyond mere curation. The platform provides the standardized configuration and integration that raw cloud services lack. By delivering proven reference architectures—where Kubernetes namespaces, repositories, and CI/CD pipelines are already pre-wired and integrated—the platform turns a raw primitive into a production-ready component. Integrating these with internal identity systems and security monitoring ensures a “Golden Path” that is robust and compliant.

The “Standardized Kitchen”: Why It’s Easier Now

In the past, cloud services were highly diverse and deeply proprietary. Moving to a sovereign provider felt like a significant risk because we were dependent on that specific provider’s “pantry” for every essential feature. If they didn’t have a specific managed service, the recipe broke. Today, the landscape has shifted. The commoditization of infrastructure has lowered the barrier to entry for new clouds, making them a viable, low-risk alternative to the giants.

The Standardization of the “Stove”

Kubernetes now acts as a universal layer that functions the same whether it sits in a local data center or a global region. We no longer need to depend on a long list of proprietary services to be agile; we only need the core building blocks—compute, storage, and networking—and the architectural skill to compose them. By owning this control plane, we move from being “cloud consumers” to “platform owners,” making our strategy independent of the underlying infrastructure.

The Maturity of the “Chef”

Our platform engineering skill set has crystallized. We’ve moved past the “Wild West” phase of cloud adoption and into a disciplined era of Platform as a Product. We now know how to:

  • Listen to our users to build only the capabilities that actually drive value, rather than chasing every new feature in a provider’s catalog.
  • Standardize “Golden Paths” for our organization’s specific workloads, which drastically simplifies the integration of a new sovereign cloud.
  • Architect for modularity, applying the hard-won lessons from past scaling challenges to ensure our platform remains portable and resilient.

We have seen that in today’s geopolitical climate, the risks of dependency are no longer theoretical—they are real. Luckily, these mature practices allow us to integrate a new provider significantly faster than in the past. We have the skills to own the control plane, ensuring the platform reflects our unique organizational requirements, not the provider’s defaults.

Platform Engineering: Unlocking the Freedom to Operate

We have to acknowledge the primary barrier to adopting a new cloud: the investment gap. Over the last decade, organizations have spent thousands of hours fine-tuning landing zones, building service templates, and hardening security processes for hyperscalers. Starting from scratch on a sovereign cloud feels like an architectural regression, a year-long integration project with no immediate business impact. In today’s economic climate, few organizations can afford to wait.

This is where Platform Engineering provides the Freedom to Operate. However, this freedom is not a gift from the provider; it is a capability the organization must have in place. A mature Internal Developer Platform (IDP) acts as a “ready-to-go” unification layer. Because the heavy lifting of identity integration, security guardrails, and developer self-service has already been engineered into the platform, adding a new sovereign provider becomes a matter of integration, not reinvention.

From Integration to Orchestration

By utilizing a unified platform layer, we can bypass the traditional “Day 0” friction. The platform brings the necessary maturity (governance, identity integration, and developer experience) to the new infrastructure from day one. This doesn’t just accelerate the initial setup; it fundamentally changes how we manage a multi-cloud landscape:

  • Immediate Maturity: We apply our existing “Golden Paths” and security guardrails to the sovereign provider instantly. This ensures the environment is production-ready in weeks, not years.
  • Reduced Long-term Complexity: A unified control plane prevents us from managing fragmented silos. We orchestrate workloads across different clouds using the same processes, tools, and interfaces, ensuring our operational model remains lean as we scale.
  • Intelligent Workload Routing: The platform empowers us to route workloads based on criticality. We can keep sensitive data in a sovereign enclave while utilizing hyperscaler services for less critical functions or quick experiments—all managed through a single, consistent workflow.

Ultimately, the platform ensures that adding a sovereign cloud is a strategic deployment choice, not a massive migration project. It allows us to move at the speed of the business, proving that sovereignty and velocity are not mutually exclusive.
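Intelligent workload routing, as described above, can be sketched as a single placement function driven by data classification, so sovereignty becomes a per-workload decision inside one workflow. The provider names and classifications are illustrative:

```python
# Routing sketch: one consistent workflow decides placement from data
# classification, making sovereignty a per-workload choice rather than
# a migration project. Provider names are illustrative.

ROUTING_POLICY = {
    "restricted": "sovereign-cloud-eu",  # sensitive data stays in the enclave
    "internal": "sovereign-cloud-eu",
    "public": "hyperscaler-west",        # burst capacity and experiments
}

def place_workload(name: str, classification: str) -> dict:
    """Resolve a workload to a provider; unknown classifications fail loudly."""
    provider = ROUTING_POLICY.get(classification)
    if provider is None:
        raise ValueError(f"unknown classification '{classification}' for {name}")
    return {"workload": name, "provider": provider}

assert place_workload("payments", "restricted")["provider"] == "sovereign-cloud-eu"
assert place_workload("marketing-site", "public")["provider"] == "hyperscaler-west"
```

Because the routing table is data rather than per-team tribal knowledge, adding or swapping a provider is a policy edit, not a replatforming effort.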

The Freedom to Say “No”

Digital sovereignty is not a product we can purchase; it is an architectural capability we build. The “feature gap” and the complexity of multi-cloud landscapes are only limitations if we remain passive consumers of cloud services. By embracing a Platform Engineering mindset, we transform these challenges into a catalyst for discipline and engineering excellence.

True sovereignty is ultimately defined by technological freedom—the power to treat infrastructure as a tactical choice rather than a strategic dependency. When we own our control plane and master the standardized primitives of modern cloud-native technology, we move into a position of strength. We gain the freedom to say “no” to a provider because our platform is robust enough to say “yes” to another. We finally have the expertise to build this independence today, allowing us to choose the infrastructure that truly fits our mission.

The post Why Sovereignty Demands Platform Engineering, Not Just New Cloud Providers appeared first on Platform Engineering.

]]>
Why Platform Engineering is the New AI Super Superhighway | Platform Engineering Show Ep 12 https://platformengineering.com/the-platform-engineering-show/why-platform-engineering-is-the-new-ai-super-superhighway-platform-engineering-show-ep-12/ Wed, 04 Mar 2026 19:49:27 +0000 https://platformengineering.com/uncategorized/why-platform-engineering-is-the-new-ai-super-superhighway-platform-engineering-show-ep-12/

The post Why Platform Engineering is the New AI Super Superhighway | Platform Engineering Show Ep 12 appeared first on Platform Engineering.

]]>

While you were busy navigating the hype, AI just got real—transforming from a toy into a high-octane production tool that’s currently scaring the daylights out of the security world. Joining Alan today is Luca Galante, and they are breaking down why Q1 2026 is officially the year of the tech singularity, where “citizen developers” and a flood of AI agents are 1,000x-ing the pressure on our internal developer platforms. If you aren’t turning your “bumpy dirt road” infrastructure into an eight-lane superhighway with solid golden paths and automated guardrails right now, you’re not just lagging behind—you’re about to be replaced by someone who is.

The post Why Platform Engineering is the New AI Super Superhighway | Platform Engineering Show Ep 12 appeared first on Platform Engineering.

]]>