Planit | Industry Leading Digital Transformation Services | https://www.planit.com

Understanding how AI systems fail: A layered failure taxonomy
https://www.planit.com/understanding-how-ai-systems-fail-a-layered-failure-taxonomy/
Fri, 13 Mar 2026 02:19:53 +0000
Articles

AI, Artificial Intelligence, Quality Engineering, Testing

Understanding how AI systems fail: A layered failure taxonomy

Engineering Quality for AI systems series — Part 2

In the previous article, we argued that AI quality engineering is not about verifying correctness but about evaluating behaviour under uncertainty. 

That’s the conceptual shift. 

Now let’s make it practical. If AI systems don’t fail deterministically, how do they fail? 

They don’t fail randomly. They fail in patterns. 

And unless we classify those patterns structurally, testing remains reactive. We fix a hallucination here, adjust retrieval there, add a guardrail somewhere else, but we don’t truly understand the system. 

So, let’s introduce some structure. 

AI systems are layered

Modern enterprise AI systems are not just model calls. They are layered systems composed of:

  • Probabilistic model behaviour
  • Retrieval and contextual grounding
  • Orchestration and tool interaction
  • Human interpretation and decision-making

Failures originate at different layers. And here’s the crucial observation: as failures move upward through these layers, their visibility decreases while their business impact increases.

Let’s make that concrete. Failures don’t stay contained at one layer. They propagate.

A grounding gap can amplify model variability. A model inconsistency can propagate through orchestration. A workflow error can surface as misplaced human trust.

The taxonomy isn’t just layered; it’s directional.

Notice something important here: The most visible failures often originate at the lowest layer. The most consequential failures emerge at the highest layer.

Let’s walk through each layer.

Why four layers?

This taxonomy organises failures by where they originate in the system’s execution flow:

  • Generation: what the model produces
  • Information: what the model consumes
  • Interaction: how components coordinate
  • Interpretation: how humans act on outputs

Other frameworks classify AI risk by regulatory or societal categories. This taxonomy is architectural. It maps failures to where they originate in system execution, making it directly applicable to testing strategy and evaluation design.
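One lightweight way to operationalise the taxonomy is to tag every logged incident with the layer where it originated, so recurring patterns become countable. The sketch below is illustrative — the enum values mirror the four layers above, but the pattern names and lookup are our own, not a standard library:

```python
from enum import Enum

class FailureLayer(Enum):
    """The four layers of the taxonomy, ordered by execution flow."""
    GENERATION = 1      # what the model produces
    INFORMATION = 2     # what the model consumes (retrieval/context)
    INTERACTION = 3     # how components coordinate (orchestration)
    INTERPRETATION = 4  # how humans act on outputs

# Illustrative mapping from observed failure patterns to originating layer.
PATTERN_LAYER = {
    "hallucinated_fact": FailureLayer.GENERATION,
    "instruction_drift": FailureLayer.GENERATION,
    "grounding_gap": FailureLayer.INFORMATION,
    "stale_retrieval": FailureLayer.INFORMATION,
    "looping_tool_calls": FailureLayer.INTERACTION,
    "metric_substitution": FailureLayer.INTERPRETATION,
}

def originating_layer(pattern: str) -> FailureLayer:
    """Classify an incident by where it originated, not where it surfaced."""
    return PATTERN_LAYER[pattern]
```

Even a mapping this small changes incident reviews: instead of asking "what went wrong?", the first question becomes "which layer did it come from?".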

Layer 1 – Model behaviour failures

This is where most AI discussions begin. You’ve seen it:

  • Hallucinated facts
  • Instruction drift in long prompts
  • Inconsistent answers to identical inputs
  • Overconfident but incorrect outputs

These are inherent characteristics of probabilistic systems.

Consider a code-generation assistant producing a payment-processing function. The code compiles. It passes unit tests. But it logs full credit card numbers in application logs, violating PCI DSS requirements.

It’s technically valid but operationally risky. The model didn’t hallucinate. It failed to respect a domain constraint.
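Catching that class of failure means asserting the domain constraint directly, not just functional correctness. A rough sketch of a log-scanning check — the regex is deliberately crude and purely illustrative, not a real PCI DSS scanner (production tooling would also apply Luhn checks and issuer prefixes):

```python
import re

# Very rough PAN pattern: 13-19 consecutive digits, optionally
# separated by spaces or dashes. Illustrative only.
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def leaks_card_number(log_line: str) -> bool:
    """Flag log lines that appear to contain a full card number."""
    return bool(PAN_PATTERN.search(log_line))
```

The point is not the regex. It is that "does the generated code log a full card number?" is a testable question, and one that unit tests on return values will never ask.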

Layer 2 – Retrieval & context failures

In Retrieval-Augmented Generation systems, reliability depends heavily on the retrieval pipeline.

Retrieval failures often present as:

  • Grounding gaps – relevant documents not retrieved
  • Rank inversion – incorrect documents ranked above correct ones
  • Context truncation – critical information cut off due to token limits
  • Stale retrieval – archived policies surfaced as current guidance

These failures are often influenced by engineering choices such as chunk size, embedding refresh cycles, and recall versus precision trade-offs.

A compliance assistant is asked: What’s our data retention policy for EU customers? It confidently cites the 2019 policy (180-day retention) that was superseded in 2022 under GDPR updates (90-day retention for certain categories). The retrieval system ranked the archived document higher because it had more keyword matches and was longer (signal of comprehensiveness).

The model didn’t hallucinate. The retrieval layer surfaced the wrong source.
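Grounding gaps and rank inversions become measurable once you maintain a small labelled set of query-to-relevant-document pairs. A minimal recall@k sketch — the document IDs echo the policy example above and are illustrative:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of known-relevant documents that appear in the top-k results.

    Persistently low recall@k on a query set is a grounding gap: the
    right document exists in the corpus but never reaches the model's
    context window."""
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    if not relevant:
        return 1.0
    return len(top_k & relevant) / len(relevant)
```

In the retention-policy example, the 2022 document would score zero at k=2 because the archived 2019 policy outranked it — a Layer 2 failure you can detect without ever calling the model.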

Layer 3 – Orchestration failures

Agentic systems introduce another dimension. Here, the model selects tools, invokes APIs and executes multi-step workflows.

Failure patterns include:

  • Tool misuse or incorrect parameter selection
  • Repeated or looping tool calls
  • Cascading errors across workflow steps
  • Partial completion without appropriate fallback

Imagine a system that retrieves customer data, drafts a response, then unnecessarily re-queries the same data due to state misinterpretation. No obvious error is thrown. Costs increase. Latency increases. Risk increases.

Each component works. But the interaction still fails.

This layer is where deterministic assumptions fail most visibly. Traditional output validation cannot reveal orchestration instability.
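Orchestration traces, however, can reveal it. A minimal sketch, assuming tool invocations are logged as (tool name, arguments) pairs — the threshold and data shape here are our assumptions, not a standard:

```python
from collections import Counter

def repeated_tool_calls(trace, threshold=2):
    """Return (tool, args) calls invoked more than `threshold` times.

    `trace` is a list of (tool_name, frozen_args) tuples from an
    orchestration log. Identical repeated calls with no intervening
    state change are a common looping signature: no error is thrown,
    but cost and latency quietly climb."""
    counts = Counter(trace)
    return {call for call, n in counts.items() if n > threshold}
```

A check like this belongs in evaluation runs and in production monitoring, because output-only validation will report the final answer as correct either way.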

Layer 4 – Human & trust failures

This is the most consequential layer and the least engineered.

Even when technical metrics are within thresholds, failure can occur in how humans interpret and act on outputs.

Consider a fraud detection system with 98% accuracy. After months of consistent performance, analysts begin approving flagged transactions with minimal review.

When a new fraud pattern emerges, one the system was never trained on, it passes unnoticed. The model performed within specification but human scrutiny degraded.

That is a Layer 4 failure emerging from Layer 1 consistency.
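Layer 4 signals can still be instrumented. One hedged sketch: compare the recent analyst override rate on flagged items against its historical baseline — the tolerance factor and data shape are illustrative assumptions, not an established control:

```python
def review_scrutiny_drift(baseline_rate, recent_decisions, tolerance=0.5):
    """Flag when analyst scrutiny of flagged items appears to collapse.

    `recent_decisions` is a list of booleans (True = the analyst
    overrode or escalated the model's flag). A recent override rate
    far below the historical baseline suggests rubber-stamping --
    the Layer 4 signal in the fraud example above."""
    if not recent_decisions:
        return False
    recent_rate = sum(recent_decisions) / len(recent_decisions)
    return recent_rate < baseline_rate * tolerance
```

Note what this measures: not the model, but the humans around it. That is exactly the layer most evaluation programmes leave unobserved.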

Mini case study

When AI-generated tests cleared the pipeline but missed the checkout bug

We saw this during a pilot where an LLM-based assistant was used to generate regression tests for an e-commerce checkout service.

The workflow looks like this:

  1. Developer submits PR with changes to discount calculation logic.
  2. CI triggers test generation using an LLM.
  3. Generated tests are added to a review queue.
  4. High-confidence tests are merged automatically.

At first, it worked well. Coverage improved. Edge cases increased. Regression escapes decreased.

Six weeks later, a production defect appeared: orders combining a gift card with a percentage-off coupon were calculating the wrong final charge. The defect should have been caught.

Using the layered taxonomy, the diagnosis becomes clearer.

Layer 1 – Model Behaviour: The prompt asked for “discount logic edge cases.” The model generated structurally valid tests but consistently used assertions that checked status codes rather than calculated values for combined discount scenarios. The gift card plus coupon path was tested. Whether the amount was correct was never asserted. Not incorrect. Just systematically shallow on what mattered.

Failure pattern: Output inconsistency + assertion gap on compound state logic.
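The assertion gap is easiest to see side by side. The snippet below is a hypothetical reconstruction — `apply_discounts` and its signature are ours for illustration, not the pilot's actual code:

```python
def apply_discounts(subtotal, coupon_pct, gift_card):
    """Hypothetical checkout logic: percentage coupon first, then gift card."""
    return round(subtotal * (1 - coupon_pct / 100) - gift_card, 2)

result = apply_discounts(100.00, 10, 20.00)

# Shallow assertion (the style the model generated): the path ran.
assert result is not None

# Value assertion: pins the compound-discount arithmetic itself.
# $100 order, 10% coupon, then $20 gift card: (100 * 0.90) - 20 = 70.00
assert result == 70.00
```

Both tests exercise the gift-card-plus-coupon path. Only the second would have caught a miscalculated final charge.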

Layer 2 – Retrieval and context: Historical bug reports were included as retrieval context. However, retrieval ranking prioritised recent high-frequency issues. A year-old bug about combined discount miscalculations was ranked low and truncated before reaching the context window. The model never saw the relevant historical signal.

Failure pattern: Recency bias + context truncation of low-frequency high-severity bugs.

Layer 3 – Orchestration: The CI workflow auto-approved tests when overall coverage crossed 80%. The new tests pushed coverage from 78% to 83%, but all the new coverage landed on already well-tested single-discount paths. The threshold was satisfied. Tests were merged. Nothing checked which paths the additional coverage actually represented.

Failure pattern: Threshold passed without coverage quality check.
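A coverage gate can be made path-aware with very little code. A sketch, assuming the CI system can report which logical paths the new tests exercise — the path labels and the 80% floor are illustrative:

```python
def gate_on_changed_paths(changed_paths, newly_covered_paths,
                          total_coverage, floor=0.80):
    """Approve a merge only if overall coverage clears the floor AND
    the new coverage actually touches the paths the PR changed.

    A raw threshold (e.g. 78% -> 83%) can be satisfied entirely by
    re-covering already well-tested paths, as in the case study."""
    touches_change = bool(set(changed_paths) & set(newly_covered_paths))
    return total_coverage >= floor and touches_change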

Layer 4 – Human and trust: Over time, the QA engineer stopped reading individual test cases and checked only the coverage percentage on the dashboard. “Numbers have looked fine for weeks.” Manual validation depth declined.

Failure pattern: Metric substitution + confidence drift at the human layer.

What this shows

The escaped production defect wasn’t just a hallucination or a retrieval issue. And it wasn’t pipeline misconfiguration or human oversight either.

It was a multi-layer propagation failure. 

Layer 2 hid the historical signal from Layer 1. Layer 1 produced shallow assertions without that context. Layer 3 accepted the tests because a number looked right. Layer 4 never looked closely enough to catch it manually.

From the outside, it looked like: “The AI missed a discount edge case.” But structurally, it was a layered evaluation gap.

Failure propagation across layers

Failures rarely remain confined to a single layer.

A retrieval precision issue (Layer 2) can produce a grounded but misleading response. That response passes through orchestration (Layer 3). A human accepts it without challenge (Layer 4).

What appears to be “a wrong answer” is often a multi-layer interaction.

Without structure, incidents appear isolated and unpredictable. With a layered taxonomy, recurring patterns become diagnosable.

We can also view this structurally through a different lens, not just propagation, but risk concentration.

As failures move upward, observability decreases and business impact increases, a dynamic that can be visualised as a failure landscape.

The positioning is illustrative. Actual visibility and impact will vary by architecture and organisational maturity. The principle, however, holds: The higher the layer, the harder the failure is to detect and the more consequential it tends to become.

In practice, disciplined diagnosis follows the architecture.

When an output is wrong, the first instinct is to blame the model. But a disciplined approach asks: what was retrieved? How was it ranked? How did the workflow execute? And how did the user interpret the result?

Taxonomy becomes a thinking tool, not just a diagram.

Why this taxonomy actually matters

This isn’t about having a neat diagram. It’s about changing how we respond when something goes wrong.

Without structure, teams react to symptoms:

  • “The model hallucinated.”
  • “Let’s tweak the prompt.”
  • “Let’s add a guardrail.”

But if the failure originated in retrieval ranking or orchestration logic, we’re solving the wrong problem.

The taxonomy forces a different question:

Where did this behaviour originate?

Once you answer that, your evaluation strategy changes. You don’t just measure outputs. You instrument retrieval. You trace orchestration. You pay attention to human escalation behaviour.

Quality in AI systems is systemic. If we only test model responses while ignoring retrieval quality, workflow logic, or trust behaviour, we are testing fragments, not the system.

Layered systems require layered evaluation. Without that shift, AI quality stays reactive.

A candid observation

In our experience working with enterprise AI deployments, most initiatives concentrate heavily on Layer 1.

Some are beginning to instrument Layer 2. Very few systematically test Layer 3. Almost none design controls explicitly around Layer 4.

That’s not criticism. It reflects maturity progression. But if we intend to move AI from experimentation to enterprise-grade capability, structural clarity becomes essential.

What comes next

Understanding how AI systems fail is the first step.

The next question is unavoidable: If failures originate at different architectural layers, how do we evaluate each layer rigorously?

What should we measure? How do we monitor drift? When do we rely on model-graded evaluation? And how do we detect orchestration instability?

In the next article, we move from classification to measurement.

Let’s design evaluation layer by layer. Because once failure is structured, evaluation can become systematic rather than reactive. And that is where AI quality engineering shifts from patching symptoms to engineering reliability.

AUTHOR:

Manoj Kumar Kumar

Director - NextGen Solutions

Reach New Heights

AI is transformational, yet only 33% of leaders are confident their enterprise will mitigate the risks of AI. How ready are you?

No matter where you are on your AI journey Planit has the expertise and solutions to accelerate you towards Quality AI.

Get Updates

Get the latest articles, reports, and job alerts.
Beyond deterministic testing: Why testing AI systems is fundamentally different
https://www.planit.com/beyond-deterministic-testing-why-testing-ai-systems-is-fundamentally-different/
Wed, 25 Feb 2026 04:35:46 +0000
Articles

AI, Artificial Intelligence, Quality Engineering, Testing

Beyond deterministic testing: Why testing AI systems is fundamentally different

Engineering Quality for AI systems series — Part 1

Artificial intelligence is moving quickly into real-world systems, but the way we test and assure it hasn’t caught up. Traditional testing assumes predictable behaviour and predefined expected results, but these are assumptions that don’t hold for probabilistic, non-deterministic AI and LLM-driven solutions.

Below is the first article of a series that explores how quality engineering must evolve, shifting from verifying functionality to evaluating behaviour, risk and trust. Across the articles, we outline practical approaches to defining quality, evaluating models, embedding responsible AI controls, and establishing an operating model organisations can use to deliver AI with confidence.

The Structural Assurance Gap

AI is moving into production faster than most organisations have figured out how to test it properly.

We’re deploying customer support assistants, engineering copilots, compliance advisors, and internal productivity tools. The capabilities are impressive. The assurance frameworks behind them are still catching up.

We’re used to systems behaving the same way every time. If the code hasn’t changed, the output shouldn’t change either.

AI doesn’t work like that.

Yet most enterprise AI systems are still being tested as if they do.

We’ve changed the architecture. We haven’t fully changed the assurance model.

Large Language Models generate outputs from probability distributions rather than fixed logic. The same prompt can produce different responses. Behaviour depends on context windows, configuration choices, retrieval results, and model versions. This variability manifests in measurable ways: inconsistent outputs across identical prompts, hallucination rates that vary by domain, embedding drift affecting retrieval precision, and orchestration failures that cascade through multi-step workflows.

When we test these systems using frameworks designed for deterministic software, blind spots appear. Not because the systems are inherently flawed, but because our testing assumptions no longer match how they behave.

The industry has responded with specialised evaluation approaches, model-graded assessments, retrieval quality metrics, and production observability tooling. But tools alone do not solve a structural assurance gap. Without clarity on what we are evaluating and why, measurement becomes reactive rather than systematic.

That’s the structural gap we need to address.

The Testing Model We Know and its Limits

Traditional testing works when behaviour is controlled and repeatable.

Define the requirements.

Define the expected output.

Run the test.

Compare actual versus expected.

Pass or fail.

That model has served us well for decades.

It still applies.

Let’s be clear: traditional quality engineering isn’t going anywhere.

APIs still need validation.

Business rules still require verification.

Integration flows must be tested.

Performance and security remain critical.

AI-enabled systems still rely heavily on deterministic scaffolding: policy enforcement layers, orchestration logic, and fallback mechanisms. These components behave predictably and should be tested as such.

But language-model-driven behaviour introduces something different. Quality engineering now needs to evaluate behaviour under uncertainty, not just verify logic under control.

Where “Correct” Is No Longer Binary

CUSTOMER SUPPORT ASSISTANT

Let’s make this concrete. Two scenarios many of us are already dealing with.

A customer submits a query:

“Why was my refund rejected?”

In a deterministic system, response logic maps directly to defined rules. The same input returns the same explanation.

In an AI-enabled system, the response is generated probabilistically. Across multiple executions, the assistant may:

  • Provide an accurate explanation grounded in policy.
  • Offer a partially correct but incomplete response.
  • Reference a policy clause that does not exist.
  • Respond fluently while misinterpreting the case context.

Each response may be well-formed and coherent. The issue isn’t grammar. It’s behavioural reliability.

A regression test expecting a single canonical answer doesn’t meaningfully evaluate this system.

The real question becomes:

Is the behaviour consistently within acceptable risk boundaries?
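One way to put a number on that question is to re-run the same query many times and measure how often the system lands on its modal behaviour. A minimal sketch, assuming each run's response has already been labelled (the category names are illustrative):

```python
from collections import Counter

def behavioural_consistency(responses):
    """Share of runs that agree with the modal response category.

    `responses` are labelled outcomes for the SAME input across runs,
    e.g. 'grounded', 'incomplete', 'fabricated_clause'. Deterministic
    systems score 1.0 by construction; AI systems must be measured."""
    if not responses:
        return 0.0
    _, modal_count = Counter(responses).most_common(1)[0]
    return modal_count / len(responses)
```

A refund-explanation assistant that answers correctly in 8 of 10 runs scores 0.8. Whether 0.8 is acceptable is a risk decision, but it is now a measured one rather than an assumed one.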

The Accuracy Illusion

Accuracy is often used as the comfort metric, but it is rarely sufficient on its own.

Depending on the use case, teams may track precision and recall, groundedness or faithfulness in RAG systems, toxicity scores for safety, or consistency measures across repeated prompts.

But a high accuracy score does not guarantee behavioural reliability. AI systems can be confidently wrong, variably correct, and operationally unstable, all while meeting benchmark targets. Accuracy is useful. It is not assurance.

A single accuracy score does not tell you:

  • How behaviour varies across runs
  • How confidently wrong the system can be
  • How failures propagate across components
  • How users interpret and act on outputs

High accuracy can coexist with low reliability.

If we reduce AI quality to a performance metric, we risk confusing measurements with assurance.

AI-Assisted Test Case Generation

Now consider a QE use case.

A quality engineering team uses an AI assistant:

“Generate boundary test cases for an e-commerce checkout API.”

Across multiple runs, the model may produce:

  • Strong coverage of payment failure scenarios
  • Emphasis on happy-path flows, light on error handling
  • Comprehensive cart validation, minimal payment edge cases
  • Focus on single-item purchases, missing bulk order scenarios

None of these outputs are clearly “incorrect.” Yet the reliability of the generated artefacts varies.

Regression testing assumes behaviour stays stable unless the code changes. With AI-assisted generation, that assumption simply doesn’t hold.

This introduces a new form of regression instability: coverage drift without code changes, variation in scenario emphasis across runs, and behavioural shifts following model version upgrades. Traditional regression testing assumes behavioural stability unless logic changes; AI-assisted generation violates that assumption.
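Coverage drift between two generation runs of the same prompt can be quantified directly, for example as a Jaccard distance over the scenario categories each run produced. The category labels below are illustrative assumptions:

```python
def coverage_drift(run_a, run_b):
    """Jaccard distance between the scenario categories covered by two
    generation runs of the SAME prompt.

    0.0 means identical emphasis; values near 1.0 mean the two runs
    effectively tested different things -- coverage drift with no
    code change at all."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)
```

Tracked over time, this turns "the AI seems to generate different tests each time" into a trend line you can put a threshold on.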

In both scenarios, the system does not fail deterministically. It fails probabilistically.

That distinction changes how we evaluate quality.

When Architecture Compounds Complexity

The variability above focuses on single-model behaviour. Modern AI architecture extends this complexity further.

Retrieval-Augmented Generation (RAG) systems combine probabilistic model outputs with deterministic retrieval logic. In practice, these outcomes are influenced by very ordinary engineering decisions: chunk size selection, retrieval recall versus precision trade-offs, embedding consistency challenges as the corpus evolves, and hybrid search strategies that rebalance semantic and keyword ranking.

Output quality now depends on:

  • Model interpretation of the query
  • Retrieval ranking and document selection
  • Data freshness and indexing
  • Context window constraints
  • Response synthesis

Each layer introduces its own failure modes.

A model may generate a coherent response based on incomplete retrieval results. Retrieval quality can degrade quietly as documents are added, re-indexed, or re-embedded. Outdated documents may remain indexed. Every component can behave “correctly” in isolation while the system-level outcome is unacceptable.

Failures now emerge from orchestration, not just individual defects.

As AI systems evolve toward agents that orchestrate tools and multi-step workflows, the interaction surface expands further.

This is where deterministic testing assumptions fail structurally. Verification-only approaches cannot detect failures that emerge from component interaction rather than component defects.

If we continue to apply deterministic testing models to probabilistic systems, we are systematically under-testing AI in production.

From Verification to Evaluation

Traditional testing asks:

“Is this output correct?”

AI systems force us to ask different questions:

  • Is this behaviour staying within defined risk boundaries?
  • Is variability bounded and understood?
  • Can we observe and classify failure modes?
  • Do we have measurable signals that give us confidence over time?

These aren’t variations of the same question. They require different metrics, different evaluation strategies, and often different tooling.

Verification frameworks, built to confirm that logic matches specification, are insufficient when behaviour emerges from probability and interaction rather than fixed rules.

Traditional testing verifies correctness under control.

AI quality engineering evaluates behaviour under uncertainty.

That’s not a dramatic statement. It’s a practical one.

A Different Way to Think About Quality

Here’s the structural difference:

Deterministic systems are verified. Probabilistic systems are evaluated.

We don’t eliminate uncertainty. We manage it.

AI quality is not about verifying correctness. It is about engineering confidence under uncertainty.

Engineering Confidence means:

  • Defining behavioural boundaries.
  • Measuring reliability through defensible proxy metrics.
  • Systematically classifying failure modes.
  • Monitoring drift across model and context changes.
  • Aligning quality thresholds with business risk tolerance.

In deterministic systems, “correct” is binary. In AI systems, “acceptable” is contextual.

An internal productivity assistant may tolerate moderate variability. A compliance advisory agent may not.
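That contextual tolerance can be made explicit in configuration rather than left implicit in reviewers’ heads. A sketch with entirely illustrative profile names and numbers — real thresholds must come from business risk analysis:

```python
# Illustrative, not prescriptive: "acceptable" is contextual.
RISK_PROFILES = {
    "internal_productivity": {"min_consistency": 0.70, "max_hallucination_rate": 0.05},
    "compliance_advisory":   {"min_consistency": 0.95, "max_hallucination_rate": 0.00},
}

def within_risk_boundary(profile, consistency, hallucination_rate):
    """Evaluate measured behaviour against the use case's tolerance."""
    p = RISK_PROFILES[profile]
    return (consistency >= p["min_consistency"]
            and hallucination_rate <= p["max_hallucination_rate"])
```

The same measured behaviour can pass one profile and fail another, which is exactly the point: the boundary belongs to the use case, not the model.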

We don’t abandon traditional testing. We extend it.

Deterministic components still require verification. Model-driven behaviour requires structured evaluation.

Confidence cannot be assumed, it must be engineered.

And that work begins with understanding how AI systems actually fail: not as rare defects, but as predictable behavioural patterns emerging from probabilistic and multi-component architectures.

If we don’t understand those patterns, we’re not really testing the system; we’re relying on it to behave.

In the next article, we introduce a structured failure taxonomy: a clear classification of AI system failures across probabilistic reasoning, retrieval behaviour, and agentic orchestration.

AUTHOR:

Manoj Kumar Kumar

Director - NextGen Solutions

Watch On-Demand | Surviving the Sprint: Mitigating Risk in Oracle Cloud Updates
https://www.planit.com/beat-system-creep-in-oracle-cloud-updates/
Tue, 10 Feb 2026 02:06:38 +0000
Events & Webinars

Quality Assurance, Quality Engineering, Security

Watch On-Demand | Surviving the Sprint: Mitigating Risk in Oracle Cloud Updates

Oracle's quarterly updates don't wait for manual testing. You face a strict 14-day countdown before changes are pushed to production. In this joint webinar, Planit and Tricentis demonstrate how to master this sprint and turn a risky race into a predictable process.

Stop "system creep" from turning Oracle updates into crisis events.

Watch the full webinar to see how leading organisations stabilise Oracle releases, cut testing effort and use intelligent automation to keep compliance, payroll and critical processes safe every quarter.

Oracle’s quarterly updates shouldn’t feel like a fire drill. But in heavily customised environments, with so many integrations to manage, they often do.

The culprit is “system creep”. Complexity builds up and reaches a level where manual testing simply can’t uncover everything in a 14-day window. There’s less visibility, more risk and greater chance of something going wrong.

In this joint session, we’ll share a framework for reliable, repeatable Oracle releases. Instead of a product demo, you’ll get insights from two expert partners:

  • Tricentis, who power AI-driven test automation across Oracle, integrations, UIs, APIs and data.
  • Planit, who deliver the strategic, managed service layer to make it stick.

The result is faster delivery, reduced risk and less strain on overstretched SMEs.

In this session, we'll explore:

  • Why Oracle testing demands a different approach in complex, integrated environments

  • How “system creep” quietly increases risk across compliance, payroll and data integrity

  • Where AI-driven, risk-based automation delivers faster releases with lower risk

  • Why payroll, data integrity and HMRC obligations need to be validated as end-to-end business processes instead of isolated checkboxes

Whether you’re accountable for compliance, ERP stability or digital transformation timelines, this session offers a practical path to more predictable Oracle releases.

Key Takeaways:

  • Understand why manual validation can’t scale in a 14-day update window

  • See how automation can achieve 70%+ faster testing cycles with 85%+ coverage

  • Learn how to build release confidence that doesn’t reset to zero every quarter

  • Understand what it actually takes to shift from reactive, quarter-by-quarter firefighting to a proactive assurance model that keeps you ahead of the cycle

  • By examining real-world enterprise case studies, discover what actually works in large-scale Oracle environments.

Your hosts:

Access the On-Demand Video

Digital Risk in Modern Energy
https://www.planit.com/digital-risk-in-modern-energy/
Fri, 06 Feb 2026 02:10:05 +0000
Whitepapers

QA, Quality Assurance, Quality Engineering, Testing

Digital Risk in Modern Energy

Digital change is accelerating across the energy and utilities sector. But speed creates blind spots. This report examines where assurance maturity is falling behind and how these gaps are creating real risk.

When Assurance Falls Behind

Good assurance is all about catching problems before they reach the grid.

Unfortunately, in a sector that moves as fast as energy and utilities, it's often an afterthought.

Quality gets bolted on late. Performance testing is rare. Cyber assurance covers IT but misses OT. And when something breaks, the consequences land hard.

That’s why we’re taking a hard look at these issues in our new whitepaper, “Digital Risk in Modern Energy: The role of assurance in an increasingly digital sector”.

Drawing on insights from our Global Quality Index 2025/26, alongside research from KPMG, Deloitte, EY and the Australian Energy Regulator, we look at five areas where delivery and operational risk tend to build up, and where the sector’s readiness gaps are most acute.

Download the full whitepaper to see where those gaps are emerging.

What you'll get from the whitepaper

Gap clarity

A clearer sense of where maturity gaps tend to show up.

Industry insights

An overview of how leading providers are thinking about assurance differently.

Transformation readiness

The foundations that help support smooth digital change.

Discover where risk builds up

In this whitepaper, you’ll also get a clear view of where maturity gaps appear and how they translate into risk across five interconnected areas:

  • Transformation governance, and where integration tends to break down.

  • Reliability, and why digital failure now hits as hard as physical outages.

  • Cyber readiness, across IT, OT and customer-facing systems.

  • Performance engineering, and what late-stage testing really costs.

  • Automation and AI, and the growing gap between adoption and assurance.

Download Whitepaper

Engineering the Future: NZ Roadshow
https://www.planit.com/engineering-the-future-nz/
Fri, 06 Feb 2026 00:59:33 +0000
Events & Webinars

Engineering the Future: NZ Roadshow

Join Planit in cities across New Zealand for an in-person, breakfast event as part of our Engineering the Future Roadshow. We'll be examining how AI and automation are shaping the next era of quality — and what it takes to turn AI ambition into real, operational advantage.

Engineering the Future Roadshow: AI, Automation and the Next Era of Quality

EVENT LOCATIONS:

  • Auckland, Tuesday 24 March
  • Christchurch, Wednesday 25 March 
  • Wellington, Thursday 26 March

EVENT AGENDA:

  • 9:00-9:30am – Breakfast & Networking​
  • 9:30-11am – Presentation​

The recent release of Planit’s Global Quality Index 2025/26, with support from UiPath, has made one thing clear: AI and automation are no longer emerging technologies — they are now central to digital and delivery strategies, and key determinants of delivery performance, operational efficiency, and competitive advantage.

Yet the Index also reveals a growing gap between intent and impact. Many organisations are investing heavily in AI, but far fewer are successfully embedding it into their operating models in a way that is scalable, trusted, and measurable. The real differentiator is no longer who is experimenting — but who is delivering measurable impact at scale.

This in-person event series brings together industry leaders, practitioners, and innovators to explore how organisations are operationalising AI and automation to close that gap by improving efficiency, accelerating delivery, and elevating quality — without sacrificing trust, governance, or customer experience.

Hosted by Planit, sponsored by UiPath, and led by senior industry and technical experts, the series moves beyond hype and theory to focus on real-world implementation, grounded in data, experience, and practical insight. Through Global Quality Index findings, expert perspectives, and customer stories, we will examine what it takes to move from AI ambition to AI advantage.

Guest Roadshow Speakers:

Your Local Event Details

What we'll cover

Attendees in each location can expect:

  • Data-led insights from Planit’s Global Quality Index 2025/26, with clear implications for delivery performance, quality, and efficiency
  • Real-world AI and automation use cases showing how organisations are moving beyond pilots to scalable, trusted outcomes
  • Expert perspectives from senior industry and technical leaders on what works, what doesn’t, and how to operationalise AI at scale
  • Customer stories and lessons learned from organisations closing the gap between AI ambition and measurable impact
  • Actionable takeaways to help accelerate delivery, improve efficiency, and elevate quality without compromising trust or customer experience

Auckland Specific Details

DATE: Tuesday 24 March
TIME: 9am-11am
LOCATION: Toroa Room, Commercial Bay Meeting & Events Suites, Level 2, PwC Tower, 15 Customs Street West, Auckland Central

GUEST SPEAKERS:

  • Brett Hartman – Head of Digital Intelligence and Automation, Zespri

Brett leads initiatives at Zespri that leverage data, AI, and automation to drive business performance and operational efficiency. He has extensive experience in digital transformation, analytics, and technology-enabled process improvement, with a strong focus on delivering measurable value across complex stakeholder environments. Based in Tauranga, Brett collaborates across business and technology teams to shape strategy, modernise ways of working, and embed innovative, scalable solutions that support sustainable growth.

  • Berny Roux – Digital Quality Assurance Manager, Zespri

Berny leads the QA function and oversees quality across projects and business-as-usual delivery. With over 20 years’ experience in software testing, he is passionate about software quality, continuous improvement, and empowering teams through strong test leadership and mentoring. His expertise spans supply chain, test automation, and test process improvement, where he has driven maturity uplift, reduced risk and cost, and implemented modern tooling and practices to support large-scale transformation initiatives.

DATE: Wednesday 25 March
TIME: 9am-11am
LOCATION: Peppers Clearwater Resort, Clearwater Avenue, Northwood, Christchurch

GUEST SPEAKER: Matt Tulloch, QE Practice Manager, Silver Fern Farms

At Silver Fern Farms, Matt is driving a bold, multi-year transformation to modernise quality at scale.

By standardising QE practices and accelerating automation and the use of AI in their testing processes, he’s helping teams shift left, lift standards, and cut the true cost of quality.

In this session, Matt shares the real story behind the journey. Expect honest insights into what’s working, what’s hard, and the practical lessons that make large-scale quality transformation actually stick.

DATE: Thursday 26 March
TIME: 9am-11am
LOCATION: InterContinental, Level 1, Lambton Rooms, 2 Grey Street, Wellington Central

GUEST SPEAKERS:

  • Dolly Kaur, NZ Police
  • Fareeha Syed, ACC

Fareeha is an AI Quality Lead at ACC, helping embed modern, scalable quality practices within complex enterprise systems.

With ACC at the forefront of AI adoption in the public sector, Fareeha brings real-world insights into the challenges teams face — and the pragmatic approaches that help overcome them.

Your hosts:

This event is not to be missed.
Register for the event in your location, today!

]]>
Watch On-Demand | Security Breaches: How They Happen and How to Stop Them https://www.planit.com/real-world-security-breaches-webinar/ Wed, 28 Jan 2026 04:01:49 +0000 https://www.planit.com/?p=9434
Events & Webinars

QualityAssuranceQualityEngineeringSecurity

Watch On-Demand | Security Breaches: How They Happen and How to Stop Them

Behind every security breach is a chain of failures that could have been prevented. In this joint Planit and NRI on-demand webinar, our security leaders examine real-world incidents from two perspectives—revealing exactly how vulnerabilities were exploited and the practical controls that could have stopped them in their tracks.

Gain practical security insights from both sides of the breach.

Want a closer look at what we covered? Access the full presentation deck to revisit the key insights, data points, and recommendations shared by our experts.

From healthcare and aviation to higher education and infrastructure, recent breaches across Australia and New Zealand have shown just how quickly quality gaps and security blind spots can escalate into serious, high-profile incidents.

Drawing on real-world cases from the last 18 months, Planit and NRI provide two essential perspectives. From the “red team” side, we explore how vulnerabilities are discovered and exploited; from the “blue team” side, we examine the governance, architecture and assurance practices required to reduce risk.

This is a valuable session for security, risk and technology leaders who want concrete lessons they can apply immediately, before the next incident hits.

What You'll Learn:

  • Details about recent high-impact security breaches across Australia and New Zealand, spanning healthcare, aviation and education

  • How vulnerabilities are identified and exploited in real environments (red team perspective)

  • Where security controls commonly break down across modern technology stacks

  • How organisations can design, implement and validate effective security controls earlier in the lifecycle (blue team perspective)

  • Practical lessons that connect offensive security insight with defensive design and assurance

Across four major incidents, we examine the differing root causes, attack paths and downstream effects — from reputational damage to regulatory consequences and community impact.

Key Takeaways:

  • A clear understanding of how real-world breaches occur, beyond surface-level reporting

  • Insight into the most common security gaps seen across modern platforms and architectures

  • Practical guidance on strengthening security design, testing and assurance practices

  • An expert look at how to catch vulnerabilities before they escalate into incidents

  • Actionable lessons you can apply immediately to reduce risk in your own organisation

Your hosts:

Access the On-Demand Video

]]>
12 Quality Controls for Core Banking Transformations https://www.planit.com/12-quality-controls-for-core-banking-transformations/ Mon, 26 Jan 2026 23:44:32 +0000 https://www.planit.com/?p=9463
Whitepapers

QualityAssuranceQualityEngineering

12 Quality Controls for Core Banking Transformations

A guide to turning high-risk core replacement into governed, evidence-based delivery.

Replacing a core banking platform is one of the most challenging and high-risk initiatives a bank can undertake. Without careful governance, these programs frequently suffer from cost blowouts, regulatory failures and reputational damage. 

However, these risks can also be mitigated with the proper controls.

In this whitepaper, our experts set out 12 critical risk controls to turn this high-stakes transformation from a “crossed-fingers aspiration” into a secure, evidence-based delivery.

Drawing on Planit’s experience across major banking transformations and insights from the Global Quality Index 2025/26, this paper identifies disciplined quality controls that can reduce uncertainty, protect financial integrity and preserve your institution’s record of truth.

Download the whitepaper to learn:

  • The recurring design pitfalls that can increase program costs by up to 40%.
  • How to leverage AI to accelerate testing efficiency and reduce delivery overheads.
  • Strategies for ensuring regulatory compliance, data migration integrity and operational continuity.


AUTHORS:
Sarah Thomas
Principal Consultant – Quality Engineering

Jeff Gonzales
Principal Consultant – Quality Engineering

Download Whitepaper

]]>
5 Risks You Can’t Ignore in D.I.Y. Quality https://www.planit.com/5-risks-you-cant-ignore-in-d-i-y-quality/ Wed, 17 Dec 2025 21:55:13 +0000 https://www.planit.com/?p=9359
Articles

QAQualityAssuranceQualityEngineeringTesting

5 Risks You Can’t Ignore in D.I.Y. Quality

In-house quality engineering can drain budgets, slow down delivery and leave you open to risk. Know the true costs before you start — and why leading businesses are choosing partnerships instead.

Taking your quality engineering in-house can be a risky gamble.

It might start out as a well-intentioned effort to save money, but it often spirals into operational bottlenecks, delayed releases and skill gaps your team can’t fill. 

Worse, there’s a range of hidden pitfalls that can sabotage your efforts before you even begin. 

Fortunately, we’ve put together this handy infographic to help you steer clear of the 5 biggest risks of relying on D.I.Y. Quality.

Download it now to discover:

  • The hidden costs that make in-house quality engineering so expensive.

  • How internal teams fall behind agile releases and DevOps cycles.

  • The difference between “maintenance mode” and true innovation.

  • The reason so many businesses struggle to find skilled testers.

  • How the right partnership can deliver measurable savings.

Download Infographic

]]>
PEAK Matrix® Assessment 2025 – Planit recognised as a Leader and Star Performer in Quality Engineering Specialist Services https://www.planit.com/peak-matrix-assessment-2025/ Mon, 15 Dec 2025 04:15:00 +0000 https://www.planit.com/?p=9389
Articles

AnalystReviewQualityEngineeringSoftwareTesting

PEAK Matrix® Assessment 2025 – Planit recognised as a Leader and Star Performer in Quality Engineering Specialist Services

Everest Group has named Planit a Leader and Star Performer in their Quality Engineering (QE) Specialist Services PEAK Matrix® Assessment 2025.

Everest Group has recognised Planit as both a Leader and Star Performer in their Quality Engineering (QE) Specialist Services PEAK Matrix® Assessment 2025.

The PEAK Matrix® provides an objective, data-driven assessment of service and technology providers based on their overall capability and market impact across different global services markets, classifying them into three categories: Leaders, Major Contenders and Aspirants.

For this year’s QE Specialist Services report, Everest Group benchmarked 21 providers with a dedicated strategic focus on end-to-end quality engineering services.

As part of this assessment, Planit was named a Star Performer, a designation given to providers that have shown the most year-over-year improvement in their position on the PEAK Matrix®.

Such strong positioning reaffirms our ability to deliver high-quality engineering solutions, enabling businesses to enhance software quality, achieve faster time-to-market and meet evolving customer demands.

You can access the full report here.

Planit was also named a Major Contender in Quality Engineering (QE) for mid-market enterprises in the PEAK Matrix® Assessment 2024, where we were recognised for the value we delivered, our vision and strategy, scope of services offered, innovation and investments, and delivery footprint.

About Everest Group:

Everest Group is a leading research firm helping business leaders make confident decisions. They guide clients through today’s market challenges and strengthen their strategies by applying contextualised problem-solving to their unique situations. Their deep expertise and tenacious research focus on technology, business processes, and engineering. Visit www.everestgrp.com to learn more about the company and its offerings.

About Planit:

Planit is a global leader in Quality Engineering, providing innovative solutions to drive digital transformation and ensure the delivery of high-quality software. With a team of experienced professionals and a strong focus on customer satisfaction, Planit provides businesses with the vision, precision, and independence they need to improve the quality of their software and the way they deliver it.

Deliver Quality Quicker

In today’s competitive landscape, organisations expect to deliver more ambitious technical outcomes at improved efficiency. We can help you achieve these goals by embedding quality throughout the lifecycle, optimising your delivery to improve outcomes, accelerate speed, and decrease cost.

Find out how we can help you mature your quality engineering practices to consistently achieve better results with greater efficiency.

Get Updates

Get the latest articles, reports, and job alerts.
]]>
Planit & Specsavers Win Best Test Project in Retail at the European Software Testing Awards 2025 https://www.planit.com/planit-specsavers-win-best-test-project-in-retail-at-the-european-software-testing-awards-2025/ Tue, 02 Dec 2025 06:21:59 +0000 https://www.planit.com/?p=9353
Articles

DigitalQualityQualityEngineeringSoftwareTesting

Planit & Specsavers Win Best Test Project in Retail at the European Software Testing Awards 2025

Planit and Specsavers prove quality engineering can deliver meaningful impact for real customers, with award judges praising the entry's "authenticity, empathy and social value."

Planit UK, in partnership with Specsavers, has been recognised at the European Software Testing Awards 2025, taking home the prestigious Best Test Project in Retail.

The awards celebrate the very best in software quality across the UK and Europe, with judges assessing entries for innovation, value, and real-world impact.

In addition to winning the award, our partnership also received a Special Commendation, awarded only to submissions demonstrating exceptional authenticity and social value. As the judges noted:

“This entry demonstrated how quality engineering can improve real-world outcomes. The team prioritised user experience, accessibility, and reliability over hype. The results felt real, with measurable outcomes. This one stands out for authenticity, empathy, and social value. Innovation doesn’t always require bleeding-edge technology, just intelligent design and genuine care.”

This honour reflects not just the impact of the solution, but the strength of a partnership built on trust, collaboration, and a shared focus on improving real experiences for customers and teams. It highlights the difference that two organisations can make when aligned in purpose and delivery.

Why our project stood out

The Best Test Project in Retail category is one of the most competitive in these awards, recognising work that elevates customer experience, strengthens operational resilience, and delivers tangible business benefits.

The winning project recognised by ESTA centred on Specsavers’ Self Refraction programme, where Planit helped deliver a more intuitive, accessible, and reliable eye-testing experience for customers.

From day one, the partnership between Planit and Specsavers was anchored in a simple belief: clinical innovation must never lose sight of the human element. Every decision (every test, every scenario, every refinement) was shaped around real people and real outcomes. By simulating real store conditions, capturing authentic user interactions, and using immersive customer and optometrist personas, the team was able to see the experience exactly as people would live it.

The result was a smoother, faster eye-test journey; increased appointment capacity; stronger clinical trust; and a noticeable uplift in customer satisfaction.

Why this win matters

This award reflects both organisations’ shared belief that meaningful innovation begins with understanding real people and their needs.

For Planit, it reinforces our ability to design assurance strategies rooted in real behaviour and deliver solutions that are safe, accessible, resilient, and genuinely impactful.

It also demonstrates our capability to partner deeply with clients, shaping outcomes that enhance customer trust, strengthen operational confidence, and create long-term value.

Looking ahead, Planit remains focused on helping UK organisations strengthen resilience, improve customer experience, and deliver services that work reliably in demanding environments. The success of this award-winning project with Specsavers underlines the value of getting quality right from the outset.

If you’re seeking a partner with a proven track record across complex retail and healthcare settings, we’d welcome a conversation.

Deliver Quality Quicker

In today’s competitive landscape, organisations expect to deliver more ambitious technical outcomes at improved efficiency. We can help you achieve these goals by embedding quality throughout the lifecycle, optimising your delivery to improve outcomes, accelerate speed, and decrease cost.

Find out how we can help you mature your quality engineering practices to consistently achieve better results with greater efficiency.

Get Updates

Get the latest articles, reports, and job alerts.

]]>