Building the Diskover MCP Connector With AI: From Idea to Shipping Product

AI can write code – but can it build a real product? See how Diskover’s VP of Product used AI agents to define requirements, generate Jira stories, implement features, and ship an MCP connector that’s headed to customers.

Not a demo.
Not a proof of concept.
A real product that will ship to customers.

This summary highlights the key ideas and why they matter for how modern products get built.

Olivier Rivard, Diskover’s VP of Product, wasn’t simply out to “use AI for coding.” His goal was to see whether AI could operate as a cross-functional product team spanning product management, engineering execution, and quality review.

The result: a fully AI-assisted workflow that went from product requirements to deployed code.

Instead of starting with a blank document, Olivier defined how the AI should behave – effectively turning it into a Senior Product Management Assistant.

With access to Jira via MCP, the AI could:

  • Understand Diskover’s existing templates and standards
  • Create properly scoped Epics and stories
  • Generate acceptance criteria
  • Populate the development board automatically

This wasn’t documentation for documentation’s sake. It produced a clean, executable roadmap that engineering could immediately act on.
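
As a rough illustration of the pattern at work here, the sketch below shows what a single MCP tool for creating Jira stories could look like. It is hypothetical, not Diskover’s actual connector: the server name, environment variables, and issue fields are assumptions, and it targets Jira’s standard REST endpoint for issue creation.

```python
# Minimal sketch of an MCP tool for Jira story creation, in the spirit of the
# workflow above. Hypothetical: server name, env vars, and fields are
# assumptions, not Diskover's actual connector.
import os

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("jira-assistant")

JIRA_URL = os.environ["JIRA_URL"]  # e.g. https://yourcompany.atlassian.net
AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"])


@mcp.tool()
def create_story(project_key: str, epic_key: str, summary: str,
                 acceptance_criteria: str) -> str:
    """Create a Jira story under an Epic, with acceptance criteria in the description."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Story"},
            "summary": summary,
            "description": f"h3. Acceptance Criteria\n{acceptance_criteria}",
            "parent": {"key": epic_key},  # links the story to its Epic
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["key"]  # e.g. "PROJ-123"


if __name__ == "__main__":
    mcp.run()  # stdio transport; the AI client discovers and calls create_story()
```

With a tool like this registered, the assistant drafts the summary and acceptance criteria itself and calls the function when the story is ready to file.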

Rather than trying to do everything, the first version of the MCP connector focused on a small set of high-value Diskover capabilities.

For each feature, the AI:

  1. Asked clarifying questions
  2. Drafted a complete Jira story with acceptance criteria
  3. Created the issue under the correct Epic

By the end of this step, the early roadmap for the MCP connector lived entirely inside Jira – generated and structured by AI, but aligned with Diskover’s real development process.

AI didn’t stop at planning. It implemented the features too.

Claude Code handled development tasks by pulling Jira issues directly and writing code. ChatGPT Codex was used as a reviewer, flagging bugs, edge cases, and optimization opportunities. Olivier then validated, tested, and guided fixes – acting as the final quality gate.

The human role shifted from “doing the work” to directing and validating the work.

One of the clearest takeaways from this experiment is that AI-driven development doesn’t eliminate the need for product or technical leadership – it amplifies it.

Success depended on:

  • Clear, unambiguous product requirements
  • Strong intuition about user workflows
  • The ability to review and validate code written by others (human or AI)

In other words, experience still matters. The tools changed, the responsibility didn’t.

Read the full story

This summary only scratches the surface. Olivier’s full Medium post goes deeper into:

  • The exact AI workflows used
  • Lessons learned along the way
  • Why this approach is reshaping how Diskover thinks about product development

If you’re curious about what product development looks like in an AI-first world, it’s worth your time.

AI Is Driving a Memory and Storage Crunch and Efficiency Will Decide Who Keeps Moving

AI is driving unprecedented demand for memory and storage, creating a growing infrastructure crunch that threatens both AI initiatives and everyday business operations. As supply tightens and costs rise, organizations must focus on using memory and storage more efficiently to protect ROI, delay costly upgrades, and keep critical workloads running.

The result is an emerging memory and storage shortage that threatens not only AI initiatives, but core business operations as well.

The technology industry has seen supply shortages before. What makes this moment different is where demand is coming from and how it’s reshaping supply.

Even as manufacturers expand fabrication capacity, production is being redirected toward AI workloads rather than evenly expanded across markets. This is not a temporary imbalance; it represents a structural shift in how memory and storage resources are allocated in an AI-driven world.

Memory and storage are often discussed together, but they play different roles and both are affected by the same upstream constraints.

Memory (DRAM and high-bandwidth memory) is used to actively process data. It powers caches, metadata services, analytics, AI pipelines, and model execution. The more data you keep active or frequently accessed, the more memory your systems require.

Storage (flash and disk) holds data at rest, but modern storage systems rely heavily on memory to operate efficiently. Enterprise storage arrays use large amounts of DRAM for caching, indexing, metadata handling, and performance optimization.

Both memory and storage components are manufactured from the same foundational resource: silicon wafers. As more of that silicon is allocated to AI-optimized memory, fewer wafers are available for general-purpose DRAM and flash used across enterprise infrastructure. That zero-sum dynamic means pressure on memory inevitably cascades into storage systems, cloud platforms, and everyday IT operations.

This shift isn’t confined to hyperscalers or research labs. As AI infrastructure consumes more silicon capacity, the effects ripple across the entire technology ecosystem.

While hardware constraints get the headlines, unstructured data sprawl quietly magnifies the problem.

Across most organizations:

  • The majority of stored data is unstructured
  • Much of it is duplicated, rarely accessed, or no longer relevant
  • Yet it still consumes premium storage and memory through indexing, caching, and analytics

As AI workloads expand, inefficient data management becomes a direct memory problem. Feeding models and pipelines with poorly curated data increases memory pressure and storage consumption without improving outcomes.

Organizations can’t control global supply chains, but they can control how efficiently they use what they already have.

You can’t manage memory or storage efficiently if you don’t know what data exists, where it lives, or how it’s used. Visibility is the foundation for reducing both capacity waste and memory overhead.

Not all data needs to stay active. Reducing the amount of data that must remain hot lowers storage costs and the memory required to process, cache, and analyze it.

Duplicate files, abandoned projects, and outdated datasets quietly consume capacity while increasing memory usage during scans, indexing, and AI ingestion.

Manual cleanup doesn’t scale. Policy-driven automation ensures data moves, ages, or retires based on business rules, reducing long-term pressure on both memory and storage systems.
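
As a bare-bones illustration of what “policy-driven” means in practice, the sketch below moves files untouched for a configurable number of days onto a cheaper tier. The paths and threshold are placeholders, and a real lifecycle engine (Diskover’s included) layers scheduling, auditing, and safety checks on top of logic like this.

```python
# Bare-bones sketch of a policy rule: files untouched for N days move to a
# cheaper tier. Paths and threshold are placeholders; a real lifecycle engine
# adds scheduling, auditing, and safety checks around logic like this.
import os
import shutil
import time

HOT_TIER = "/mnt/hot"
COLD_TIER = "/mnt/cold"
MAX_IDLE_DAYS = 180


def tier_cold_files(root: str = HOT_TIER, max_idle_days: int = MAX_IDLE_DAYS) -> None:
    cutoff = time.time() - max_idle_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getatime(src) < cutoff:  # last access before the cutoff
                dest = os.path.join(COLD_TIER, os.path.relpath(src, root))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(src, dest)  # ages the file out of the hot tier


if __name__ == "__main__":
    tier_cold_files()
```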

AI models don’t need more data, they need better data. Every unnecessary file introduced into an AI pipeline increases memory requirements during training, inference, and retrieval, amplifying the very shortages organizations are trying to avoid.

The memory and storage crunch driven by AI isn’t going away. Even as new manufacturing capacity comes online, demand continues to grow faster than supply, and priorities have permanently shifted.

Organizations that treat this as a temporary pricing issue will remain reactive. Those that treat it as a data management challenge will stay resilient and protect their return on infrastructure investments.

In an environment of rising memory and storage costs, ROI is no longer just about buying the right hardware. It’s about extending the value of what you already own, delaying expensive upgrades, and ensuring high-performance resources are reserved for workloads that actually drive the business forward.

Diskover helps organizations take that data-centric approach. By delivering global visibility into unstructured data, rich metadata insights, and automated lifecycle controls, Diskover enables teams to:

  • Reduce waste across existing storage, avoiding unnecessary capacity expansion
  • Lower memory pressure by minimizing duplicated, inactive, or poorly curated data
  • Preserve high-performance infrastructure for critical AI and business workloads
  • Delay hardware refreshes and cloud spend, improving ROI on current investments
  • Support AI initiatives without destabilizing day-to-day operations

In an era where memory and storage are becoming strategic constraints, efficiency isn’t just about cost control. It’s about maximizing ROI, protecting infrastructure budgets, and ensuring the business can continue to operate, innovate, and grow, even as AI reshapes the economics of IT.

Ready to structure the unstructured?

Is Your Data AI-Ready? Most Enterprises Aren’t – Here’s Why

Most enterprises are racing toward AI, but the majority of their data simply isn’t ready. Siloed storage, missing metadata, manual workflows, and years of cluttered unstructured files slow AI initiatives before they start. This post explores why AI-readiness is so hard – and how Diskover helps organizations finally structure the unstructured.

Unstructured data has become the foundation of AI, yet it’s also the hardest to wrangle. Files, images, videos, logs, documents, design assets, sensor output – these assets sit scattered across systems, clouds, and archives. Without a clear strategy for discovering, organizing, and preparing them, even the most ambitious AI initiatives stall before they start.

Below are the six most common reasons enterprises struggle with AI readiness, and what organizations can begin doing today to close the gap.

Most enterprises store data everywhere: NAS, object storage, cloud buckets, on-prem archives, remote offices, legacy systems, user drives – the list expands every year. These silos made sense when teams worked independently. But AI depends on unified visibility and consistent access, which these fragmented systems cannot provide.

When no one can answer basic questions like “Where does this dataset live?” or “How many versions of this asset exist?”, AI pipelines grind to a halt.

Where to start:

  • Inventory all storage systems and repositories
  • Document which teams rely on which platforms
  • Identify redundant systems and legacy environments that no longer support modern workflows
  • Encourage movement toward shared, standardized data access patterns

Organizations are sitting on thousands to billions of files, but lack insight into what’s active, critical, duplicated, sensitive, or junk. And without that visibility, AI efforts begin with guesswork rather than strategy.

This leads to overspending on storage, slow data retrieval, and an inability to prioritize the datasets most likely to fuel AI value.

Where to start:

  • Implement tagging (manual or scripted) based on file attributes
  • Remove obvious redundancies, temp files, and duplicate content (see the dedupe sketch after this list)
  • Work with finance to quantify storage cost by tier or repository
  • Build a simple classification model (Active / Archive / Delete) to begin segmenting datasets
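
To make the duplicate-removal step concrete, here is a minimal dedupe sketch: group files by size, then confirm duplicates by content hash so only byte-identical files are flagged. It is a starting point under simple assumptions (local filesystem, hard links ignored), not a production tool.

```python
# Minimal dedupe sketch: group files by size, then confirm duplicates by
# content hash so only byte-identical files are flagged. Assumes a local
# filesystem and ignores hard links; a starting point, not a production tool.
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()


def find_duplicates(root: str) -> dict:
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            by_size[os.path.getsize(p)].append(p)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files whose sizes collide
            for p in paths:
                by_hash[sha256_of(p)].append(p)
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates("/data").items():
        print(digest[:12], paths)
```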

Inactive files often sit on the most expensive storage tiers, sometimes for years. These assets slow down scans, backups, migrations, and AI data preparation. Worse, they clog infrastructure that should be optimized for high-value, frequently accessed data.

AI workloads require fast, curated, context-rich datasets, not a mountain of stale archives.

Where to start:

  • Flag files not accessed in the last 6–12 months
  • Move inactive data to lower-cost storage tiers
  • Review old logs, outdated backups, duplicates, and abandonware
  • Work with business units to align retention with actual value

File movement. Folder cleanup. Tagging. Classification. Data syncing. Lifecycle management.

When datasets hit petabyte scale, manual processes collapse. And every hour spent manually preparing files is an hour not spent building or training AI models.

To meet AI’s velocity, enterprises need automated workflows, policy-driven actions, and continuous metadata enrichment.

Where to start:

  • Automate repetitive tasks like cleanup, tagging, and archival
  • Centralize ownership for automation initiatives in a focused ops team
  • Evaluate platforms for API-driven or rules-driven automation
  • Pilot small workflow automations to prove value and build momentum

Data scientists are hired to innovate, but many spend 60–70% of their time hunting for files, deciphering naming conventions, massaging inconsistent formats, or filtering low-value data from massive file collections.

This not only delays AI projects; it reduces accuracy, slows iteration, and frustrates the teams you hired to accelerate progress.

Where to start:

  • Centralize documentation for datasets
  • Enforce naming standards across the organization
  • Assign data stewards to high-impact domains
  • Build a searchable internal catalog for known datasets

AI isn’t something you “bolt on” to existing systems. It relies on a data architecture capable of:

  • high-throughput ingestion
  • fast indexing
  • metadata enrichment
  • flexible data mobility
  • consistent governance
  • scalable curation

Without these foundations, organizations may have terabytes or petabytes of unstructured data, but none of it is ready for intelligent use.

Where to start:

  • Map friction points in your current AI workflows
  • Define your ideal end-to-end data pipeline
  • Allocate resources for data readiness (not just AI tools)
  • Align IT, engineering, and AI teams around a shared data strategy

While these steps help organizations begin improving AI readiness, truly unlocking unstructured data at scale requires indexing, visibility, context, and orchestration, all working together. That is exactly what Diskover delivers:

✔ Indexing and discovering all unstructured data across storage, clouds, and archives

✔ Identifying high-value, redundant, stale, or orphaned files with precision

Diskover helps enterprises stop guessing and start strategically preparing their unstructured data so AI teams can move faster and build better models using datasets that are accurate, complete, and context-rich.

If your enterprise is ready to finally get control of unstructured data and make your data truly AI-ready, Diskover can help you get there.

Ready to structure the unstructured?

Unlocking the Power of Unstructured Data with Diskover on the RiVA Platform

Enterprises are generating unstructured data at massive scale, but traditional tools can’t deliver the visibility or automation needed to make that data AI-ready. Diskover on GeoComputing Group’s RiVA platform helps organizations unlock the true value of that data – at speed and at scale.

RiVA is well known for delivering high-performance private cloud environments purpose-built for geoscience workflows. By adding Diskover’s global metadata indexing, search, and workflow automation, RiVA evolves into a unified data intelligence platform.

With Diskover embedded into RiVA, organizations gain:

  • A single, searchable view of all unstructured data – across on-prem, the GeoComputing Cloud, commercial cloud storage, and legacy archives.
  • Lightning-fast discovery with rich metadata, enabling engineers, geologists, and data managers to find exactly what they need in seconds.
  • Automated curation and cleanup to prepare datasets for AI, analytics, and compliance workflows.

The result is a streamlined environment where petabytes of unstructured files become accessible, meaningful, and ready for action.

“Our client had divested a permit and needed to find all related data to hand over and then remove from disk. They provided a list of prospects, seismic surveys, and aliases which Diskover could use in a single search to very quickly scan the entire RiVA filesystem for matching references and provide file/folder listings and sizes. This task was completed in less than an hour. It would have taken weeks and needed extensive historical knowledge without Diskover.”

– Tim Ballinger, GeoComputing Group LLC

Across the energy sector, Diskover + RiVA accelerates mission-critical workflows that have traditionally depended on slow manual searches or tribal knowledge. Examples include:

  1. Preparing Datasets for Generative AI & Analytics
    • Curate AI-ready corpuses by filtering for the most relevant seismic volumes, well files, geologic reports, or project tags. Automate movement to GPU-backed environments in RiVA.
  2. Divestiture & M&A Data Readiness
    • Quickly locate all assets tied to a basin, block, permit, or survey – including hidden references or legacy folder structures. Export, package, and hand off clean datasets in hours instead of weeks.
  3. Seismic Data Quality, Cleanup & Reconciliation
    • Identify duplicates, outdated SEG-Y versions, or abandoned project folders. Automatically tag data by survey, operator, or vintage to maintain clean, trusted libraries.
  4. Archive Modernization & Legacy Data Recovery
    • Surfacing forgotten or inaccessible archives – tape migrations, old shares, orphaned user directories – becomes fast and predictable using Diskover’s metadata-driven indexing.
  5. Compliance, Governance & Data Lifecycle Management
    • Track where regulated datasets live, how they’re being used, and when they should be archived or deleted. Automatically enforce retention policies across environments.

These use cases deliver immediate operational value, particularly for teams under pressure to move faster, reduce storage waste, and prepare for AI-driven exploration workflows.
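
The divestiture story above maps naturally onto a single metadata-index query. The sketch below is hypothetical: the index name, field names, and alias list are invented for illustration, and it assumes an OpenSearch-style index of file metadata rather than Diskover’s actual schema.

```python
# Hypothetical sketch of the divestiture search: one query over a file-metadata
# index for any of the permit's prospect/survey names or aliases. The index and
# field names are invented, not Diskover's actual schema.
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

ALIASES = ["Prospect-A", "Survey-2019-XL", "Permit-XYZ"]  # from the handover list

query = {
    "query": {
        "query_string": {
            "query": " OR ".join(f'"{a}"' for a in ALIASES),
            "fields": ["name", "path", "tags"],
        }
    },
    "aggs": {"total_bytes": {"sum": {"field": "size"}}},
    "size": 1000,
}

resp = client.search(index="file-metadata", body=query)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["path"], hit["_source"]["size"])
print("matched bytes:", resp["aggregations"]["total_bytes"]["value"])
```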

GeoComputing’s latest generation of RiVA brings AI-ready GPUs, containerized deployments, and seamless integration with modern geoscience applications. Combined with Diskover’s scalable indexing, deep metadata enrichment, and automated workflows, E&P organizations gain a platform built for the next decade of digital subsurface innovation.

Together, GeoComputing Group and Diskover Data are setting a new standard for energy data management – empowering teams to unlock more value from their data, accelerate decision-making, and support sustainable growth.

From Whiteboard to Workflow: How Diskover Built Its Openflow Connector for Snowflake

Summary of our recent article published on the Snowflake Medium blog

Traditional ingestion methods rely on custom-built JDBC connectors – flexible, but difficult to scale, maintain, and deploy. Diskover needed a different approach. Snowflake Openflow provided a low-code, processor-driven framework that handled much of the orchestration behind the scenes, while Kafka acted as the durable, future-proof transport layer.

This allowed Diskover to:

  • Scale ingestion without custom code, using Openflow’s built-in orchestration
  • Standardize on Kafka, enabling future expansion to other warehouses with Kafka consumers
  • Accelerate onboarding, even for customers without deep engineering teams

Diskover’s connector processes two major data flows:

  1. Quota records: storage allocation and usage over time
  2. File metadata: billions of filenames, paths, timestamps, sizes, and owners indexed by Diskover

Openflow organizes these flows into parallel branches on the canvas, separates metadata-only records from those requiring additional handling, and streams the results into Snowflake using Snowpipe Streaming.
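
To make the transport layer concrete, here is a minimal, hypothetical producer for the file-metadata flow. The topic name and record fields are illustrative; in the real pipeline, Openflow consumes records like these and lands them in Snowflake via Snowpipe Streaming.

```python
# Minimal, hypothetical sketch of the Kafka transport for the file-metadata
# flow. Topic name and record fields are illustrative; Openflow consumes these
# records and lands them in Snowflake via Snowpipe Streaming.
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

record = {
    "path": "/projects/surveyA/shot_0001.segy",
    "size": 734003200,
    "owner": "geo-team",
    "mtime": "2025-11-02T14:31:07Z",
}

# Keying by path keeps all updates for one file in one partition (per-file
# ordering); the partition count then bounds downstream parallelism.
producer.send("diskover.file-metadata", key=record["path"], value=record)
producer.flush()
```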

[Figure: Diskover + Openflow overview]

As Byron built the first working version, a few themes emerged:

  1. Low-code still requires engineering discipline.
    • Each processor needs careful configuration – thread counts, schemas, partitioning strategies. The visual workflow hides orchestration, but not design responsibility.
  2. Error messages can be…cryptic.
    • Some Openflow errors were vague or misleading. Debugging often required digging into SQL roots or using AI tools to interpret internal messages.
  3. Scaling is about design, not just settings.
    • Kafka partitioning proved essential. Too few partitions bottleneck the pipeline; too many add overhead. Openflow helps surface bottlenecks, but thoughtful upfront design matters most.
  4. Automate everything possible.
    • To avoid manual copy-paste errors across dozens of processors, Byron built a Go-based code generator to produce consistent JSON, SQL, and YAML configs – a key step toward repeatability.

His advice for teams starting down the same path:

  • Start small and iterate. Build a functional pipeline before stressing it with scale.
  • Leverage the Apache NiFi community. Openflow concepts closely follow NiFi, and existing documentation is invaluable.
  • Automate configs early. Avoid manual parameter editing at all costs.
  • Expect tuning and refinement. Openflow accelerates development, but production pipelines still require thoughtful engineering.

The connector is fully functional today, with beta deployments beginning soon. These real-world environments will answer open questions around autoscaling, throughput, and handling more complex metadata or blob-level extraction.

Long term, the architecture positions Diskover to extend similar pipelines to other cloud warehouses: because Kafka remains at the core, the pipeline can evolve without major redesign.

The Power of AI Natural Language Interfaces: Transforming How We Work with Data

AI natural language interfaces are redefining how we interact with data. Instead of navigating complex dashboards or waiting on reports, users can simply ask questions and get instant insights. As unstructured data grows and AI fluency spreads, natural language is becoming the new interface for discovery—bridging the gap between human understanding and machine intelligence.

Now, a new wave of AI-powered natural language interfaces (NLIs) is redefining what it means to be “data-driven.” These conversational systems allow anyone to ask questions and get answers from across vast data ecosystems — no SQL, no scripting, no waiting for IT.

Most analytics and storage tools were built for specialists. Even the friendliest dashboard still assumes you know which metrics to look for and where to find them. Natural language interfaces change that dynamic.

Instead of navigating reports or building queries, users can simply ask:

  • “Where are we spending the most on cold storage?”
  • “Which datasets haven’t been accessed in six months?”
  • “Summarize data growth trends over the past quarter.”

Behind the scenes, AI interprets intent, retrieves relevant metadata, and responds conversationally — turning what was once a technical task into a human one.
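
A toy sketch of that interpretation step, under the assumption that the NLI compiles each question into a structured metadata query before anything runs against the index. The dataclass and keyword matching are stand-ins for what an LLM with function calling would do.

```python
# Toy sketch of the interpretation step: a question becomes a structured
# metadata query, and only that query runs against the index. The dataclass and
# keyword matching are stand-ins for an LLM with function calling.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataQuery:
    field: str                       # e.g. "last_accessed"
    operator: str                    # e.g. "<"
    value: str                       # e.g. "now-6M"
    aggregate: Optional[str] = None  # e.g. "sum(cost)"


def translate(question: str) -> MetadataQuery:
    q = question.lower()
    if "accessed" in q and "month" in q:
        return MetadataQuery("last_accessed", "<", "now-6M")
    if "spending" in q and "cold storage" in q:
        return MetadataQuery("tier", "==", "cold", aggregate="sum(cost)")
    raise ValueError("intent not recognized")


print(translate("Which datasets haven’t been accessed in six months?"))
# MetadataQuery(field='last_accessed', operator='<', value='now-6M', aggregate=None)
```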

Two trends are converging to make natural language interfaces essential rather than optional.

First, data complexity is growing faster than our ability to manage it. Unstructured data now represents an estimated 80–90% of what most organizations store. It’s scattered across on-prem systems, clouds, and SaaS platforms, each with its own permissions, formats, and costs.

Second, AI fluency is spreading across the enterprise. Large language models have made natural language the new operating system of work. From summarizing documents to generating code, employees are learning to ask better questions — and expect faster answers.

The logical next step is to bring that same simplicity to how we explore and manage data itself.

Natural language doesn’t just make data more accessible — it changes how organizations think. When anyone can ask a question and get an answer instantly, curiosity replaces gatekeeping. Teams experiment more freely, explore ideas faster, and spot inefficiencies sooner.

Conversational capabilities like these illustrate how AI-driven natural language tools are blurring the line between insight and action. By connecting intuitive commands to powerful metadata intelligence, organizations can empower every user to manage and optimize data with the same ease as asking a question.

Despite the excitement, natural language interfaces are not magic. Organizations will need to confront several realities:

  • Context matters. AI must understand not just language, but data lineage, ownership, and relevance.
  • Accuracy and transparency remain critical — users need to trust where answers come from.
  • Governance and access control still apply; conversational doesn’t mean unsecured.
  • Metadata quality will determine the quality of results. Without rich, structured metadata, AI can’t deliver meaningful answers.

In other words, natural language isn’t a shortcut around data management — it’s a reason to do it better.

Natural language interfaces represent a shift as significant as the move from command lines to graphical interfaces. They don’t replace human expertise; they amplify it — making insight as simple as asking a question.

As organizations prepare for the next phase of AI adoption, those that invest in strong metadata foundations, clean data pipelines, and transparent governance will be best positioned to take advantage of this conversational future.

Because the real breakthrough isn’t AI answering our questions. It’s that, for the first time, anyone can ask them.

At Diskover Data, we see natural language interfaces as part of a larger shift toward intelligent, metadata-driven data ecosystems. The future of AI-ready data isn’t just about scale or speed — it’s about context, accessibility, and trust. By enriching unstructured data with business meaning and enabling seamless orchestration across systems, Diskover helps organizations create the strong data foundation these AI interactions rely on.

The CloudSoda AI Assistant: What We Learned Building Our First AI-Powered Solution

The CloudSoda AI Data Assistant began as an experiment with Anthropic’s Model Context Protocol (MCP) and quickly became a working prototype unveiled at the NAB Show. Built to analyze storage, simulate cost savings, and surface duplicates, it gave both engineers and executives instant, actionable answers in plain language.

The idea grew from Olivier’s experiments with Anthropic’s Model Context Protocol (MCP), which allows AI assistants to connect directly to real-world systems. If MCP could make his own workflows more powerful, why not bring the same approach to customers? That question set the stage for building an assistant that could make unstructured data instantly actionable.

The NAB Show in Las Vegas became the forcing function: a deadline to design, build, and demo a working prototype in front of thousands of industry professionals.

A core decision was which AI models to support. Customers generally fell into two camps:

  • Security-focused organizations restricted to tools like Microsoft Copilot that keep data contained.
  • Enterprise-standardized organizations built around a single LLM like ChatGPT.

By supporting both, and by leveraging MCP, the assistant could meet customers where they were while maintaining flexibility. To ensure accuracy, we also exposed a structured set of functions through our APIs, preventing AI “hallucinations” and ensuring trustworthy results.
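
Here is a hypothetical sketch of what “a structured set of functions” can look like in MCP terms: the assistant may only invoke typed, whitelisted tools, so every figure it reports traces back to an API result instead of a guess. The tool names, fields, and rates below are invented for illustration.

```python
# Hypothetical sketch of "structured functions" exposed over MCP: the assistant
# can only call typed, whitelisted tools, so every figure it reports traces
# back to an API result rather than a guess. Names, fields, and rates invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("storage-assistant")


@mcp.tool()
def cold_data_summary(storage_system: str, min_age_days: int = 180) -> dict:
    """Total file count and bytes not accessed in `min_age_days` days."""
    # Stand-in for a real backend call; canned numbers for illustration.
    return {"system": storage_system, "files": 1_240_000, "bytes": 87 * 10**12}


@mcp.tool()
def simulate_savings(source_tier: str, target_tier: str, bytes_moved: int) -> dict:
    """Estimate monthly savings from moving `bytes_moved` between tiers."""
    rates = {"hot": 0.023, "cold": 0.004}  # $/GB/month, illustrative only
    delta = (rates[source_tier] - rates[target_tier]) * bytes_moved / 10**9
    return {"monthly_savings_usd": round(delta, 2)}


if __name__ == "__main__":
    mcp.run()
```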

The initial release gave the assistant reliable capabilities, including:

  • Analyzing storage usage, capacity, and costs
  • Simulating cost savings for different storage scenarios
  • Identifying duplicate files and their impact
  • Auditing file age and access history
  • Listing connected storage systems for planning

At NAB, the AI Assistant didn’t just capture engineers’ attention — it drew in executives. Leaders saw they could finally ask questions in plain language (“How much cold data do we have in PowerScale?”) and get instant, actionable answers, without waiting on reports or pulling in staff.

The NAB prototype was just the first step. In his next post, Olivier will share how we took the concept further — using AI itself to help design, implement, and review a new connector for the Diskover platform.

How MCP Turned AI Experiments into Real Product Ideas

Diskover VP of Product Olivier Rivard shares how Anthropic’s Model Context Protocol (MCP) transformed his AI experiments into real product ideas — and what it means for the future of unstructured data management.

Olivier had already been using tools like ChatGPT and Claude for writing and research. But with MCP, AI assistants could connect to real systems, pull live data, and take meaningful action.

That shift turned his experiments into product ideas. Instead of just brainstorming with AI, he was able to:

  • Build a prototype connector that generated storage reports, usage trends, and file searches.
  • Ask Claude to create tasks, analyze storage patterns, and even forecast growth.
  • See the potential for customers to ask natural language questions like “How much cold data do we have in PowerScale?” and get actionable answers instantly.

For IT teams managing billions of files across on-prem and cloud systems, the ability to combine AI with MCP opens new possibilities:

  • Automating weekly health checks and cleanup tasks
  • Surfacing insights across multiple systems
  • Lowering the barrier for non-technical users to get meaningful answers
  • Extending product capabilities without waiting for UI updates

As Olivier puts it: the real power of AI isn’t just in answering questions — it’s in acting on the answers.

Read the Full Post

This is just the beginning of the series. In Part 1, Olivier shares:

  • Why his work in unstructured data led him to explore MCP
  • The first prototypes he built entirely with AI
  • How these experiments are shaping the future of Diskover Data

Diskover Now Available on Oracle Cloud Marketplace

Diskover is now available on the Oracle Cloud Marketplace, giving OCI customers a faster, simpler path to unstructured data visibility and AI readiness. With native integration into OCI services like Object Storage and OpenSearch — and the ability to purchase using Oracle Universal Credits — Diskover helps teams structure the unstructured, reduce storage costs, and accelerate innovation.

Diskover has officially joined the Oracle Cloud Marketplace, giving OCI customers an easier path to unlocking the value of their unstructured data — with full integration into OCI services and the ability to purchase using Oracle Universal Credits.

Unstructured data — video files, design assets, genomics, sensor logs, legal documents — isn’t just growing. It’s multiplying. And for many Oracle Cloud Infrastructure (OCI) customers, it’s become one of the biggest hurdles to operational efficiency, innovation, and AI readiness.

Today, we’re excited to share that Diskover is now available on the Oracle Cloud Marketplace, making it easier for OCI customers to gain clarity, control, and insight across their unstructured data landscape.

Organizations across industries rely on OCI for scalable infrastructure, but as workloads shift to the cloud, data fragmentation often follows. Unstructured files accumulate in object stores and block volumes, but without the right tools in place, they remain hard to find, manage, or make use of.

Diskover helps bridge that gap — offering unified visibility and automation across OCI and hybrid environments. Our platform integrates with key OCI services like Object Storage, Block Storage, and OpenSearch to index billions of files, enrich them with context, and orchestrate data movement securely and at scale.

Whether it’s surfacing cold data for archive, finding redundant content before cloud tiering, or preparing curated datasets for AI workflows, Diskover gives teams a faster, more informed way to manage data across its lifecycle.

As more businesses adopt AI-first strategies, unstructured data becomes foundational — not just for storage optimization, but for training large language models, building recommendation engines, or enabling real-time decision-making.

But before it can power anything, it needs to be structured, enriched, and accessible. That’s where Diskover fits in.

From supporting media pipelines and protecting IP to streamlining genomics research and improving operational planning, Diskover helps customers make unstructured data usable — at petabyte scale, and without adding complexity to their workflows.

By offering Diskover on the Oracle Cloud Marketplace, we’re streamlining how OCI users can evaluate, deploy, and purchase our platform. Existing Oracle customers can even use OCI Universal Credits toward the purchase, removing procurement hurdles and accelerating time to value.

Explore the listing in the OCI Marketplace, or connect with our team to learn how Diskover can help you structure your unstructured data — for AI, for business, and for what’s next.

Boost Value of AI-Ready Unstructured Data with Diskover + Snowflake

Unstructured data is one of the biggest barriers to AI adoption. This post explores why it’s so hard to work with — and how Diskover and Snowflake are partnering to make it AI-ready.

AI is dominating boardroom agendas — but there’s a disconnect. While most enterprises are eager to build or adopt AI solutions, very few are truly ready. The technology is advancing fast, but the data foundations needed to support it are still shaky, especially when it comes to unstructured data.

At the heart of this challenge is a question that many teams can’t yet answer:
What data should we feed our AI models — and how do we get it there?

Diskover acts as a super connector — bridging fragmented, unstructured data environments and delivering enriched, curated datasets directly into Snowflake. Together, they make it easier to identify the right data, prepare it with context, and move it into AI pipelines with confidence.

It’s time to rethink what unstructured data actually is — and why it matters so much for AI.

Unstructured data includes everything from PDFs, chat logs, and video files to microscope images, process logs, and design files. It doesn’t conform to rows, columns, or fixed schemas, making it difficult to manage, search, or use effectively. Yet it often holds a company’s most valuable intellectual property and insights.

  • In life sciences, it’s lab notebook scans, clinical observations, and instrument output
  • In manufacturing, it’s test logs, wafer inspection images, and simulation output
  • In media, it’s raw footage, VFX assets, and archived creative files

These are the raw materials that can fuel AI — but only if they’re discoverable, enriched with context, and accessible through the platforms where AI actually happens.

For most teams trying to build AI capabilities today, unstructured data is the biggest blind spot. The challenges are systemic and familiar:

Most enterprises are sitting on petabytes of unstructured data scattered across on-prem servers, legacy NAS systems, cloud buckets, and forgotten archives. Without a unified view, teams can’t even begin to assess what’s usable.

Even when data is located, it’s often miscategorized, duplicated, or buried in silos with no metadata. There’s no easy way to identify what’s high value versus what’s outdated, redundant, or irrelevant.

Moving unstructured data into Snowflake or another analytics platform typically involves brittle custom scripts, manual tagging, or time-consuming staging processes that delay AI initiatives and create technical debt.

Bad data in means bad data out. Feeding AI pipelines with irrelevant, unlabeled, or incomplete data can skew results, bias models, and waste compute. Without visibility and enrichment, even large datasets deliver limited value.

These challenges aren’t about theoretical strategy — they’re about real day-to-day blockers that keep teams from operationalizing AI.

The partnership between Diskover and Snowflake is designed to address these challenges head-on — by closing the gap between raw data and AI-ready assets.

Diskover acts as a global indexer across your entire storage landscape — on-prem, cloud, legacy, and hybrid. It scans file systems at scale, builds a searchable metadata catalog, and adds business context through intelligent tagging and enrichment.

No more guessing what’s out there. Diskover shows you:

  • What data you have
  • Where it lives
  • How it’s being used (or not)
  • Which datasets are most valuable for AI

It also lets you slice data by owner, project, age, access frequency, file type, and cost — giving you the clarity needed to make smart decisions fast.

Instead of dumping everything into Snowflake and hoping for the best, Diskover lets teams curate only what matters. You can define policies to identify:

  • Cold data older than X months
  • Duplicate files that can be ignored
  • Files related to specific workflows, projects, or business units
  • Specific types of files (e.g., large video files, microscopy images, system logs)

This curation step helps teams feed AI models with high-value, high-integrity data — while avoiding the clutter that can dilute performance and drive up costs.
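
A bare-bones sketch of that curation step: policy filters applied to a metadata catalog, keeping one copy of each unique file, of the right type, with recent access. The records and field names are invented; Diskover’s actual policies operate on its index at far larger scale.

```python
# Bare-bones sketch of policy-driven curation over a metadata catalog.
# Records and field names are invented for illustration.
from datetime import datetime, timedelta, timezone

# Toy catalog; in practice these records come from the global index.
catalog = [
    {"path": "/lab/img_001.tif", "type": "image",
     "last_access": "2025-10-01T00:00:00+00:00", "sha256": "aa11"},
    {"path": "/lab/img_001_copy.tif", "type": "image",
     "last_access": "2023-01-15T00:00:00+00:00", "sha256": "aa11"},
    {"path": "/tmp/debug.log", "type": "log",
     "last_access": "2025-12-01T00:00:00+00:00", "sha256": "bb22"},
]


def curate(records, wanted_types=("image",), max_age_days=365):
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    seen = set()
    for r in records:
        if r["type"] not in wanted_types:
            continue  # wrong kind of file for this corpus
        if datetime.fromisoformat(r["last_access"]) < cutoff:
            continue  # cold data: exclude (or tier down instead)
        if r["sha256"] in seen:
            continue  # duplicate content: ingest only once
        seen.add(r["sha256"])
        yield r


for rec in curate(catalog):
    print("ingest:", rec["path"])  # only records passing all three policies
```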

Once data is curated and tagged, Diskover uses Snowflake Openflow to seamlessly move selected datasets into Snowflake’s environment — ready to be queried, joined, or fed into AI pipelines like Snowflake Cortex.

There’s no re-architecting or brittle ETL. And because the data arrives enriched with metadata and business context, it’s far more meaningful from the moment it lands.

“We’re seeing more customers adopt an AI-first data strategy, which depends on having access to all your data. Enterprises can’t unlock the full value of AI without knowing what unstructured data they have and how to use it. Our partnership with Diskover, in combination with Snowflake Openflow, makes that possible, acting as a super-connector to exabyte-scale unstructured data.”

– Harsha Kapre, Director, Snowflake Ventures

Let’s look at how this plays out in practice.

A research team working on new drug therapies needs to analyze years of experimental results, raw microscope images, and clinical notes. These datasets are stored in various formats across multiple environments.

With Diskover:

  • They discover, tag, and organize the relevant assets — by experiment, date, or researcher
  • Filter out irrelevant or low-quality data
  • Seamlessly move curated datasets into Snowflake for model training, correlation analysis, and discovery

A chipmaker wants to build predictive models to reduce manufacturing defects. The raw material? Process logs, inspection images, and test results — all unstructured and scattered across facilities.

With Diskover:

  • They pinpoint the files that correlate to known defect patterns
  • Enrich with metadata like product ID, location, and equipment
  • Feed this clean, labeled data into Snowflake to power defect prediction models

A media company wants to personalize its platform by analyzing viewer preferences across thousands of hours of archived content. But the video files are unlabeled and spread across aging storage.

With Diskover:

  • They discover and tag content by show, actor, theme, or scene
  • Remove outdated or duplicated assets
  • Move curated metadata and assets into Snowflake for real-time recommendation and content repackaging

Diskover will soon be available as a Connected App in the Snowflake Marketplace, allowing customers to:

  • Purchase with Snowflake credits
  • Integrate directly via Openflow
  • Move from discovery to ingestion in just a few clicks

Want to see how it works with your data? Get in touch for a live demo or early access.

“Proud to support Diskover Data as they help companies uncover their most valuable data across legacy systems with a unified, searchable view.

Together with Snowflake’s easy, trusted, and connected platform, we’re helping customers seamlessly ingest critical data and build a strong, AI-ready foundation.”

– Sridhar Ramaswamy, CEO, Snowflake

AI isn’t just about algorithms or models. It’s about data. And not just any data — the right data.

With Diskover and Snowflake, teams now have a seamless way to:

  • Discover what they have
  • Curate what matters
  • Enrich it with business context
  • Ingest it directly into AI pipelines
  • And do it all without rebuilding infrastructure or creating technical debt

Diskover and Snowflake give you a direct path from fragmented storage to AI-ready data — enriched, curated, and delivered where you need it.

Ready to harness the value of your unstructured data? Learn how Diskover can help you find, enrich, and deliver your data to power breakthrough AI use cases.
