Building the Diskover MCP Connector With AI: From Idea to Shipping Product

AI can write code – but can it build a real product? See how Diskover’s VP of Product used AI agents to define requirements, generate Jira stories, implement features, and ship an MCP connector that’s headed to customers.

Not a demo.
Not a proof of concept.
A real product that will ship to customers.

This summary highlights the key ideas and why they matter for how modern products get built.

Olivier Rivard, Diskover’s VP of Product, wasn’t simply out to “use AI for coding.” His goal was to see whether AI could operate as a cross-functional product team spanning product management, engineering execution, and quality review.

The result: a fully AI-assisted workflow that went from product requirements to deployed code.

Instead of starting with a blank document, Olivier defined how the AI should behave – effectively turning it into a Senior Product Management Assistant.

With access to Jira via MCP, the AI could:

  • Understand Diskover’s existing templates and standards
  • Create properly scoped Epics and stories
  • Generate acceptance criteria
  • Populate the development board automatically

This wasn’t documentation for documentation’s sake. It produced a clean, executable roadmap that engineering could immediately act on.
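
As a rough illustration of the pattern at work here, the sketch below shows what a single MCP tool for creating Jira stories could look like. It is hypothetical, not Diskover’s actual connector: the server name, environment variables, and issue fields are assumptions, and it targets Jira’s standard REST endpoint for issue creation.

```python
# Minimal sketch of an MCP tool for Jira story creation, in the spirit of the
# workflow above. Hypothetical: server name, env vars, and fields are
# assumptions, not Diskover's actual connector.
import os

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("jira-assistant")

JIRA_URL = os.environ["JIRA_URL"]  # e.g. https://yourcompany.atlassian.net
AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"])


@mcp.tool()
def create_story(project_key: str, epic_key: str, summary: str,
                 acceptance_criteria: str) -> str:
    """Create a Jira story under an Epic, with acceptance criteria in the description."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Story"},
            "summary": summary,
            "description": f"h3. Acceptance Criteria\n{acceptance_criteria}",
            "parent": {"key": epic_key},  # links the story to its Epic
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["key"]  # e.g. "PROJ-123"


if __name__ == "__main__":
    mcp.run()  # stdio transport; the AI client discovers and calls create_story()
```

With a tool like this registered, the assistant drafts the summary and acceptance criteria itself and calls the function when the story is ready to file.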

Rather than trying to do everything, the first version of the MCP connector focused on a small set of high-value Diskover capabilities.

For each feature, the AI:

  1. Asked clarifying questions
  2. Drafted a complete Jira story with acceptance criteria
  3. Created the issue under the correct Epic

By the end of this step, the early roadmap for the MCP connector lived entirely inside Jira – generated and structured by AI, but aligned with Diskover’s real development process.

AI didn’t stop at planning. It implemented the features too.

Claude Code handled development tasks by pulling Jira issues directly and writing code. ChatGPT Codex was used as a reviewer, flagging bugs, edge cases, and optimization opportunities. Olivier then validated, tested, and guided fixes – acting as the final quality gate.

The human role shifted from “doing the work” to directing and validating the work.

One of the clearest takeaways from this experiment is that AI-driven development doesn’t eliminate the need for product or technical leadership – it amplifies it.

Success depended on:

  • Clear, unambiguous product requirements
  • Strong intuition about user workflows
  • The ability to review and validate code written by others (human or AI)

In other words, experience still matters. The tools changed, the responsibility didn’t.

Read the full story

This summary only scratches the surface. Olivier’s full Medium post goes deeper into:

  • The exact AI workflows used
  • Lessons learned along the way
  • Why this approach is reshaping how Diskover thinks about product development

If you’re curious about what product development looks like in an AI-first world, it’s worth your time.

AI Is Driving a Memory and Storage Crunch and Efficiency Will Decide Who Keeps Moving

AI is driving unprecedented demand for memory and storage, creating a growing infrastructure crunch that threatens both AI initiatives and everyday business operations. As supply tightens and costs rise, organizations must focus on using memory and storage more efficiently to protect ROI, delay costly upgrades, and keep critical workloads running.

The result is an emerging memory and storage shortage that threatens not only AI initiatives, but core business operations as well.

The technology industry has seen supply shortages before. What makes this moment different is where demand is coming from and how it’s reshaping supply.

Even as manufacturers expand fabrication capacity, production is being redirected toward AI workloads rather than evenly expanded across markets. This is not a temporary imbalance; it represents a structural shift in how memory and storage resources are allocated in an AI-driven world.

Memory and storage are often discussed together, but they play different roles and both are affected by the same upstream constraints.

Memory (DRAM and high-bandwidth memory) is used to actively process data. It powers caches, metadata services, analytics, AI pipelines, and model execution. The more data you keep active or frequently accessed, the more memory your systems require.

Storage (flash and disk) holds data at rest, but modern storage systems rely heavily on memory to operate efficiently. Enterprise storage arrays use large amounts of DRAM for caching, indexing, metadata handling, and performance optimization.

Both memory and storage components are manufactured from the same foundational resource: silicon wafers. As more of that silicon is allocated to AI-optimized memory, fewer wafers are available for general-purpose DRAM and flash used across enterprise infrastructure. That zero-sum dynamic means pressure on memory inevitably cascades into storage systems, cloud platforms, and everyday IT operations.

This shift isn’t confined to hyperscalers or research labs. As AI infrastructure consumes more silicon capacity, the effects ripple across the entire technology ecosystem.

While hardware constraints get the headlines, unstructured data sprawl quietly magnifies the problem.

Across most organizations:

  • The majority of stored data is unstructured
  • Much of it is duplicated, rarely accessed, or no longer relevant
  • Yet it still consumes premium storage and memory through indexing, caching, and analytics

As AI workloads expand, inefficient data management becomes a direct memory problem. Feeding models and pipelines with poorly curated data increases memory pressure and storage consumption without improving outcomes.

Organizations can’t control global supply chains, but they can control how efficiently they use what they already have.

You can’t manage memory or storage efficiently if you don’t know what data exists, where it lives, or how it’s used. Visibility is the foundation for reducing both capacity waste and memory overhead.

Not all data needs to stay active. Reducing the amount of data that must remain hot lowers storage costs and the memory required to process, cache, and analyze it.

Duplicate files, abandoned projects, and outdated datasets quietly consume capacity while increasing memory usage during scans, indexing, and AI ingestion.

Manual cleanup doesn’t scale. Policy-driven automation ensures data moves, ages, or retires based on business rules, reducing long-term pressure on both memory and storage systems.
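
As a bare-bones illustration of what “policy-driven” means in practice, the sketch below moves files untouched for a configurable number of days onto a cheaper tier. The paths and threshold are placeholders, and a real lifecycle engine (Diskover’s included) layers scheduling, auditing, and safety checks on top of logic like this.

```python
# Bare-bones sketch of a policy rule: files untouched for N days move to a
# cheaper tier. Paths and threshold are placeholders; a real lifecycle engine
# adds scheduling, auditing, and safety checks around logic like this.
import os
import shutil
import time

HOT_TIER = "/mnt/hot"
COLD_TIER = "/mnt/cold"
MAX_IDLE_DAYS = 180


def tier_cold_files(root: str = HOT_TIER, max_idle_days: int = MAX_IDLE_DAYS) -> None:
    cutoff = time.time() - max_idle_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getatime(src) < cutoff:  # last access before the cutoff
                dest = os.path.join(COLD_TIER, os.path.relpath(src, root))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(src, dest)  # ages the file out of the hot tier


if __name__ == "__main__":
    tier_cold_files()
```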

AI models don’t need more data, they need better data. Every unnecessary file introduced into an AI pipeline increases memory requirements during training, inference, and retrieval, amplifying the very shortages organizations are trying to avoid.

The memory and storage crunch driven by AI isn’t going away. Even as new manufacturing capacity comes online, demand continues to grow faster than supply, and priorities have permanently shifted.

Organizations that treat this as a temporary pricing issue will remain reactive. Those that treat it as a data management challenge will stay resilient and protect their return on infrastructure investments.

In an environment of rising memory and storage costs, ROI is no longer just about buying the right hardware. It’s about extending the value of what you already own, delaying expensive upgrades, and ensuring high-performance resources are reserved for workloads that actually drive the business forward.

Diskover helps organizations take that data-centric approach. By delivering global visibility into unstructured data, rich metadata insights, and automated lifecycle controls, Diskover enables teams to:

  • Reduce waste across existing storage, avoiding unnecessary capacity expansion
  • Lower memory pressure by minimizing duplicated, inactive, or poorly curated data
  • Preserve high-performance infrastructure for critical AI and business workloads
  • Delay hardware refreshes and cloud spend, improving ROI on current investments
  • Support AI initiatives without destabilizing day-to-day operations

In an era where memory and storage are becoming strategic constraints, efficiency isn’t just about cost control. It’s about maximizing ROI, protecting infrastructure budgets, and ensuring the business can continue to operate, innovate, and grow, even as AI reshapes the economics of IT.

Ready to structure the unstructured?

Is Your Data AI-Ready? Most Enterprises Aren’t – Here’s Why

Most enterprises are racing toward AI, but the majority of their data simply isn’t ready. Siloed storage, missing metadata, manual workflows, and years of cluttered unstructured files slow AI initiatives before they start. This post explores why AI-readiness is so hard – and how Diskover helps organizations finally structure the unstructured.

Unstructured data has become the foundation of AI, yet it’s also the hardest to wrangle. Files, images, videos, logs, documents, design assets, sensor output – these assets sit scattered across systems, clouds, and archives. Without a clear strategy for discovering, organizing, and preparing them, even the most ambitious AI initiatives stall before they start.

Below are the six most common reasons enterprises struggle with AI readiness, and what organizations can begin doing today to close the gap.

Most enterprises store data everywhere: NAS, object storage, cloud buckets, on-prem archives, remote offices, legacy systems, user drives – the list expands every year. These silos made sense when teams worked independently. But AI depends on unified visibility and consistent access, which these fragmented systems cannot provide.

When no one can answer basic questions like “Where does this dataset live?” or “How many versions of this asset exist?”, AI pipelines grind to a halt.

Where to start:

  • Inventory all storage systems and repositories
  • Document which teams rely on which platforms
  • Identify redundant systems and legacy environments that no longer support modern workflows
  • Encourage movement toward shared, standardized data access patterns

Organizations are sitting on thousands to billions of files, but lack insight into what’s active, critical, duplicated, sensitive, or junk. And without that visibility, AI efforts begin with guesswork rather than strategy.

This leads to overspending on storage, slow data retrieval, and an inability to prioritize the datasets most likely to fuel AI value.

Where to start:

  • Implement tagging (manual or scripted) based on file attributes
  • Remove obvious redundancies, temp files, and duplicate content (see the dedupe sketch after this list)
  • Work with finance to quantify storage cost by tier or repository
  • Build a simple classification model (Active / Archive / Delete) to begin segmenting datasets
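
To make the duplicate-removal step concrete, here is a minimal dedupe sketch: group files by size, then confirm duplicates by content hash so only byte-identical files are flagged. It is a starting point under simple assumptions (local filesystem, hard links ignored), not a production tool.

```python
# Minimal dedupe sketch: group files by size, then confirm duplicates by
# content hash so only byte-identical files are flagged. Assumes a local
# filesystem and ignores hard links; a starting point, not a production tool.
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()


def find_duplicates(root: str) -> dict:
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            by_size[os.path.getsize(p)].append(p)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files whose sizes collide
            for p in paths:
                by_hash[sha256_of(p)].append(p)
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates("/data").items():
        print(digest[:12], paths)
```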

Inactive files often sit on the most expensive storage tiers, sometimes for years. These assets slow down scans, backups, migrations, and AI data preparation. Worse, they clog infrastructure that should be optimized for high-value, frequently accessed data.

AI workloads require fast, curated, context-rich datasets, not a mountain of stale archives.

Where to start:

  • Flag files not accessed in the last 6–12 months
  • Move inactive data to lower-cost storage tiers
  • Review old logs, outdated backups, duplicates, and abandonware
  • Work with business units to align retention with actual value

File movement. Folder cleanup. Tagging. Classification. Data syncing. Lifecycle management.

When datasets hit petabyte scale, manual processes collapse. And every hour spent manually preparing files is an hour not spent building or training AI models.

To meet AI’s velocity, enterprises need automated workflows, policy-driven actions, and continuous metadata enrichment.

Where to start:

  • Automate repetitive tasks like cleanup, tagging, and archival
  • Centralize ownership for automation initiatives in a focused ops team
  • Evaluate platforms for API-driven or rules-driven automation
  • Pilot small workflow automations to prove value and build momentum

Data scientists are hired to innovate, but many spend 60–70% of their time hunting for files, deciphering naming conventions, massaging inconsistent formats, or filtering low-value data from massive file collections.

This not only delays AI projects; it reduces accuracy, slows iteration, and frustrates the teams you hired to accelerate progress.

Where to start:

  • Centralize documentation for datasets
  • Enforce naming standards across the organization
  • Assign data stewards to high-impact domains
  • Build a searchable internal catalog for known datasets

AI isn’t something you “bolt on” to existing systems. It relies on a data architecture capable of:

  • high-throughput ingestion
  • fast indexing
  • metadata enrichment
  • flexible data mobility
  • consistent governance
  • scalable curation

Without these foundations, organizations may have terabytes or petabytes of unstructured data, but none of it is ready for intelligent use.

Where to start:

  • Map friction points in your current AI workflows
  • Define your ideal end-to-end data pipeline
  • Allocate resources for data readiness (not just AI tools)
  • Align IT, engineering, and AI teams around a shared data strategy

While these steps help organizations begin improving AI readiness, truly unlocking unstructured data at scale requires indexing, visibility, context, and orchestration, all working together. That is exactly what Diskover delivers:

✔ Indexing and discovering all unstructured data across storage, clouds, and archives

✔ Identifying high-value, redundant, stale, or orphaned files with precision

Diskover helps enterprises stop guessing and start strategically preparing their unstructured data so AI teams can move faster and build better models using datasets that are accurate, complete, and context-rich.

If your enterprise is ready to finally get control of unstructured data and make your data truly AI-ready, Diskover can help you get there.

Ready to structure the unstructured?

Unlocking the Power of Unstructured Data with Diskover on the RiVA Platform

Enterprises are generating unstructured data at massive scale, but traditional tools can’t deliver the visibility or automation needed to make that data AI-ready. Diskover on GeoComputing Group’s RiVA platform helps organizations unlock the true value of that data – at speed and at scale.

RiVA is well known for delivering high-performance private cloud environments purpose-built for geoscience workflows. By adding Diskover’s global metadata indexing, search, and workflow automation, RiVA evolves into a unified data intelligence platform.

With Diskover embedded into RiVA, organizations gain:

  • A single, searchable view of all unstructured data – across on-prem, the GeoComputing Cloud, commercial cloud storage, and legacy archives.
  • Lightning-fast discovery with rich metadata, enabling engineers, geologists, and data managers to find exactly what they need in seconds.
  • Automated curation and cleanup to prepare datasets for AI, analytics, and compliance workflows.

The result is a streamlined environment where petabytes of unstructured files become accessible, meaningful, and ready for action.

“Our client had divested a permit and needed to find all related data to hand over and then remove from disk. They provided a list of prospects, seismic surveys, and aliases which Diskover could use in a single search to very quickly scan the entire RiVA filesystem for matching references and provide file/folder listings and sizes. This task was completed in less than an hour. It would have taken weeks and needed extensive historical knowledge without Diskover.”

– Tim Ballinger, GeoComputing Group LLC

Across the energy sector, Diskover + RiVA accelerates mission-critical workflows that have traditionally depended on slow manual searches or tribal knowledge. Examples include:

  1. Preparing Datasets for Generative AI & Analytics
    • Curate AI-ready corpuses by filtering for the most relevant seismic volumes, well files, geologic reports, or project tags. Automate movement to GPU-backed environments in RiVA.
  2. Divestiture & M&A Data Readiness
    • Quickly locate all assets tied to a basin, block, permit, or survey – including hidden references or legacy folder structures. Export, package, and hand off clean datasets in hours instead of weeks.
  3. Seismic Data Quality, Cleanup & Reconciliation
    • Identify duplicates, outdated SEG-Y versions, or abandoned project folders. Automatically tag data by survey, operator, or vintage to maintain clean, trusted libraries.
  4. Archive Modernization & Legacy Data Recovery
    • Surfacing forgotten or inaccessible archives – tape migrations, old shares, orphaned user directories – becomes fast and predictable using Diskover’s metadata-driven indexing.
  5. Compliance, Governance & Data Lifecycle Management
    • Track where regulated datasets live, how they’re being used, and when they should be archived or deleted. Automatically enforce retention policies across environments.

These use cases deliver immediate operational value, particularly for teams under pressure to move faster, reduce storage waste, and prepare for AI-driven exploration workflows.
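
The divestiture story above maps naturally onto a single metadata-index query. The sketch below is hypothetical: the index name, field names, and alias list are invented for illustration, and it assumes an OpenSearch-style index of file metadata rather than Diskover’s actual schema.

```python
# Hypothetical sketch of the divestiture search: one query over a file-metadata
# index for any of the permit's prospect/survey names or aliases. The index and
# field names are invented, not Diskover's actual schema.
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

ALIASES = ["Prospect-A", "Survey-2019-XL", "Permit-XYZ"]  # from the handover list

query = {
    "query": {
        "query_string": {
            "query": " OR ".join(f'"{a}"' for a in ALIASES),
            "fields": ["name", "path", "tags"],
        }
    },
    "aggs": {"total_bytes": {"sum": {"field": "size"}}},
    "size": 1000,
}

resp = client.search(index="file-metadata", body=query)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["path"], hit["_source"]["size"])
print("matched bytes:", resp["aggregations"]["total_bytes"]["value"])
```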

GeoComputing’s latest generation of RiVA brings AI-ready GPUs, containerized deployments, and seamless integration with modern geoscience applications. Combined with Diskover’s scalable indexing, deep metadata enrichment, and automated workflows, E&P organizations gain a platform built for the next decade of digital subsurface innovation.

Together, GeoComputing Group and Diskover Data are setting a new standard for energy data management – empowering teams to unlock more value from their data, accelerate decision-making, and support sustainable growth.

From Whiteboard to Workflow: How Diskover Built Its Openflow Connector for Snowflake

Summary of our recent article published on the Snowflake Medium blog

Traditional ingestion methods rely on custom-built JDBC connectors – flexible, but difficult to scale, maintain, and deploy. Diskover needed a different approach. Snowflake Openflow provided a low-code, processor-driven framework that handled much of the orchestration behind the scenes, while Kafka acted as the durable, future-proof transport layer.

This allowed Diskover to:

  • Scale ingestion without custom code, using Openflow’s built-in orchestration
  • Standardize on Kafka, enabling future expansion to other warehouses with Kafka consumers
  • Accelerate onboarding, even for customers without deep engineering teams

Diskover’s connector processes two major data flows:

  1. Quota records: storage allocation and usage over time
  2. File metadata: billions of filenames, paths, timestamps, sizes, and owners indexed by Diskover

Openflow organizes these flows into parallel branches on the canvas, separates metadata-only records from those requiring additional handling, and streams the results into Snowflake using Snowpipe Streaming.
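
To make the transport layer concrete, here is a minimal, hypothetical producer for the file-metadata flow. The topic name and record fields are illustrative; in the real pipeline, Openflow consumes records like these and lands them in Snowflake via Snowpipe Streaming.

```python
# Minimal, hypothetical sketch of the Kafka transport for the file-metadata
# flow. Topic name and record fields are illustrative; Openflow consumes these
# records and lands them in Snowflake via Snowpipe Streaming.
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

record = {
    "path": "/projects/surveyA/shot_0001.segy",
    "size": 734003200,
    "owner": "geo-team",
    "mtime": "2025-11-02T14:31:07Z",
}

# Keying by path keeps all updates for one file in one partition (per-file
# ordering); the partition count then bounds downstream parallelism.
producer.send("diskover.file-metadata", key=record["path"], value=record)
producer.flush()
```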

[Figure: Diskover + Openflow overview]

As Byron built the first working version, a few themes emerged:

  1. Low-code still requires engineering discipline.
    • Each processor needs careful configuration – thread counts, schemas, partitioning strategies. The visual workflow hides orchestration, but not design responsibility.
  2. Error messages can be…cryptic.
    • Some Openflow errors were vague or misleading. Debugging often required digging into SQL roots or using AI tools to interpret internal messages.
  3. Scaling is about design, not just settings.
    • Kafka partitioning proved essential. Too few partitions bottleneck the pipeline; too many add overhead. Openflow helps surface bottlenecks, but thoughtful upfront design matters most.
  4. Automate everything possible.
    • To avoid manual copy-paste errors across dozens of processors, Byron built a Go-based code generator to produce consistent JSON, SQL, and YAML configs – a key step toward repeatability.

His advice for teams starting down the same path:

  • Start small and iterate. Build a functional pipeline before stressing it with scale.
  • Leverage the Apache NiFi community. Openflow concepts closely follow NiFi, and existing documentation is invaluable.
  • Automate configs early. Avoid manual parameter editing at all costs.
  • Expect tuning and refinement. Openflow accelerates development, but production pipelines still require thoughtful engineering.

The connector is fully functional today, with beta deployments beginning soon. These real-world environments will answer open questions around autoscaling, throughput, and handling more complex metadata or blob-level extraction.

Long term, the architecture positions Diskover to extend similar pipelines to other cloud warehouses: because Kafka remains at the core, the pipeline can evolve without major redesign.

The Power of AI Natural Language Interfaces: Transforming How We Work with Data

AI natural language interfaces are redefining how we interact with data. Instead of navigating complex dashboards or waiting on reports, users can simply ask questions and get instant insights. As unstructured data grows and AI fluency spreads, natural language is becoming the new interface for discovery—bridging the gap between human understanding and machine intelligence.

Now, a new wave of AI-powered natural language interfaces (NLIs) is redefining what it means to be “data-driven.” These conversational systems allow anyone to ask questions and get answers from across vast data ecosystems — no SQL, no scripting, no waiting for IT.

Most analytics and storage tools were built for specialists. Even the friendliest dashboard still assumes you know which metrics to look for and where to find them. Natural language interfaces change that dynamic.

Instead of navigating reports or building queries, users can simply ask:

  • “Where are we spending the most on cold storage?”
  • “Which datasets haven’t been accessed in six months?”
  • “Summarize data growth trends over the past quarter.”

Behind the scenes, AI interprets intent, retrieves relevant metadata, and responds conversationally — turning what was once a technical task into a human one.
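
A toy sketch of that interpretation step, under the assumption that the NLI compiles each question into a structured metadata query before anything runs against the index. The dataclass and keyword matching are stand-ins for what an LLM with function calling would do.

```python
# Toy sketch of the interpretation step: a question becomes a structured
# metadata query, and only that query runs against the index. The dataclass and
# keyword matching are stand-ins for an LLM with function calling.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataQuery:
    field: str                       # e.g. "last_accessed"
    operator: str                    # e.g. "<"
    value: str                       # e.g. "now-6M"
    aggregate: Optional[str] = None  # e.g. "sum(cost)"


def translate(question: str) -> MetadataQuery:
    q = question.lower()
    if "accessed" in q and "month" in q:
        return MetadataQuery("last_accessed", "<", "now-6M")
    if "spending" in q and "cold storage" in q:
        return MetadataQuery("tier", "==", "cold", aggregate="sum(cost)")
    raise ValueError("intent not recognized")


print(translate("Which datasets haven’t been accessed in six months?"))
# MetadataQuery(field='last_accessed', operator='<', value='now-6M', aggregate=None)
```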

Two trends are converging to make natural language interfaces essential rather than optional.

First, data complexity is growing faster than our ability to manage it. Unstructured data now represents an estimated 80–90% of what most organizations store. It’s scattered across on-prem systems, clouds, and SaaS platforms, each with its own permissions, formats, and costs.

Second, AI fluency is spreading across the enterprise. Large language models have made natural language the new operating system of work. From summarizing documents to generating code, employees are learning to ask better questions — and expect faster answers.

The logical next step is to bring that same simplicity to how we explore and manage data itself.

Natural language doesn’t just make data more accessible — it changes how organizations think. When anyone can ask a question and get an answer instantly, curiosity replaces gatekeeping. Teams experiment more freely, explore ideas faster, and spot inefficiencies sooner.

Conversational capabilities like these illustrate how AI-driven natural language tools are blurring the line between insight and action. By connecting intuitive commands to powerful metadata intelligence, organizations can empower every user to manage and optimize data with the same ease as asking a question.

Despite the excitement, natural language interfaces are not magic. Organizations will need to confront several realities:

  • Context matters. AI must understand not just language, but data lineage, ownership, and relevance.
  • Accuracy and transparency remain critical — users need to trust where answers come from.
  • Governance and access control still apply; conversational doesn’t mean unsecured.
  • Metadata quality will determine the quality of results. Without rich, structured metadata, AI can’t deliver meaningful answers.

In other words, natural language isn’t a shortcut around data management — it’s a reason to do it better.

Natural language interfaces represent a shift as significant as the move from command lines to graphical interfaces. They don’t replace human expertise; they amplify it — making insight as simple as asking a question.

As organizations prepare for the next phase of AI adoption, those that invest in strong metadata foundations, clean data pipelines, and transparent governance will be best positioned to take advantage of this conversational future.

Because the real breakthrough isn’t AI answering our questions. It’s that, for the first time, anyone can ask them.

At Diskover Data, we see natural language interfaces as part of a larger shift toward intelligent, metadata-driven data ecosystems. The future of AI-ready data isn’t just about scale or speed — it’s about context, accessibility, and trust. By enriching unstructured data with business meaning and enabling seamless orchestration across systems, Diskover helps organizations create the strong data foundation these AI interactions rely on.

The CloudSoda AI Assistant: What We Learned Building Our First AI-Powered Solution

The CloudSoda AI Data Assistant began as an experiment with Anthropic’s Model Context Protocol (MCP) and quickly became a working prototype unveiled at the NAB Show. Built to analyze storage, simulate cost savings, and surface duplicates, it gave both engineers and executives instant, actionable answers in plain language.

The idea grew from Olivier’s experiments with Anthropic’s Model Context Protocol (MCP), which allows AI assistants to connect directly to real-world systems. If MCP could make his own workflows more powerful, why not bring the same approach to customers? That question set the stage for building an assistant that could make unstructured data instantly actionable.

The NAB Show in Las Vegas became the forcing function: a deadline to design, build, and demo a working prototype in front of thousands of industry professionals.

A core decision was which AI models to support. Customers generally fell into two camps:

  • Security-focused organizations restricted to tools like Microsoft Copilot that keep data contained.
  • Enterprise-standardized organizations built around a single LLM like ChatGPT.

By supporting both, and by leveraging MCP, the assistant could meet customers where they were while maintaining flexibility. To ensure accuracy, we also exposed a structured set of functions through our APIs, preventing AI “hallucinations” and ensuring trustworthy results.
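
Here is a hypothetical sketch of what “a structured set of functions” can look like in MCP terms: the assistant may only invoke typed, whitelisted tools, so every figure it reports traces back to an API result instead of a guess. The tool names, fields, and rates below are invented for illustration.

```python
# Hypothetical sketch of "structured functions" exposed over MCP: the assistant
# can only call typed, whitelisted tools, so every figure it reports traces
# back to an API result rather than a guess. Names, fields, and rates invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("storage-assistant")


@mcp.tool()
def cold_data_summary(storage_system: str, min_age_days: int = 180) -> dict:
    """Total file count and bytes not accessed in `min_age_days` days."""
    # Stand-in for a real backend call; canned numbers for illustration.
    return {"system": storage_system, "files": 1_240_000, "bytes": 87 * 10**12}


@mcp.tool()
def simulate_savings(source_tier: str, target_tier: str, bytes_moved: int) -> dict:
    """Estimate monthly savings from moving `bytes_moved` between tiers."""
    rates = {"hot": 0.023, "cold": 0.004}  # $/GB/month, illustrative only
    delta = (rates[source_tier] - rates[target_tier]) * bytes_moved / 10**9
    return {"monthly_savings_usd": round(delta, 2)}


if __name__ == "__main__":
    mcp.run()
```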

The initial release gave the assistant reliable capabilities, including:

  • Analyzing storage usage, capacity, and costs
  • Simulating cost savings for different storage scenarios
  • Identifying duplicate files and their impact
  • Auditing file age and access history
  • Listing connected storage systems for planning

At NAB, the AI Assistant didn’t just capture engineers’ attention — it drew in executives. Leaders saw they could finally ask questions in plain language (“How much cold data do we have in PowerScale?”) and get instant, actionable answers, without waiting on reports or pulling in staff.

The NAB prototype was just the first step. In his next post, Olivier will share how we took the concept further — using AI itself to help design, implement, and review a new connector for the Diskover platform.

How MCP Turned AI Experiments into Real Product Ideas

Diskover VP of Product Olivier Rivard shares how Anthropic’s Model Context Protocol (MCP) transformed his AI experiments into real product ideas — and what it means for the future of unstructured data management.

Olivier had already been using tools like ChatGPT and Claude for writing and research. But with MCP, AI assistants could connect to real systems, pull live data, and take meaningful action.

That shift turned his experiments into product ideas. Instead of just brainstorming with AI, he was able to:

  • Build a prototype connector that generated storage reports, usage trends, and file searches.
  • Ask Claude to create tasks, analyze storage patterns, and even forecast growth.
  • See the potential for customers to ask natural language questions like “How much cold data do we have in PowerScale?” and get actionable answers instantly.

For IT teams managing billions of files across on-prem and cloud systems, the ability to combine AI with MCP opens new possibilities:

  • Automating weekly health checks and cleanup tasks
  • Surfacing insights across multiple systems
  • Lowering the barrier for non-technical users to get meaningful answers
  • Extending product capabilities without waiting for UI updates

As Olivier puts it: the real power of AI isn’t just in answering questions — it’s in acting on the answers.

Read the Full Post

This is just the beginning of the series. In Part 1, Olivier shares:

  • Why his work in unstructured data led him to explore MCP
  • The first prototypes he built entirely with AI
  • How these experiments are shaping the future of Diskover Data

Diskover Now Available on Oracle Cloud Marketplace

Diskover is now available on the Oracle Cloud Marketplace, giving OCI customers a faster, simpler path to unstructured data visibility and AI readiness. With native integration into OCI services like Object Storage and OpenSearch — and the ability to purchase using Oracle Universal Credits — Diskover helps teams structure the unstructured, reduce storage costs, and accelerate innovation.

Diskover has officially joined the Oracle Cloud Marketplace, giving OCI customers an easier path to unlocking the value of their unstructured data — with full integration into OCI services and the ability to purchase using Oracle Universal Credits.

Unstructured data — video files, design assets, genomics, sensor logs, legal documents — isn’t just growing. It’s multiplying. And for many Oracle Cloud Infrastructure (OCI) customers, it’s become one of the biggest hurdles to operational efficiency, innovation, and AI readiness.

Today, we’re excited to share that Diskover is now available on the Oracle Cloud Marketplace, making it easier for OCI customers to gain clarity, control, and insight across their unstructured data landscape.

Organizations across industries rely on OCI for scalable infrastructure, but as workloads shift to the cloud, data fragmentation often follows. Unstructured files accumulate in object stores and block volumes, but without the right tools in place, they remain hard to find, manage, or make use of.

Diskover helps bridge that gap — offering unified visibility and automation across OCI and hybrid environments. Our platform integrates with key OCI services like Object Storage, Block Storage, and OpenSearch to index billions of files, enrich them with context, and orchestrate data movement securely and at scale.

Whether it’s surfacing cold data for archive, finding redundant content before cloud tiering, or preparing curated datasets for AI workflows, Diskover gives teams a faster, more informed way to manage data across its lifecycle.

As more businesses adopt AI-first strategies, unstructured data becomes foundational — not just for storage optimization, but for training large language models, building recommendation engines, or enabling real-time decision-making.

But before it can power anything, it needs to be structured, enriched, and accessible. That’s where Diskover fits in.

From supporting media pipelines and protecting IP to streamlining genomics research and improving operational planning, Diskover helps customers make unstructured data usable — at petabyte scale, and without adding complexity to their workflows.

By offering Diskover on the Oracle Cloud Marketplace, we’re streamlining how OCI users can evaluate, deploy, and purchase our platform. Existing Oracle customers can even use OCI Universal Credits toward the purchase, removing procurement hurdles and accelerating time to value.

Explore the listing in the OCI Marketplace, or connect with our team to learn how Diskover can help you structure your unstructured data — for AI, for business, and for what’s next.

Boost Value of AI-Ready Unstructured Data with Diskover + Snowflake

Unstructured data is one of the biggest barriers to AI adoption. This post explores why it’s so hard to work with — and how Diskover and Snowflake are partnering to make it AI-ready.

AI is dominating boardroom agendas — but there’s a disconnect. While most enterprises are eager to build or adopt AI solutions, very few are truly ready. The technology is advancing fast, but the data foundations needed to support it are still shaky, especially when it comes to unstructured data.

At the heart of this challenge is a question that many teams can’t yet answer:
What data should we feed our AI models — and how do we get it there?

Diskover acts as a super connector — bridging fragmented, unstructured data environments and delivering enriched, curated datasets directly into Snowflake. Together, they make it easier to identify the right data, prepare it with context, and move it into AI pipelines with confidence.

It’s time to rethink what unstructured data actually is — and why it matters so much for AI.

Unstructured data includes everything from PDFs, chat logs, and video files to microscope images, process logs, and design files. It doesn’t conform to rows, columns, or fixed schemas, making it difficult to manage, search, or use effectively. Yet it often holds a company’s most valuable intellectual property and insights.

  • In life sciences, it’s lab notebook scans, clinical observations, and instrument output
  • In manufacturing, it’s test logs, wafer inspection images, and simulation output
  • In media, it’s raw footage, VFX assets, and archived creative files

These are the raw materials that can fuel AI — but only if they’re discoverable, enriched with context, and accessible through the platforms where AI actually happens.

For most teams trying to build AI capabilities today, unstructured data is the biggest blind spot. The challenges are systemic and familiar:

Most enterprises are sitting on petabytes of unstructured data scattered across on-prem servers, legacy NAS systems, cloud buckets, and forgotten archives. Without a unified view, teams can’t even begin to assess what’s usable.

Even when data is located, it’s often miscategorized, duplicated, or buried in silos with no metadata. There’s no easy way to identify what’s high value versus what’s outdated, redundant, or irrelevant.

Moving unstructured data into Snowflake or another analytics platform typically involves brittle custom scripts, manual tagging, or time-consuming staging processes that delay AI initiatives and create technical debt.

Bad data in means bad data out. Feeding AI pipelines with irrelevant, unlabeled, or incomplete data can skew results, bias models, and waste compute. Without visibility and enrichment, even large datasets deliver limited value.

These challenges aren’t about theoretical strategy — they’re about real day-to-day blockers that keep teams from operationalizing AI.

The partnership between Diskover and Snowflake is designed to address these challenges head-on — by closing the gap between raw data and AI-ready assets.

Diskover acts as a global indexer across your entire storage landscape — on-prem, cloud, legacy, and hybrid. It scans file systems at scale, builds a searchable metadata catalog, and adds business context through intelligent tagging and enrichment.

No more guessing what’s out there. Diskover shows you:

  • What data you have
  • Where it lives
  • How it’s being used (or not)
  • Which datasets are most valuable for AI

It also lets you slice data by owner, project, age, access frequency, file type, and cost — giving you the clarity needed to make smart decisions fast.

Instead of dumping everything into Snowflake and hoping for the best, Diskover lets teams curate only what matters. You can define policies to identify:

  • Cold data older than X months
  • Duplicate files that can be ignored
  • Files related to specific workflows, projects, or business units
  • Specific types of files (e.g., large video files, microscopy images, system logs)

This curation step helps teams feed AI models with high-value, high-integrity data — while avoiding the clutter that can dilute performance and drive up costs.
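
A bare-bones sketch of that curation step: policy filters applied to a metadata catalog, keeping one copy of each unique file, of the right type, with recent access. The records and field names are invented; Diskover’s actual policies operate on its index at far larger scale.

```python
# Bare-bones sketch of policy-driven curation over a metadata catalog.
# Records and field names are invented for illustration.
from datetime import datetime, timedelta, timezone

# Toy catalog; in practice these records come from the global index.
catalog = [
    {"path": "/lab/img_001.tif", "type": "image",
     "last_access": "2025-10-01T00:00:00+00:00", "sha256": "aa11"},
    {"path": "/lab/img_001_copy.tif", "type": "image",
     "last_access": "2023-01-15T00:00:00+00:00", "sha256": "aa11"},
    {"path": "/tmp/debug.log", "type": "log",
     "last_access": "2025-12-01T00:00:00+00:00", "sha256": "bb22"},
]


def curate(records, wanted_types=("image",), max_age_days=365):
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    seen = set()
    for r in records:
        if r["type"] not in wanted_types:
            continue  # wrong kind of file for this corpus
        if datetime.fromisoformat(r["last_access"]) < cutoff:
            continue  # cold data: exclude (or tier down instead)
        if r["sha256"] in seen:
            continue  # duplicate content: ingest only once
        seen.add(r["sha256"])
        yield r


for rec in curate(catalog):
    print("ingest:", rec["path"])  # only records passing all three policies
```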

Once data is curated and tagged, Diskover uses Snowflake Openflow to seamlessly move selected datasets into Snowflake’s environment — ready to be queried, joined, or fed into AI pipelines like Snowflake Cortex.

There’s no re-architecting or brittle ETL. And because the data arrives enriched with metadata and business context, it’s far more meaningful from the moment it lands.

“We’re seeing more customers adopt an AI-first data strategy, which depends on having access to all your data. Enterprises can’t unlock the full value of AI without knowing what unstructured data they have and how to use it. Our partnership with Diskover, in combination with Snowflake Openflow, makes that possible, acting as a super-connector to exabyte-scale unstructured data.”

– Harsha Kapre, Director, Snowflake Ventures

Let’s look at how this plays out in practice.

A research team working on new drug therapies needs to analyze years of experimental results, raw microscope images, and clinical notes. These datasets are stored in various formats across multiple environments.

With Diskover:

  • They discover, tag, and organize the relevant assets — by experiment, date, or researcher
  • Filter out irrelevant or low-quality data
  • Seamlessly move curated datasets into Snowflake for model training, correlation analysis, and discovery

A chipmaker wants to build predictive models to reduce manufacturing defects. The raw material? Process logs, inspection images, and test results — all unstructured and scattered across facilities.

With Diskover:

  • They pinpoint the files that correlate to known defect patterns
  • Enrich with metadata like product ID, location, and equipment
  • Feed this clean, labeled data into Snowflake to power defect prediction models

A media company wants to personalize its platform by analyzing viewer preferences across thousands of hours of archived content. But the video files are unlabeled and spread across aging storage.

With Diskover:

  • They discover and tag content by show, actor, theme, or scene
  • Remove outdated or duplicated assets
  • Move curated metadata and assets into Snowflake for real-time recommendation and content repackaging

Diskover will soon be available as a Connected App in the Snowflake Marketplace, allowing customers to:

  • Purchase with Snowflake credits
  • Integrate directly via Openflow
  • Move from discovery to ingestion in just a few clicks

Want to see how it works with your data? Get in touch for a live demo or early access.

“Proud to support Diskover Data as they help companies uncover their most valuable data across legacy systems with a unified, searchable view.

Together with Snowflake’s easy, trusted, and connected platform, we’re helping customers seamlessly ingest critical data and build a strong, AI-ready foundation.”

– Sridhar Ramaswamy, CEO, Snowflake

AI isn’t just about algorithms or models. It’s about data. And not just any data — the right data.

With Diskover and Snowflake, teams now have a seamless way to:

  • Discover what they have
  • Curate what matters
  • Enrich it with business context
  • Ingest it directly into AI pipelines
  • And do it all without rebuilding infrastructure or creating technical debt

Diskover and Snowflake give you a direct path from fragmented storage to AI-ready data — enriched, curated, and delivered where you need it.

Ready to harness the value of your unstructured data? Learn how Diskover can help you find, enrich, and deliver your data to power breakthrough AI use cases.
