Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


506 Episodes

Beyond Prompts: Practical Paths to Self‑Improving AI - E505

Summary  In this episode Raj Shukla, CTO of SymphonyAI, explores what it really takes to build self‑improving AI systems that work in production. Raj unpacks how agentic systems interact with real-world environments, the feedback loops that enable continuous learning, and why intelligent memory layers often provide the most practical middle ground…

16 March 2026 | 01:01:50


Orion at Gravity: Trustworthy AI Analysts for the Enterprise - E504

Summary  In this episode of the Data Engineering Podcast, Lucas Thelosen and Drew Gilson, co-founders of Gravity, discuss their vision for agentic analytics in the enterprise, enabled by semantic layers and broader context engineering. They share their journey from Looker and Google to building Orion, an AI analyst that combines data semantics with…

08 March 2026 | 01:05:01


From Models to Momentum: Uniting Architects and Engineers with ER/Studio - E503

Summary  In this episode of the Data Engineering Podcast, Jamie Knowles (Product Director) and Ryan Hirsch (Product Marketing Manager) discuss the importance of enterprise data modeling with ER/Studio. They highlight how clear, shared semantic models are a foundational discipline for modern data engineering, preventing semantic drift, speeding up…

02 March 2026 | 00:45:02


From Data Models to Mind Models: Designing AI Memory at Scale - E502

Summary  In this episode of the Data Engineering Podcast, Vasilije "Vas" Markovich, founder of Cognee, discusses building agentic memory, a crucial aspect of artificial intelligence that enables systems to learn, adapt, and retain knowledge over time. He explains the concept of agentic memory, highlighting the importance of distinguishing between…

22 February 2026 | 00:57:47


Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops - E501

Summary  In this episode of the Data Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational groundwork required to run LLM-powered applications reliably and cost-effectively. He highlights common blind spots that teams face, including opaque model behavior, runaway token costs, and brittle prompt management, and explains…

15 February 2026 | 00:50:43


From Legacy to AI-Ready: How MongoDB AMP Accelerates Modernization - E500

Summary In this episode, Shilpa Kolhar, SVP of Product and Engineering at MongoDB, discusses using MongoDB as a unified foundation for AI-driven and agentic applications. She explains how the Application Modernization Platform (AMP) accelerates the transition from legacy relational systems to a document-first architecture, driven by the need for…

08 February 2026 | 00:46:45


Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows - E499

Summary  In this episode Tim Sehn, founder and CEO of DoltHub, talks about Dolt - the world’s first version‑controlled SQL database - and why Git‑style semantics belong at the heart of data systems and AI workflows. Tim explains how Dolt combines a MySQL/Postgres‑compatible interface with a novel storage engine built on a “Prollytree” to enable…

01 February 2026 | 00:56:53


Logical First, Physical Second: A Pragmatic Path to Trusted Data - E498

Summary  In this episode of the Data Engineering Podcast Jamie Knowles, Product Director for ER/Studio, talks about data architecture and its importance in driving business meaning. He discusses how data architecture should start with business meaning, not just physical schemas, and explores the pitfalls of jumping straight to physical designs.…

25 January 2026 | 00:40:50


Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability - E497

Summary  In this episode Jacob Leverich, cofounder and CTO of Observe, talks about applying lakehouse architectures to observability workloads. Jacob discusses Observe’s decision to leverage cloud-native warehousing and open table formats for scale and cost efficiency. He digs into the core pain points teams face with fragmented tools, soaring…

18 January 2026 | 01:12:21


Semantic Operators Meet Dataframes: Building Context for Agents with FENIC - E496

Summary  In this episode Kostas Pardalis talks about Fenic - an open-source, PySpark-inspired dataframe engine designed to bring LLM-powered semantics into reliable data engineering workflows. Kostas shares why today’s data infrastructure assumptions (BI-first, expert-operated, CPU-bound) fall short for AI-era tasks that are increasingly inference-…

12 January 2026 | 00:56:42