This directory contains the complete planning documents, specifications, and test definitions for the Dataiku Reusable Workflows System. The system enables enterprise data science teams to shift from project-based delivery to component-based composition.
- Status: Planning Phase (TDD)
- Target: AI Agent Implementation

Enterprise data science teams deliver projects, not capabilities. Each new project rebuilds similar pipelines from scratch. Code gets reused, but workflows don't. The goal is to enable:
- Workflow Reuse - Reuse configured, wired, tested pipelines (not just code snippets)
- Reference Without Cloning - Access the same pipeline rather than managing copies
- Multi-Model Orchestration - Stitch 6-7 models into orchestrated solutions
- Granular Reusability - Consume blocks in parts (just data prep, not full model)
- Hierarchical Components - Compose from sensor → equipment → process → plant → business
- Extension Without Modification - Extend blocks without affecting other consumers
A Block is a Flow Zone containing datasets, recipes, and optionally models, with defined input/output boundaries. Blocks are the unit of reuse.
A Solution Block is a special block type that orchestrates multiple model blocks in sequence or via dependency resolution.
The BLOCKS_REGISTRY is a dedicated Dataiku project that serves as the central catalog of all published blocks. Individual projects reference blocks from the registry.
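The minimal in-memory sketch below makes the block and registry definitions concrete. `Block`, `BlocksRegistry`, `publish` and `reference` are hypothetical names, not the Dataiku DSS API. The sketch shows one point: a project resolves a published block by id and version, and it receives the shared definition rather than a copy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """Hypothetical model of a block: one Flow Zone with declared I/O boundaries."""
    block_id: str
    version: str
    zone: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    models: tuple[str, ...] = ()


@dataclass
class BlocksRegistry:
    """In-memory stand-in for the BLOCKS_REGISTRY catalog (assumption, not DSS)."""
    blocks: dict[tuple[str, str], Block] = field(default_factory=dict)

    def publish(self, block: Block) -> None:
        key = (block.block_id, block.version)
        if key in self.blocks:
            # Published versions are treated as immutable.
            raise ValueError(f"{block.block_id}@{block.version} is already published")
        self.blocks[key] = block

    def reference(self, block_id: str, version: str) -> Block:
        # Consumers get the shared published definition, not a clone.
        try:
            return self.blocks[(block_id, version)]
        except KeyError:
            raise LookupError(f"unknown block {block_id}@{version}") from None
```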
Blocks support two extension patterns:
- Python Inheritance - Class inheritance in project libraries
- Recipe Override - Replace a recipe with a custom implementation having the same I/O
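The rough sketch below covers both patterns. It assumes that a block ships a Python base class in its project library and that each recipe is described by its input and output dataset names. `FeaturePrep`, `Recipe` and `override_recipe` are illustrative names only.

```python
from dataclasses import dataclass


class FeaturePrep:
    """Hypothetical base class shipped in a block's project library."""

    def clean(self, rows: list[dict]) -> list[dict]:
        # Default behaviour: drop rows that contain missing values.
        return [r for r in rows if all(v is not None for v in r.values())]


class ImputingFeaturePrep(FeaturePrep):
    """Python inheritance: a consumer changes behaviour in its own library."""

    def clean(self, rows: list[dict]) -> list[dict]:
        # Fill missing values with 0 instead of dropping the row.
        return [{k: (0 if v is None else v) for k, v in r.items()} for r in rows]


@dataclass(frozen=True)
class Recipe:
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def override_recipe(recipes: dict[str, Recipe], custom: Recipe, replaces: str) -> dict[str, Recipe]:
    """Recipe override: accept the custom recipe only if its I/O matches exactly."""
    original = recipes[replaces]
    if (custom.inputs, custom.outputs) != (original.inputs, original.outputs):
        raise ValueError(f"{custom.name} does not match the I/O of {replaces}")
    # Return a new mapping so the shared block definition is left untouched.
    updated = dict(recipes)
    updated[replaces] = custom
    return updated
```

Returning a new mapping rather than mutating the original is one way to honour "Extension Without Modification": other consumers of the same block keep the original recipe.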
```text
┌─────────────────────────────────────────────────────────────────┐
│  User Layer                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │ Claude Code  │  │    Codex     │  │      Gemini CLI      │   │
│  └──────┬───────┘  └──────┬───────┘  └──────────┬───────────┘   │
│         └─────────────────┼─────────────────────┘               │
│                           ▼                                     │
├─────────────────────────────────────────────────────────────────┤
│  Agent Layer                                                    │
│  ┌─────────────────────┐      ┌─────────────────────────────┐   │
│  │   Discovery Agent   │      │       Executing Agent       │   │
│  │  (Crawl → Catalog)  │      │   (Intent → Plan → Apply)   │   │
│  └──────────┬──────────┘      └──────────────┬──────────────┘   │
│             │                                │                  │
│             ▼                                ▼                  │
├─────────────────────────────────────────────────────────────────┤
│  Catalog Layer                                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                  BLOCKS_REGISTRY Project                  │  │
│  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────────┐  │  │
│  │  │    Wiki     │   │   Library   │   │     Bundles     │  │  │
│  │  │   (Human)   │   │   (JSON)    │   │   (Versions)    │  │  │
│  │  └─────────────┘   └─────────────┘   └─────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Engine Layer                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │             IaC Engine (Extended for Blocks)              │  │
│  │             Config → Validate → Plan → Apply              │  │
│  └───────────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Dataiku Layer                                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      Dataiku DSS API                      │  │
│  │   Projects | Datasets | Recipes | Zones | Wiki | Models   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
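The Engine Layer's Config → Validate → Plan → Apply flow can be read as a desired-state diff. The sketch below assumes the engine compares parsed config against recorded state, in the style of Terraform-like IaC tools. The validation rule, the `Action` type and the in-memory state are all hypothetical. A real apply step would call the Dataiku DSS API rather than edit a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    op: str    # "create", "update" or "delete"
    name: str


def validate(config: dict[str, dict]) -> None:
    # Placeholder rule (assumption): every declared resource needs a "type".
    for name, spec in config.items():
        if "type" not in spec:
            raise ValueError(f"resource {name!r} has no type")


def plan(config: dict[str, dict], state: dict[str, dict]) -> list[Action]:
    # Diff desired config against recorded state, in a stable order.
    actions = [Action("create", n) for n in sorted(config.keys() - state.keys())]
    actions += [Action("update", n) for n in sorted(config.keys() & state.keys())
                if config[n] != state[n]]
    actions += [Action("delete", n) for n in sorted(state.keys() - config.keys())]
    return actions


def apply(actions: list[Action], config: dict[str, dict], state: dict[str, dict]) -> dict[str, dict]:
    # Stand-in for DSS API calls: only the recorded state is updated.
    new_state = dict(state)
    for action in actions:
        if action.op == "delete":
            new_state.pop(action.name)
        else:
            new_state[action.name] = config[action.name]
    return new_state
```

Once apply has run, a second plan against the new state comes back empty, which is the property that makes repeated runs safe.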
Architecture documents:

| Document | Description |
|---|---|
| 01-overview.md | System architecture, layers, component interactions |
| 02-block-model.md | Block definition, types, lifecycle, validation |
| 03-registry.md | BLOCKS_REGISTRY project design, operations |
| 04-inheritance.md | Extension patterns (class, recipe, composition) |
| 05-solution-blocks.md | Multi-model orchestration, dependency resolution |

Component specifications:

| Component | Files | Status |
|---|---|---|
| discovery-agent/ | specification.md, api-design.md, test-cases.md | Complete |
| executing-agent/ | specification.md, api-design.md, test-cases.md | Complete |
| iac-extension/ | specification.md, api-design.md, test-cases.md | Complete |

JSON schemas:

| Schema | Purpose |
|---|---|
| block-reference.schema.json | Block references in IaC config |
| block-manifest.schema.json | Block manifests in registry |
| catalog-index.schema.json | Catalog index file format |
| iac-config.schema.json | IaC config with block support |

Example configurations:

| Example | Description | Complexity |
|---|---|---|
| 01-simple-block.yaml | Single block with basic I/O | Beginner |
| 02-chained-blocks.yaml | Multiple blocks in sequence | Beginner |
| 03-block-with-recipe-override.yaml | Custom recipe replacement | Intermediate |
| 04-block-with-class-extension.yaml | Class injection | Intermediate |
| 05-multi-instance-blocks.yaml | Same block, multiple instances | Intermediate |
| 06-solution-block.yaml | Multi-block solution | Advanced |

Other resources:

| Resource | Description |
|---|---|
| PLANNING_SUMMARY.md | Initial planning session summary |
| tests/ | Test case definitions (TDD) |

Implementation is organized in four phases.

Phase 1 (Discovery Agent). Goal: Crawl projects and build the block catalog
- Zone-based block detection
- Input/output boundary identification
- Wiki catalog writer (hierarchical)
- Library JSON index writer
- Schema extraction
- Merge logic for manual edits
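Input/output boundary identification can be approximated from recipe lineage alone. The sketch below assumes each recipe is known by its Flow Zone plus the names of its input and output datasets; the `RecipeInfo` shape is hypothetical and is not what the DSS API returns. Under that model, a zone's inputs are the datasets it reads but does not build. Its outputs are the datasets it builds that other zones read, plus any that no recipe inside the zone consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeInfo:
    name: str
    zone: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def zone_boundaries(recipes: list[RecipeInfo], zone: str) -> tuple[set[str], set[str]]:
    """Return (inputs, outputs) of one Flow Zone using a lineage heuristic."""
    inside = [r for r in recipes if r.zone == zone]
    outside = [r for r in recipes if r.zone != zone]
    produced = {d for r in inside for d in r.outputs}
    consumed = {d for r in inside for d in r.inputs}
    read_elsewhere = {d for r in outside for d in r.inputs}
    # Inputs: read by the zone but built somewhere else.
    inputs = consumed - produced
    # Outputs: built here and read by other zones, or terminal within the zone.
    outputs = {d for d in produced if d in read_elsewhere or d not in consumed}
    return inputs, outputs
```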

Phase 2 (IaC Extension). Goal: Add block support to the existing IaC engine
- BlockConfig model (zone-based)
- SolutionConfig model (multi-model)
- Configurable hierarchy
- Extension/inheritance support
- Block validation rules
- Integration with existing IaC
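Configurable hierarchy validation could look like the sketch below. It makes two assumptions: the organization supplies its levels lowest-first (for example sensor → equipment → process → plant → business), and a block may compose only blocks from strictly lower levels. Both the rule and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    """Organization-configured hierarchy levels, ordered lowest first."""
    levels: tuple[str, ...]

    def rank(self, level: str) -> int:
        if level not in self.levels:
            raise ValueError(f"unknown hierarchy level {level!r}")
        return self.levels.index(level)


def validate_composition(hierarchy: Hierarchy, parent_level: str, child_levels: list[str]) -> list[str]:
    """Return one error per child not strictly below the parent; empty means valid."""
    parent_rank = hierarchy.rank(parent_level)
    return [
        f"{child!r} is not below {parent_level!r} in the hierarchy"
        for child in child_levels
        if hierarchy.rank(child) >= parent_rank
    ]
```

The function returns a list of errors rather than raising on the first one. This lets a validation report show every problem in a config at once.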

Phase 3 (Executing Agent, planning). Goal: Match intent to blocks and generate plans
- Catalog reader (Wiki + Library)
- Block matching logic
- Wiring/composition engine
- Config generator
- Intent parsing
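Block matching can start as plain keyword overlap between the user's intent and the catalog entries. This sketch uses Jaccard similarity over lower-cased word tokens. The scoring choice and the `CatalogEntry` fields are assumptions; a production Executing Agent would more likely rely on the LLM's own intent parsing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    block_id: str
    description: str
    tags: tuple[str, ...] = ()


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_blocks(intent: str, catalog: list[CatalogEntry], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank catalog entries by Jaccard similarity between intent and entry tokens."""
    wanted = tokens(intent)
    scored: list[tuple[str, float]] = []
    for entry in catalog:
        offered = tokens(entry.description + " " + " ".join(entry.tags))
        union = wanted | offered
        score = len(wanted & offered) / len(union) if union else 0.0
        if score > 0:
            scored.append((entry.block_id, score))
    # Highest score first; ties broken by block id so plans are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```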

Phase 4 (Executing Agent, deployment). Goal: Deploy composed blocks as projects
- Zone instantiation
- Recipe inheritance/override
- Cross-block wiring
- Single project deployment
- Bundle snapshot creation
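Cross-block wiring inside a single project comes down to resolving each block input in one of two ways: link it to an upstream block output, or turn it into a new source dataset. In this sketch, the `instance__port` naming scheme, the `links` mapping and `BlockInstance` are hypothetical. Prefixing dataset names with the instance name is one way to deploy the same block several times without name collisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockInstance:
    instance: str              # hypothetical instance name, unique within the project
    inputs: tuple[str, ...]    # input ports declared by the block
    outputs: tuple[str, ...]   # output ports declared by the block


def wire(
    instances: list[BlockInstance],
    links: dict[tuple[str, str], tuple[str, str]],
) -> dict[str, str]:
    """Resolve every instance input to a concrete dataset name in one project.

    `links` maps (downstream instance, input port) to (upstream instance, output port).
    Inputs without a link become new source datasets owned by that instance.
    """
    by_name = {inst.instance: inst for inst in instances}
    resolved: dict[str, str] = {}
    for inst in instances:
        for port in inst.inputs:
            key = (inst.instance, port)
            if key in links:
                up_name, up_port = links[key]
                upstream = by_name.get(up_name)
                if upstream is None or up_port not in upstream.outputs:
                    raise ValueError(f"cannot wire {inst.instance}.{port} to {up_name}.{up_port}")
                resolved[f"{inst.instance}.{port}"] = f"{up_name}__{up_port}"
            else:
                # Prefixing with the instance name keeps two instances of one block apart.
                resolved[f"{inst.instance}.{port}"] = f"{inst.instance}__{port}"
    return resolved
```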
| Decision | Choice | Rationale |
|---|---|---|
| Block granularity | Flow Zone | Natural boundary with defined I/O |
| Registry model | Centralized BLOCKS_REGISTRY | Single source of truth, projects reference it |
| Extension model | Python inheritance and recipe override | Maximum flexibility |
| Hierarchy | Organization-configurable | Different orgs have different taxonomies |
| Catalog storage | Wiki (human) + Library (JSON) | Human-editable + machine-parseable |
| Versioning | Semantic + Bundle snapshots | Industry standard + immutable releases |
| Solution sequencing | Explicit + Dependency resolution | Support both known and dynamic ordering |
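For solution sequencing, the standard-library `graphlib.TopologicalSorter` (Python 3.9+) handles dependency resolution, and an explicit order only needs to be checked against the same dependency graph. The sketch below assumes dependencies are given as a mapping from each model block to the blocks whose outputs it consumes. `sequence_solution` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def sequence_solution(
    dependencies: dict[str, set[str]],
    explicit_order: Optional[list[str]] = None,
) -> list[str]:
    """Order the model blocks of a Solution Block.

    With an explicit order, only check that it respects the dependencies;
    otherwise derive an order by topological sort.
    """
    if explicit_order is not None:
        required = set(dependencies) | {d for deps in dependencies.values() for d in deps}
        missing = required - set(explicit_order)
        if missing:
            raise ValueError(f"explicit order is missing blocks: {sorted(missing)}")
        position = {name: i for i, name in enumerate(explicit_order)}
        for block, deps in dependencies.items():
            for dep in deps:
                if position[dep] >= position[block]:
                    raise ValueError(f"{block!r} must run after {dep!r}")
        return list(explicit_order)
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"circular dependency between blocks: {exc.args[1]}") from exc
```

Any order that satisfies the dependencies is valid, so callers should not rely on the relative order of independent blocks.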
This system builds on the existing IaC implementation (Waves 1-3). Key integration points:
- Config Parsing - Extend parser for `blocks:` and `solutions:` sections
- Validation - Add block reference validation, hierarchy validation
- State Management - Track instantiated blocks in state
- Plan Generation - Include block operations in plan output
- Apply Execution - Leverage Wave 4 apply for block deployment
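Config parsing for the new sections might look like the sketch below. It assumes the YAML has already been loaded into a dictionary. The `ref` and `steps` field names and the `block_id@MAJOR.MINOR.PATCH` syntax are hypothetical; the real format is defined by `iac-config.schema.json` and `block-reference.schema.json`.

```python
import re
from dataclasses import dataclass

# Assumed reference syntax: plain semantic version, no pre-release suffix.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


@dataclass(frozen=True)
class BlockRef:
    name: str        # instance name inside the consuming project
    block_id: str    # block published in BLOCKS_REGISTRY
    version: str


@dataclass(frozen=True)
class SolutionRef:
    name: str
    steps: tuple[str, ...]


def parse_config(config: dict) -> tuple[list[BlockRef], list[SolutionRef]]:
    """Parse the `blocks:` and `solutions:` sections of an already-loaded config."""
    blocks = []
    for name, spec in config.get("blocks", {}).items():
        block_id, sep, version = spec["ref"].partition("@")
        if not sep or not SEMVER.match(version):
            raise ValueError(f"block {name!r}: expected 'id@MAJOR.MINOR.PATCH', got {spec['ref']!r}")
        blocks.append(BlockRef(name, block_id, version))
    known = {b.name for b in blocks}
    solutions = []
    for name, spec in config.get("solutions", {}).items():
        # Reference validation: every step must name a declared block instance.
        missing = [s for s in spec["steps"] if s not in known]
        if missing:
            raise ValueError(f"solution {name!r} references undeclared blocks: {missing}")
        solutions.append(SolutionRef(name, tuple(spec["steps"])))
    return blocks, solutions
```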
See iac-extension/ for detailed integration specs.
Important: These documents are written for AI agents to implement. Each component specification includes:
- Clear interfaces - Input/output contracts
- Step-by-step logic - Explicit algorithms
- Test cases - TDD-style test definitions
- Error handling - Expected failure modes
- Dependencies - Required imports and setup
Agents should:
- Read the specification completely before implementing
- Write tests first (TDD)
- Follow the interfaces exactly as specified
- Handle all documented error cases
- Not add features beyond the specification
| Category | Items | Status |
|---|---|---|
| Architecture Docs | 5 documents | Complete |
| Discovery Agent | spec, api-design, test-cases | Complete |
| Executing Agent | spec, api-design, test-cases | Complete |
| IaC Extension | spec, api-design, test-cases | Complete |
| JSON Schemas | 4 schemas | Complete |
| Example Configs | 6 examples | Complete |
Total Documentation: 26 files covering all aspects of the system.
- Document Version: 1.1.0
- Last Updated: 2025-11-28
- Status: Planning Complete - Ready for Implementation