Dataiku IaC - Planning & Documentation

Status: Planning Phase
Start Date: 2025-11-23
Target: Production-ready Infrastructure-as-Code for Dataiku


Overview

This directory contains comprehensive planning, design, and documentation for Dataiku IaC - a Git-native, declarative infrastructure-as-code layer for managing Dataiku projects and deployments.

Problem Statement

Enterprise DevOps teams are blocked from adopting Dataiku due to:

  1. No declarative infrastructure-as-code - Everything is imperative, click-based
  2. Poor CI/CD integration - Manual processes, no GitOps workflows
  3. State management issues - No HA on Automation/Design nodes, no recovery from failures
  4. Lack of testing framework - Can't validate before deployment
  5. Manual environment management - Connection remapping requires clicks

Solution

Build an external orchestration layer that provides:

  • ✅ Declarative YAML/Python configurations
  • ✅ Git-based version control and rollback
  • ✅ Plan/Apply workflow (Terraform-style)
  • ✅ State management external to Dataiku
  • ✅ Testing framework for pipelines
  • ✅ Automatic environment remapping
  • ✅ CI/CD integration templates
  • ✅ Govern approval workflows
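
To make the Plan/Apply idea concrete, here is a minimal sketch of a Terraform-style diff. It is an illustration under stated assumptions, not the planned implementation: the `Action` type, the `plan` and `apply_plan` functions, and resource ids such as `dataset.orders` are all hypothetical, and the in-memory dictionaries stand in for real Dataiku API calls.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Action:
    op: str        # "create", "update", or "delete"
    resource: str  # hypothetical resource id, e.g. "dataset.orders"


def plan(desired: dict[str, Any], current: dict[str, Any]) -> list[Action]:
    """Diff the desired config against recorded state. Has no side effects."""
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(Action("create", name))
        elif name not in desired:
            actions.append(Action("delete", name))
        elif desired[name] != current[name]:
            actions.append(Action("update", name))
    return actions


def apply_plan(actions: list[Action], desired: dict[str, Any],
               current: dict[str, Any]) -> dict[str, Any]:
    """Return the new state after executing a plan.

    A real engine would call the Dataiku API here; this stand-in only
    updates a copy of the state dictionary.
    """
    new_state = dict(current)
    for action in actions:
        if action.op == "delete":
            new_state.pop(action.resource)
        else:
            new_state[action.resource] = desired[action.resource]
    return new_state


desired = {"dataset.orders": {"connection": "s3_raw"},
           "dataset.users": {"connection": "pg_main"}}
current = {"dataset.users": {"connection": "filesystem"},
           "dataset.legacy": {"connection": "filesystem"}}

for action in plan(desired, current):
    print(action.op, action.resource)
```

Keeping `plan` free of side effects is what lets a plan be reviewed in a pull request before anything is applied; re-running `plan` against the state returned by `apply_plan` should yield an empty list.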

Target Users (Priority Order)

  1. DevOps/Platform Engineers - Need GitOps, IaC, CI/CD integration
  2. Data Scientists - Need simpler abstractions, less clicking
  3. Data Engineers - Benefit from testing, validation, automation

Documentation Structure

dataiku-iac-planning/
├── README.md                          # This file
│
├── architecture/                      # System architecture
│   ├── 01-overview.md                # High-level architecture
│   ├── 02-state-management.md        # State storage and sync
│   ├── 03-execution-engine.md        # Plan/Apply engine
│   ├── 04-recovery-strategy.md       # Failure recovery
│   └── 05-integration-points.md      # Dataiku/Govern/CI-CD integration
│
├── design/                           # Design specifications
│   ├── 01-config-format.md          # YAML/Python DSL format
│   ├── 02-state-file-format.md      # State file schema
│   ├── 03-validation-rules.md       # Validation logic
│   ├── 04-error-handling.md         # Error messages and recovery
│   └── 05-testing-framework.md      # Testing DSL and execution
│
├── api-specs/                        # API specifications
│   ├── 01-cli-interface.md          # Command-line interface
│   ├── 02-python-api.md             # Python programmatic API
│   ├── 03-config-schema.md          # YAML schema definitions
│   └── 04-state-api.md              # State management API
│
├── roadmap/                          # Implementation planning
│   ├── 01-phases.md                 # Phased rollout plan
│   ├── 02-poc-plan.md               # 4-week POC details
│   ├── 03-dependencies.md           # Technical dependencies
│   └── 04-milestones.md             # Key milestones and metrics
│
├── testing/                          # Testing strategy
│   ├── 01-unit-tests.md            # Unit testing approach
│   ├── 02-integration-tests.md     # Integration testing
│   ├── 03-e2e-tests.md             # End-to-end testing
│   └── 04-user-acceptance.md       # UAT criteria
│
├── examples/                         # Example configurations
│   ├── simple-project/              # Basic project example
│   ├── ml-pipeline/                 # ML workflow example
│   ├── multi-env/                   # Multi-environment setup
│   └── ci-cd-templates/             # GitHub Actions, GitLab CI
│
├── decisions/                        # Architecture Decision Records
│   ├── 001-state-storage.md        # Where to store state
│   ├── 002-config-format.md         # YAML vs Python vs both
│   ├── 003-backward-compat.md       # Compatibility strategy
│   └── template.md                  # ADR template
│
└── research/                         # Research and analysis
    ├── api-coverage-analysis.md     # Current API gaps
    ├── competitor-analysis.md       # dbt, Terraform, etc.
    ├── user-interviews.md           # DevOps team feedback
    └── dataiku-internals.md         # Dataiku architecture notes


Key Design Decisions

1. External State Management

  • Decision: State stored externally (Git + local/S3 state file)
  • Rationale: Dataiku's internal tri-state is not HA-capable
  • Impact: Enables recovery, versioning, team collaboration
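
As a rough sketch of what an external state file could look like, the snippet below writes state as JSON with a serial number and a checksum, and replaces the file atomically so an interrupted write does not leave a half-written state. The field names (`serial`, `checksum`, `resources`) and the functions `save_state` and `load_state` are assumptions for illustration; the actual schema is to be defined in design/02-state-file-format.md.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def checksum(resources: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the resources."""
    canonical = json.dumps(resources, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_state(path: Path, resources: dict, serial: int) -> None:
    """Write the state file atomically: temp file first, then rename."""
    doc = {"serial": serial,
           "checksum": checksum(resources),
           "resources": resources}
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(doc, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)  # atomic when both paths are on one filesystem


def load_state(path: Path) -> dict:
    """Read the state file and reject it if the checksum does not match."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    if checksum(doc["resources"]) != doc["checksum"]:
        raise ValueError("state file checksum mismatch")
    return doc
```

A local file is used here only to keep the example self-contained; an S3 backend would need its own locking and consistency handling, which this sketch does not address.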

2. Hybrid Config Format

  • Decision: Support both YAML and Python DSL
  • Rationale: YAML for simplicity, Python for complex logic
  • Impact: Accessible to all user levels
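
One way the two formats could coexist is for the Python DSL to produce the same plain dictionary that a YAML file would parse to, so that validation and planning only ever see one shape. The sketch below assumes a hypothetical schema (`key`, `datasets`, `name`, `connection`, `format`) and hypothetical `Project` and `Dataset` classes; none of these are confirmed by the design documents.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Dataset:
    name: str
    connection: str
    format: str = "parquet"


@dataclass
class Project:
    key: str
    datasets: list[Dataset] = field(default_factory=list)

    def to_config(self) -> dict:
        """Convert to the plain dict a YAML config would parse to."""
        return asdict(self)


# Equivalent YAML under the same hypothetical schema:
#   key: SALES
#   datasets:
#     - {name: orders_eu, connection: s3_raw, format: parquet}
#     - {name: orders_us, connection: s3_raw, format: parquet}
project = Project(
    key="SALES",
    # Python earns its place when configs are generated, e.g. one dataset per region.
    datasets=[Dataset(f"orders_{region}", "s3_raw") for region in ("eu", "us")],
)
config = project.to_config()
```

With two regions the YAML form is arguably simpler; the Python form starts to pay off when the list is long or computed.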

3. Pragmatic Evolution

  • Decision: Build on top of existing API, don't break it
  • Rationale: Backward compatibility, incremental adoption
  • Impact: Longer timeline but safer migration

4. DevOps-First Design

  • Decision: Target DevOps engineers first
  • Rationale: They're the blockers; solving their problems enables adoption
  • Impact: Focus on CI/CD, GitOps, HA concerns

5. Govern Integration

  • Decision: Native integration with Govern approval workflows
  • Rationale: Synergy between technical and business approvals
  • Impact: Differentiated feature, enterprise-ready

Success Metrics

Technical Metrics

  • ✅ Plan accuracy: 99%+ (plan matches actual apply)
  • ✅ Apply success rate: 95%+ (successful deployments)
  • ✅ Recovery time: <5 min (from failed state to recovered)
  • ✅ State drift detection: Real-time
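
As an illustration of the drift-detection metric, the sketch below compares fingerprints of recorded state against live settings. It is a one-shot comparison: getting close to real-time would additionally need polling or event hooks, which are not shown. The function names and the drift labels (`missing`, `unmanaged`, `modified`) are assumptions, and the `live` dictionary stands in for settings fetched from Dataiku.

```python
import hashlib
import json


def fingerprint(settings: dict) -> str:
    """Stable hash of a resource's settings (key order does not matter)."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(recorded: dict[str, dict], live: dict[str, dict]) -> dict[str, str]:
    """Classify each resource whose live settings differ from recorded state."""
    drift = {}
    for name in sorted(recorded.keys() | live.keys()):
        if name not in live:
            drift[name] = "missing"      # recorded in state, gone from the instance
        elif name not in recorded:
            drift[name] = "unmanaged"    # exists on the instance, not in state
        elif fingerprint(recorded[name]) != fingerprint(live[name]):
            drift[name] = "modified"     # changed outside the IaC workflow
    return drift
```

Storing only fingerprints in the state file, rather than full settings, would be one option if settings contain values that should not be persisted outside Dataiku.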

User Metrics

  • ✅ Time to deploy: 80% reduction vs manual
  • ✅ Deployment errors: 70% reduction
  • ✅ Developer NPS: >50

Business Metrics

  • ✅ Enterprise adoption: 10+ customers in 6 months
  • ✅ Competitive win rate: +20% vs dbt/Terraform users
  • ✅ Product perception: shifts from nice-to-have to essential for running analytics environments

Current Status

Phase: Planning & Design
Next Milestone: Complete documentation (Week 1)
After That: 4-week POC development

Completed

  • ✅ Problem validation with stakeholders
  • ✅ Architecture design decisions
  • ✅ User research and requirements
  • ✅ Competitive analysis

In Progress

  • 🔄 Detailed design specifications
  • 🔄 API specification
  • 🔄 Implementation roadmap
  • 🔄 Testing strategy

Upcoming

  • ⏳ POC development (4 weeks)
  • ⏳ Alpha testing with select customers
  • ⏳ Beta release
  • ⏳ Production release

Contributing

This is internal planning documentation. For questions or feedback:

  • Architecture: Review architecture/ folder
  • Design Questions: Check design/ folder
  • Timeline: See roadmap/ folder

Document Conventions

  • Must/Required: Non-negotiable requirement
  • Should: Strong recommendation
  • May/Optional: Nice to have
  • ⚠️ Warning: Important consideration
  • 💡 Tip: Helpful insight

Last Updated: 2025-11-23
Maintained By: Development Team
Review Cycle: Weekly during planning, monthly after release