Status: Planning Phase
Start Date: 2025-11-23
Target: Production-ready Infrastructure-as-Code for Dataiku
This directory contains comprehensive planning, design, and documentation for Dataiku IaC - a Git-native, declarative infrastructure-as-code layer for managing Dataiku projects and deployments.
Enterprise DevOps teams are blocked from adopting Dataiku due to:
- No declarative infrastructure-as-code - Everything is imperative, click-based
- Poor CI/CD integration - Manual processes, no GitOps workflows
- State management issues - No HA on Automation/Design nodes, no recovery from failures
- Lack of testing framework - Can't validate before deployment
- Manual environment management - Connection remapping requires clicks
Build an external orchestration layer that provides:
- ✅ Declarative YAML/Python configurations
- ✅ Git-based version control and rollback
- ✅ Plan/Apply workflow (Terraform-style)
- ✅ State management external to Dataiku
- ✅ Testing framework for pipelines
- ✅ Automatic environment remapping
- ✅ CI/CD integration templates
- ✅ Govern approval workflows
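The Plan/Apply workflow over declarative configuration can be sketched as a diff between the desired configuration and the recorded state. This is a minimal in-memory illustration, not the planned implementation: the `Action` type, the `plan` and `apply` functions, and the resource names and fields are all hypothetical, and the dictionaries stand in for whatever a parsed YAML or Python configuration would produce. No Dataiku API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str  # resource name (hypothetical naming scheme)


def plan(desired: dict, current: dict) -> list[Action]:
    """Diff desired config against recorded state without changing anything."""
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(Action("create", name))
        elif desired[name] != current[name]:
            actions.append(Action("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply(actions: list[Action], desired: dict, current: dict) -> dict:
    """Return the new state after executing the plan (in memory only)."""
    new_state = dict(current)
    for action in actions:
        if action.kind == "delete":
            new_state.pop(action.name)
        else:
            new_state[action.name] = desired[action.name]
    return new_state


# Hypothetical resources; the dicts mimic a parsed declarative config.
desired = {
    "raw_orders": {"type": "dataset", "connection": "s3_prod"},
    "clean_orders": {"type": "recipe", "input": "raw_orders"},
}
current = {
    "raw_orders": {"type": "dataset", "connection": "s3_dev"},
    "old_report": {"type": "dataset", "connection": "s3_dev"},
}

steps = plan(desired, current)
for step in steps:
    print(step.kind, step.name)

state = apply(steps, desired, current)
```

Running `plan` again on the resulting state yields an empty list, which is the property a Terraform-style workflow relies on: a second plan after a successful apply should propose no changes.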
- DevOps/Platform Engineers - Need GitOps, IaC, CI/CD integration
- Data Scientists - Need simpler abstractions, less clicking
- Data Engineers - Benefit from testing, validation, automation
dataiku-iac-planning/
├── README.md # This file
│
├── architecture/ # System architecture
│ ├── 01-overview.md # High-level architecture
│ ├── 02-state-management.md # State storage and sync
│ ├── 03-execution-engine.md # Plan/Apply engine
│ ├── 04-recovery-strategy.md # Failure recovery
│ └── 05-integration-points.md # Dataiku/Govern/CI-CD integration
│
├── design/ # Design specifications
│ ├── 01-config-format.md # YAML/Python DSL format
│ ├── 02-state-file-format.md # State file schema
│ ├── 03-validation-rules.md # Validation logic
│ ├── 04-error-handling.md # Error messages and recovery
│ └── 05-testing-framework.md # Testing DSL and execution
│
├── api-specs/ # API specifications
│ ├── 01-cli-interface.md # Command-line interface
│ ├── 02-python-api.md # Python programmatic API
│ ├── 03-config-schema.md # YAML schema definitions
│ └── 04-state-api.md # State management API
│
├── roadmap/ # Implementation planning
│ ├── 01-phases.md # Phased rollout plan
│ ├── 02-poc-plan.md # 4-week POC details
│ ├── 03-dependencies.md # Technical dependencies
│ └── 04-milestones.md # Key milestones and metrics
│
├── testing/ # Testing strategy
│ ├── 01-unit-tests.md # Unit testing approach
│ ├── 02-integration-tests.md # Integration testing
│ ├── 03-e2e-tests.md # End-to-end testing
│ └── 04-user-acceptance.md # UAT criteria
│
├── examples/ # Example configurations
│ ├── simple-project/ # Basic project example
│ ├── ml-pipeline/ # ML workflow example
│ ├── multi-env/ # Multi-environment setup
│ └── ci-cd-templates/ # GitHub Actions, GitLab CI
│
├── decisions/ # Architecture Decision Records
│ ├── 001-state-storage.md # Where to store state
│ ├── 002-config-format.md # YAML vs Python vs both
│ ├── 003-backward-compat.md # Compatibility strategy
│ └── template.md # ADR template
│
└── research/ # Research and analysis
  ├── api-coverage-analysis.md # Current API gaps
  ├── competitor-analysis.md # dbt, Terraform, etc.
  ├── user-interviews.md # DevOps team feedback
  └── dataiku-internals.md # Dataiku architecture notes
- Decision: State stored externally (Git + local/S3 state file)
- Rationale: Dataiku's internal tri-state is not HA-capable
- Impact: Enables recovery, versioning, team collaboration
- Decision: Support both YAML and Python DSL
- Rationale: YAML for simplicity, Python for complex logic
- Impact: Accessible to all user levels
- Decision: Build on top of existing API, don't break it
- Rationale: Backward compatibility, incremental adoption
- Impact: Longer timeline but safer migration
- Decision: Target DevOps engineers first
- Rationale: They're the blockers, solving their problems enables adoption
- Impact: Focus on CI/CD, GitOps, HA concerns
- Decision: Native integration with Govern approval workflows
- Rationale: Synergy between technical and business approvals
- Impact: Differentiated feature, enterprise-ready
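The external state decision above can be illustrated with a small state file that records a fingerprint per resource and is compared against the live definitions to detect drift. This is a sketch under stated assumptions: the file name `dataiku.state.json`, the `version` and `resources` keys, and the helper names are hypothetical, and the real schema is left to the state file format design. The `live` dictionary stands in for whatever would be read back from Dataiku.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(resource: dict) -> str:
    """Stable SHA-256 hash of a resource definition (key order ignored)."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_state(path: Path, resources: dict) -> None:
    """Record one fingerprint per applied resource in an external file."""
    state = {
        "version": 1,  # hypothetical schema version
        "resources": {name: fingerprint(r) for name, r in resources.items()},
    }
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


def detect_drift(path: Path, live: dict) -> list[str]:
    """Names whose live definition no longer matches the recorded state."""
    recorded = json.loads(path.read_text())["resources"]
    changed = [
        name for name in recorded
        if name not in live or fingerprint(live[name]) != recorded[name]
    ]
    untracked = [name for name in live if name not in recorded]
    return sorted(changed + untracked)


# Hypothetical resources as they were last applied.
applied = {
    "raw_orders": {"type": "dataset", "connection": "s3_prod"},
    "clean_orders": {"type": "recipe", "input": "raw_orders"},
}
state_path = Path(tempfile.mkdtemp()) / "dataiku.state.json"
write_state(state_path, applied)

# Simulate a manual change made outside the IaC workflow.
live = {**applied, "raw_orders": {"type": "dataset", "connection": "s3_dev"}}
drift = detect_drift(state_path, live)
print(drift)
```

Because the file lives outside Dataiku, it can be committed to Git or kept in S3 as the decision describes, and it remains readable when a node is unavailable.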
- ✅ Plan accuracy: 99%+ (plan matches actual apply)
- ✅ Apply success rate: 95%+ (successful deployments)
- ✅ Recovery time: <5 min (from failed state to recovered)
- ✅ State drift detection: Real-time
- ✅ Time to deploy: 80% reduction vs manual
- ✅ Deployment errors: 70% reduction
- ✅ Developer NPS: >50
- ✅ Enterprise adoption: 10+ customers in 6 months
- ✅ Competitive win rate: +20% vs dbt/Terraform users
- ✅ Positioning: from a nice-to-have to the way analytics environments are run
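The plan accuracy and apply success rate targets above need a concrete definition before they can be tracked. One possible reading, offered only as an assumption, is to count a run as accurate when the applied actions are exactly the planned ones, and to compute both rates per run. The `Run` record and the sample data below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    planned: list[str]  # actions shown by the plan step
    applied: list[str]  # actions actually executed by the apply step
    succeeded: bool     # whether the apply finished without error


def plan_accuracy(runs: list[Run]) -> float:
    """Share of runs where the applied actions equal the planned ones."""
    return sum(r.planned == r.applied for r in runs) / len(runs)


def apply_success_rate(runs: list[Run]) -> float:
    """Share of runs whose apply step completed successfully."""
    return sum(r.succeeded for r in runs) / len(runs)


# Made-up sample runs, for illustration only.
runs = [
    Run(["create a"], ["create a"], True),
    Run(["update b"], ["update b"], True),
    Run(["delete c"], ["delete c", "update d"], True),
    Run(["create e"], ["create e"], False),
]

accuracy = plan_accuracy(runs)
success = apply_success_rate(runs)
print(f"plan accuracy: {accuracy:.0%}, apply success: {success:.0%}")
```

Against the stated targets, a deployment history would pass when `plan_accuracy` is at least 0.99 and `apply_success_rate` is at least 0.95.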
Phase: Planning & Design
Next Milestone: Complete documentation (Week 1)
After That: 4-week POC development
- ✅ Problem validation with stakeholders
- ✅ Architecture design decisions
- ✅ User research and requirements
- ✅ Competitive analysis
- 🔄 Detailed design specifications
- 🔄 API specification
- 🔄 Implementation roadmap
- 🔄 Testing strategy
- ⏳ POC development (4 weeks)
- ⏳ Alpha testing with select customers
- ⏳ Beta release
- ⏳ Production release
This is internal planning documentation. For questions or feedback:
- Architecture: Review architecture/ folder
- Design Questions: Check design/ folder
- Timeline: See roadmap/ folder
- Must/Required: Non-negotiable requirement
- Should: Strong recommendation
- May/Optional: Nice to have
- ⚠️ Warning: Important consideration
- 💡 Tip: Helpful insight
Last Updated: 2025-11-23
Maintained By: Development Team
Review Cycle: Weekly during planning, monthly after release