Dataiku Python API - Claude Code Navigation Guides

Purpose: These guides are designed to help Claude Code sessions navigate and use the Dataiku Python API effectively. They document common workflows, patterns, gotchas, and best practices.

Audience: Claude Code AI sessions (not human developers)


Guide Structure

CRITICAL: Start Here

  1. 00-project-planning-guide.md (READ THIS FIRST!)
    • Why planning matters for Claude Code (prevents getting lost!)
    • Creating detailed project plans BEFORE coding
    • Naming conventions (UPPERCASE for Snowflake compatibility)
    • Phase-by-phase implementation workflow
    • Visual flow planning and dependencies
    • Progress tracking and checkpoints

Getting Started

  1. 01-prerequisites-and-setup.md

    • Installation and environment setup
    • API key generation and management
    • Connection verification
    • Common setup issues
  2. 02-authentication-and-connection.md

    • Scope hierarchy (CRITICAL CONCEPT)
    • Authentication methods
    • Connection patterns
    • Permission handling

Core Operations

  1. 03-project-operations.md

    • Creating and configuring projects
    • Project metadata and variables
    • Project contents and flow
    • Export/import patterns
  2. 04-dataset-operations.md

    • Dataset CRUD operations
    • Schema management
    • Reading and writing data
    • Building and partitioning
    • Dataset metadata
  3. 05-recipe-workflows.md

    • Recipe types and creation
    • Running and monitoring recipes
    • Schema updates
    • Recipe dependencies
  4. 06-scenario-automation.md

    • Creating and configuring scenarios
    • Scenario steps and triggers
    • Running and monitoring
    • Notifications and reporters

Advanced Topics

  1. 07-ml-workflows.md

    • ML task operations
    • Model training and evaluation
    • Saved models and versioning
    • Model deployment
  2. 08-common-gotchas.md

    • Critical concepts to remember
    • Common errors and solutions
    • Best practices
    • Debugging checklist

Quick Reference

  1. 99-quick-reference.md
    • Cheat sheet for common operations
    • Code snippets
    • Quick patterns
    • Essential reminders

Quick Start

First time or starting a new project? Read in order:

  0. Project Planning Guide ⭐ (00-project-planning-guide.md) - START HERE!
  1. Prerequisites and Setup (01-prerequisites-and-setup.md)
  2. Authentication and Connection (02-authentication-and-connection.md)
  3. Project Operations (03-project-operations.md)
  4. Dataset Operations (04-dataset-operations.md)

Need specific help? Jump to:

  • Planning → 00-project-planning-guide.md ⭐
  • Recipes → 05-recipe-workflows.md
  • Automation → 06-scenario-automation.md
  • ML → 07-ml-workflows.md
  • Troubleshooting → 08-common-gotchas.md
  • Quick lookup → 99-quick-reference.md

Critical Concepts

1. Scope Hierarchy (MUST UNDERSTAND)

DSSClient (Instance Level)
    ↓
DSSProject (Project Level)
    ↓
DSSDataset / DSSRecipe / DSSScenario (Item Level)

You must go through each level! Cannot skip.
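
For example, to reach an item you always descend through the client and the project (a minimal sketch; the project key and item names below are placeholders):

from dataikuapi import DSSClient
import os

# Instance level: the client handle
client = DSSClient(os.getenv('DATAIKU_HOST'), os.getenv('DATAIKU_API_KEY'))

# Project level: always obtained from the client
project = client.get_project("MY_PROJECT")

# Item level: datasets, recipes, and scenarios are obtained from the project
dataset = project.get_dataset("RAW_CUSTOMERS")
recipe = project.get_recipe("compute_CLEAN_ORDERS")
scenario = project.get_scenario("DAILY_REFRESH")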

2. Settings Must Be Saved

# ❌ WRONG
settings = dataset.get_settings()
settings.settings['description'] = "New"
# Changes lost!

# ✓ CORRECT
settings = dataset.get_settings()
settings.settings['description'] = "New"
settings.save()  # Critical!

3. Async Operations

Many operations (builds, scenarios, training) are asynchronous. You must wait for completion.
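
For example, a dataset build can be started without blocking and then polled until it reaches a terminal state (a minimal sketch; assumes the `project` handle from the scope-hierarchy example, and the dataset name is a placeholder):

import time

# Start the build without waiting; a job handle is returned immediately
job = project.get_dataset("RAW_CUSTOMERS").build(wait=False)

# Poll the job until it reaches a terminal state
state = ""
while state not in ("DONE", "FAILED", "ABORTED"):
    time.sleep(5)
    state = job.get_status()["baseStatus"]["state"]
print(f"Build finished with state: {state}")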

4. Use UPPERCASE Naming (Best Practice)

Strongly recommended: Use UPPERCASE for project keys, dataset names, and column names (especially with Snowflake):

  • MY_PROJECT ✓ (recommended)
  • RAW_CUSTOMERS, CLEAN_ORDERS ✓ (recommended for Snowflake)
  • Snowflake folds unquoted identifiers to uppercase, so uppercase names match what it stores
  • Prevents case-sensitivity issues

See 00-project-planning-guide.md for complete naming conventions.
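
In practice this mostly means choosing uppercase identifiers at creation time (a minimal sketch; the project key, display name, and owner login below are placeholders):

# Uppercase project key; the display name is free-form
project = client.create_project("CUSTOMER_ANALYTICS", "Customer Analytics", "your_login")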

5. Variables Are Strings

variables = project.get_variables()   # returns {"standard": {...}, "local": {...}}
batch_size = int(variables["standard"]["batch_size"])  # values come back as strings - convert!
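
Writing variables back works the same way: modify the structure returned by get_variables() and pass it to set_variables() (a minimal sketch, keeping values as strings):

variables = project.get_variables()
variables["standard"]["batch_size"] = "500"   # keep the value as a string
project.set_variables(variables)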

Common Patterns

Basic ETL

from dataikuapi import DSSClient
import os

# Connect
client = DSSClient(
    os.getenv('DATAIKU_HOST'),
    os.getenv('DATAIKU_API_KEY')
)

# Get project
project = client.get_project("MY_PROJECT")

# Build source
source = project.get_dataset("source_data")
source.build(wait=True)

# Run transformation
recipe = project.get_recipe("transform_data")
recipe.run(wait=True)

# Verify output (schema check; row counts come from dataset metrics, not metadata)
output = project.get_dataset("final_output")
schema = output.get_schema()
print(f"✓ Output has {len(schema['columns'])} columns")

Automation Pipeline

# Run scenario
scenario = project.get_scenario("daily_refresh")
scenario_run = scenario.run_and_wait()

if scenario_run.outcome == 'SUCCESS':
    print("✓ Pipeline succeeded")
else:
    print("❌ Pipeline failed")
    # Get logs, send alerts, etc.
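
If you need more control than run_and_wait(), the lower-level flow is the two-step process mentioned in the gotchas: scenario.run() returns a trigger fire, from which you obtain and poll the actual scenario run (a minimal sketch based on the documented polling pattern):

import time

trigger_fire = scenario.run()                        # step 1: fire the trigger
scenario_run = trigger_fire.wait_for_scenario_run()  # step 2: get the actual run

while scenario_run.running:
    time.sleep(5)
    scenario_run.refresh()

print(scenario_run.outcome)   # e.g. 'SUCCESS' or 'FAILED'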

Environment Setup

# Required
export DATAIKU_HOST="https://dss.yourcompany.com"
export DATAIKU_API_KEY="your-api-key-here"

# Multi-environment setup
export DATAIKU_DEV_HOST="https://dev-dss.company.com"
export DATAIKU_DEV_API_KEY="dev-key"
export DATAIKU_PROD_HOST="https://prod-dss.company.com"
export DATAIKU_PROD_API_KEY="prod-key"
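
On the Python side, one way to consume the multi-environment variables is to pick the prefix at connect time (a minimal sketch; the DATAIKU_ENV variable is a convention assumed here, not part of the API):

import os
from dataikuapi import DSSClient

env = os.getenv("DATAIKU_ENV", "DEV").upper()   # "DEV" or "PROD" (assumed convention)
client = DSSClient(
    os.getenv(f"DATAIKU_{env}_HOST"),
    os.getenv(f"DATAIKU_{env}_API_KEY"),
)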

Resources

  • Dataiku Developer Guide: https://developer.dataiku.com/
  • DSS Reference Documentation: https://doc.dataiku.com/
  • dataiku-api-client on PyPI: https://pypi.org/project/dataiku-api-client/

Contributing to These Guides

These guides are maintained for Claude Code sessions. When updating:

  1. Keep examples practical and tested
  2. Include gotchas and common mistakes
  3. Show both ❌ wrong and ✓ correct patterns
  4. Focus on what Claude Code needs to know
  5. Keep code snippets copy-pasteable

Version Info

  • Guide Version: 1.0
  • API Version: 14.1.3+
  • Last Updated: 2025-11-21
  • Python: 3.7+

Quick Gotchas Reminder

  1. ⚠️ Scope hierarchy - Must go through project
  2. ⚠️ Save settings - Always call .save()
  3. ⚠️ Use UPPERCASE naming - Especially for Snowflake: MY_PROJECT, RAW_CUSTOMERS
  4. ⚠️ Variables are strings - Convert types!
  5. ⚠️ Async operations - Wait for completion
  6. ⚠️ Schema updates - Call compute_schema_updates().apply() (see the sketch below)
  7. ⚠️ Scenario runs - Two-step process: scenario.run() returns a trigger fire, then wait for the actual run
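
Gotcha 6 in code (a minimal sketch; assumes `project` is a DSSProject handle and the recipe name is a placeholder):

recipe = project.get_recipe("compute_CLEAN_ORDERS")
required_updates = recipe.compute_schema_updates()
if required_updates.any_action_required():
    required_updates.apply()   # propagate schema changes to the recipe outputs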

Happy automating! 🚀