Extend Airflow operator by implementing KubernetesExecutor #2

@adwk67

Description

As a user I want the option of running my Airflow DAGs with the KubernetesExecutor, so that I have greater control over resource configuration (some settings can be defined per job) and resource usage (each job runs in its own pod, created on demand).
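Per-job resource configuration could then be expressed through a task's executor_config, which the KubernetesExecutor supports via a pod_override entry. A minimal sketch of that structure as a plain dict (the helper name and the resource values are illustrative, not from this issue):

```python
# Hedged sketch: the kind of per-task resource override that can be passed
# to Airflow's executor_config when running under the KubernetesExecutor.
# Shown as a plain dict for illustration; in a real DAG this would be built
# with kubernetes.client.models.V1Pod.
def pod_override_config(cpu: str, memory: str) -> dict:
    """Build a pod_override-style resource spec for a single task."""
    return {
        "pod_override": {
            "spec": {
                "containers": [
                    {
                        # "base" is the container name Airflow uses for the
                        # main worker container
                        "name": "base",
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }
                ]
            }
        }
    }
```

This is what makes the "some settings can be defined per job" part of the story possible: each task can request different resources without touching the shared worker deployment.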

Implementation

  • the airflow controller must define a pod template according to the specification (details here)
  • this template must be mounted (e.g. via a PVC) at the location defined by AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE (see configuration)
  • if the airflow resource specifies KubernetesExecutor, the scheduler recognises this and the KubernetesExecutor requests a worker pod from the Kubernetes API according to the template definition
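A minimal pod template along the lines of the steps above might look as follows (the image tag, names, and resource values are assumptions for illustration; the mount path is whatever AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE points to):

```yaml
# Hypothetical pod template, mounted at the path given by
# AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-template
spec:
  containers:
    - name: base            # container name expected by Airflow
      image: apache/airflow:2.6.1
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
  restartPolicy: Never       # worker pods are one-shot and then destroyed
```

The controller would render this template from the airflow resource and mount it into the scheduler pod, so that the KubernetesExecutor can use it as the base for every worker pod it requests.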

Background/Context

Currently the airflow-operator implements the CeleryExecutor (Local- and SequentialExecutors are also supported, but are not scalable), whereby webserver and scheduler pods interact with multiple (Celery) worker pods: Celery reads job data from the external database and queues jobs via an external Redis instance. There are other executors available:

  • KubernetesExecutor
    • each job is spun up in its own pod, which is then destroyed
    • no queue component is needed
    • accessing logs is more complicated
    • not sure if complex jobs can be distributed over multiple workers (as is the case with Celery)
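For reference, the executor in use is selected via Airflow's core configuration, e.g. as an environment variable on the scheduler (setting name per the Airflow configuration reference):

```
AIRFLOW__CORE__EXECUTOR=KubernetesExecutor
```

Switching this value from CeleryExecutor to KubernetesExecutor is what removes the need for the Redis queue component, since the scheduler talks to the Kubernetes API directly instead.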

The full list is here: https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types

See also #313

Metadata

Labels

  • customer-request
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • release-note/action-required: Denotes a PR that introduces potentially breaking changes that require user action.
  • release/23.11.0

Status

Done