Skip to content

channels concept #939

@tetron

Description

@tetron

Design sketch for a CWL "channel" type.

Motivation

  • Be able to trigger downstream computation when an upstream computation hasn't finished yet, but has produced partial results
  • To decouple dependencies that cross workflow boundaries (such as passing a channel to a subworkflow and providing a value later) e.g. https://cwl.discourse.group/t/deferred-input-for-scattered-subworkflow/484
  • To provide a model where external events / triggers not known at workflow start can be incorporated into workflow execution
  • To facilitate translation from other workflow languages that are channel based, e.g. NextFlow

Model

A channel is an asynchronous queue of items. It has two logical variables

  • contents: empty or non-empty
  • state: open or closed

The channel has three operations

  • get: get the next item in contents (last in first out order)
  • put: add an item to contents
  • close: close the channel for further writing
empty non-empty
open put: allowed; get: blocks until content available or channel is closed put: allowed; get: next item in queue
closed put: not allowed; get: processing ends put: not allowed; get: next item in queue

Channels are never explicitly created, closed, written to or read within the workflow. This model exists only to inform the behavior of workflow steps that produce or consume channel objects.

type: channel
items: (channel content type, e.g. string, File)

Passing a channenl to a workflow step

When a channel is passed to a workflow step, it checks the type of the parameter on the run step.

  • If the parameter is a regular type (string, File, etc) it blocks until all the contents are available, and produces an array.
  • If the parameter is a channel type, it passes a reference to the channel

Use in scattering

A channel object can be produced by a scatter operation:

stepOutputType: one of "normal" or "channel"

When "stepOutputType: channel", then a scattered workflow step immediately produces channels for all its outputs. Each individual scatter step puts its result into the channel as it becomes available. When all scatter steps complete, the channel is closed.

A workflow step can "scatter" over a channel.

This will start a scatter instance for each item in the channel, until the channel is closed and empty.

Scattering over multiple channels follow similar dot product / cross product behavior as for arrays.

Channels could also be produced by the proposed iterative loop feature.

Single item channels

type: channel
items: (channel content type, e.g. string, File)
singleItem: true

If singleItem is true, the channel will only ever have at most one item. The first "put" closes the channel.

A non-scattered step can produce channels with stepOutputType: "channel"

On a non-scattered workflow step, this produces results immediately as channels with singleItem: true

When a channel with singleItem: true is passed to a workflow step, it checks the type of the parameter on the run step.

  • If the parameter is a regular type (string, File, etc) it blocks until the item is available, and passes that
  • If the parameter is a channel type, it passes a reference to the channel

TBD

What happens if a channel is passed to two downstream workflow steps. Are the items copied to both, do they race to get the items, or is this an error?

The deferred input case described here https://cwl.discourse.group/t/deferred-input-for-scattered-subworkflow/484 favors the "items copied" pattern.

Proposed model for items copied pattern: when calling get() you provide a "get id" for that reader instance. Each unique id starts at position 0 and reads to close.

This also more explicitly models the channel as an array whose final size is not known when it is created, but facilitates the "automatic behavior" conversion of a channel to an array when connected to an array type input.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions