6  Dataflow Model Walkthrough ๐ŸŒŠ

Tableplotโ€™s dataflow system creates templates with automatic dependency resolution and caching. This system is inspired by the work of Jon Anthony (jsa-aerial), author of Hanami, building on Hanamiโ€™s template substitution while adding explicit dependency management.

Hanami drew from lambda calculus ideas. This dataflow model applies similar minimalistic principles to parameterized data transformations.

This walkthrough explores three namespaces: xform for template transformation, dag for dependency management, and cache for memoization.

6.1 Setup

(ns tableplot-book.dataflow-walkthrough
  (:require [scicloj.tableplot.v1.xform :as xform]
            [scicloj.tableplot.v1.dag :as dag]
            [scicloj.tableplot.v1.cache :as cache]
            [aerial.hanami.templates :as ht]
            [aerial.hanami.common :as hc]
            [scicloj.kindly.v4.kind :as kind]
            [tablecloth.api :as tc]))

6.2 Template Transformation with xform

Template transformation is term rewriting - we substitute terms until reaching a fixpoint (a value that cannot be reduced further).

6.2.1 Six Rules

Given a term T and an environment E (a map from keys to values):

Rule 1: Lookup - If T is a key in E, replace with E[T] and recurse.

Rule 2: Identity - If T is a key not in E, itโ€™s a fixpoint.

Rule 3: Function Application - If T is a function, apply it to E and recurse.

Rule 4: Collection Recursion - If T is a map or vector, recurse on elements.

Rule 5: Template Defaults - Templates extend the environment via :aerial.hanami.templates/defaults.

Rule 6: Value - Non-collection, non-key values (numbers, strings, datasets) pass through.

Nil Removal - If a value resolves to nil, it is removed from the result (equivalent to hc/RMV).

6.2.2 Examples

Lookup and Identity

(xform/xform
 {:a :B
  :c :D}
 {:B :C
  :C 10
  :D 20})
{:a 10, :c 20}

Here :a โ†’ :B โ†’ :C โ†’ 10 (stops when 10 isnโ€™t in the map).

Multi-step Substitution

Even strings and numbers can act as keys:

(xform/xform
 {:title :Title}
 {:Title "MyTitle"
  "MyTitle" "Resolved!"})
{:title "Resolved!"}

Fixpoint Detection

(xform/xform
 {:self-ref :X
  :not-found :Y}
 {:X :X ;; maps to itself โ†’ fixpoint
  :Y :Missing})
{:self-ref :X, :not-found :Missing}

:Missing not in map โ†’ fixpoint

Dataset Pass-Through

Datasets pass through unchanged:

(xform/xform
 {:data :Data
  :nested {:deep {:data :Data}}}
 {:Data (tc/dataset {:x [1 2 3] :y [4 5 6]})})

{

:data

_unnamed [3 2]:

:x :y
1 4
2 5
3 6
:nested

{

:deep

{

:data

_unnamed [3 2]:

:x :y
1 4
2 5
3 6

}

}

}

Function Application

Functions receive the entire environment:

(xform/xform
 {:greeting :Greeting}
 {:Name "Alice"
  :Greeting (fn [{:keys [Name]}]
              (str "Hello, " Name "!"))})
{:greeting "Hello, Alice!"}

Template Defaults

Templates extend the environment:

(xform/xform
 {:message :Message
  :aerial.hanami.templates/defaults {:Name "World"
                 :Message (fn [{:keys [Name]}]
                            (str "Hello, " Name "!"))}})
{:message "Hello, World!"}

To override defaults, use key-value pairs:

(xform/xform
 {:message :Message
  :aerial.hanami.templates/defaults {:Name "World"
                 :Message (fn [{:keys [Name]}]
                            (str "Hello, " Name "!"))}}
 :Name "Clojure")
{:message "Hello, Clojure!"}

6.2.3 Nested Defaults

Defaults can appear at any level:

(xform/xform
 {:title :Title
  :section {:heading :Heading
            :aerial.hanami.templates/defaults {:Heading "Default Heading"}}
  :aerial.hanami.templates/defaults {:Title "Default Title"}})
{:title "Default Title", :section {:heading "Default Heading"}}

Inner environments access outer bindings:

(xform/xform
 {:outer {:inner :InnerValue
          :aerial.hanami.templates/defaults {:InnerValue (fn [{:keys [OuterValue]}]
                                       (str "Inner uses: " OuterValue))}}
  :aerial.hanami.templates/defaults {:OuterValue "Parent Value"}})
{:outer {:inner "Inner uses: Parent Value"}}

User substitutions override nested defaults:

(xform/xform
 {:section {:heading :Heading
            :aerial.hanami.templates/defaults {:Heading "Default Heading"}}}
 :Heading "User Heading")
{:section {:heading "User Heading"}}

Nested defaults work with dag/fn-with-deps:

(xform/xform
 {:config {:database {:url :DbUrl
                      :pool-size :PoolSize}
           :aerial.hanami.templates/defaults {:DbHost "localhost"
                          :DbPort 5432
                          :DbName "mydb"
                          :DbUrl (dag/fn-with-deps nil [DbHost DbPort DbName]
                                                   (str "postgresql://" DbHost ":" DbPort "/" DbName))
                          :PoolSize (dag/fn-with-deps nil [Environment]
                                                      (if (= Environment "prod") 50 10))}}
  :aerial.hanami.templates/defaults {:Environment "prod"}})
{:config
 {:database {:url "postgresql://localhost:5432/mydb", :pool-size 50}}}

Recursive Transformation

(xform/xform
 {:user {:name :UserName
         :age :UserAge}
  :items [{:x :X} {:y :Y}]}
 {:UserName "Bob"
  :UserAge 30
  :X 10
  :Y 20})
{:user {:name "Bob", :age 30}, :items [{:x 10} {:y 20}]}

6.2.4 Removing Empty Structure

Empty collections are removed recursively. When a collection becomes empty, its parent may also become empty:

(xform/xform
 {:outer {:middle {:inner []}}})
{}

The hc/RMV value marks parameters as optional:

(xform/xform
 {:title :Title
  :subtitle :Subtitle
  :footer :Footer}
 {:Title "My Chart"
  :Subtitle hc/RMV
  :Footer hc/RMV})
{:title "My Chart"}

nil values are also removed:

(xform/xform
 {:title :Title
  :subtitle :Subtitle
  :footer :Footer}
 {:Title "My Chart"
  :Subtitle nil
  :Footer nil})
{:title "My Chart"}

Hereโ€™s a realistic visualization template with many optional parameters:

(xform/xform
 {:plot {:data :Data
         :layout {:title {:text :Title
                          :font {:size :TitleSize
                                 :family :TitleFamily
                                 :color :TitleColor}
                          :x :TitleX
                          :y :TitleY}
                  :xaxis {:title :XAxisTitle
                          :showgrid :ShowXGrid
                          :gridcolor :XGridColor
                          :gridwidth :XGridWidth}
                  :yaxis {:title :YAxisTitle
                          :showgrid :ShowYGrid
                          :gridcolor :YGridColor
                          :gridwidth :YGridWidth}
                  :annotations :Annotations
                  :shapes :Shapes
                  :images :Images}}
  :aerial.hanami.templates/defaults {:TitleSize hc/RMV
                 :TitleFamily hc/RMV
                 :TitleColor hc/RMV
                 :TitleX hc/RMV
                 :TitleY hc/RMV
                 :XAxisTitle hc/RMV
                 :ShowXGrid hc/RMV
                 :XGridColor hc/RMV
                 :XGridWidth hc/RMV
                 :YAxisTitle hc/RMV
                 :ShowYGrid hc/RMV
                 :YGridColor hc/RMV
                 :YGridWidth hc/RMV
                 :Annotations hc/RMV
                 :Shapes hc/RMV
                 :Images hc/RMV}}
 :Data [{:x [1 2 3] :y [4 5 6] :type "scatter"}]
 :Title "Simple Chart")
{:plot
 {:data [{:x [1 2 3], :y [4 5 6], :type "scatter"}],
  :layout {:title {:text "Simple Chart"}}}}

The 17-parameter template produces lean output with just data and title. Empty maps (:font, :xaxis, :yaxis) are removed.

Tableplot layer functions offer dozens of optional parameters. Default everything to hc/RMV, and users get clean output without specifying irrelevant details.

Functions can return hc/RMV to conditionally include values:

(xform/xform
 {:title :Title
  :subtitle :Subtitle
  :aerial.hanami.templates/defaults {:ShowSubtitle true
                 :Title "My Chart"
                 :Subtitle (fn [{:keys [ShowSubtitle]}]
                             (if ShowSubtitle "A subtitle" hc/RMV))}})
{:title "My Chart", :subtitle "A subtitle"}
(xform/xform
 {:title :Title
  :subtitle :Subtitle
  :aerial.hanami.templates/defaults {:ShowSubtitle true
                 :Title "My Chart"
                 :Subtitle (fn [{:keys [ShowSubtitle]}]
                             (if ShowSubtitle "A subtitle" hc/RMV))}}
 :ShowSubtitle false)
{:title "My Chart"}

6.3 Dependencies with dag

Functions can receive the entire environment, but that makes dependencies invisible. The dag namespace lets functions declare what they need.

Rule 7: Declared Dependencies - resolve dependencies before applying function body.

The fn-with-deps macro declares dependencies:

(xform/xform
 {:b :B
  :c :C
  :aerial.hanami.templates/defaults
  {:A 10
   :B (dag/fn-with-deps "B depends on A"
                        [A]
                        (* A 2))
   :C (dag/fn-with-deps "C depends on B"
                        [B]
                        (+ B 5))}})
{:b 20, :c 25}

The system computes in order: :C needs :B, which needs :A. So A=10, then B=20, then C=25.

When multiple values need the same dependency:

(xform/xform
 {:result :E
  :aerial.hanami.templates/defaults
  {:A 5
   :B (dag/fn-with-deps nil [A] (* A 2))
   :C (dag/fn-with-deps nil [A] (+ A 3))
   :D (dag/fn-with-deps nil [B C] (+ B C))
   :E (dag/fn-with-deps nil [D] (* D 10))}})
{:result 180}

Both B and C need A. The system computes A once, then B and C, then D, then E. The system doesnโ€™t detect cycles - if A needs B and B needs A, youโ€™ll get StackOverflowError.

For reusable functions:

(dag/defn-with-deps area->radius
  "Compute radius from area"
  [Area]
  (Math/sqrt (/ Area Math/PI)))
#'tableplot-book.dataflow-walkthrough/area->radius
(dag/defn-with-deps radius->circumference
  "Compute circumference from radius"
  [Radius]
  (* 2 Math/PI Radius))
#'tableplot-book.dataflow-walkthrough/radius->circumference
(xform/xform
 {:circumference :Circumference
  :aerial.hanami.templates/defaults
  {:Area 100
   :Radius area->radius
   :Circumference radius->circumference}})
{:circumference 35.449077018110316}

Functions carry dependencies in metadata:

(:scicloj.tableplot.v1.dag/dep-ks (meta area->radius))
[:Area]
(:scicloj.tableplot.v1.dag/dep-ks (meta radius->circumference))
[:Radius]
(def complex-fn
  (dag/fn-with-deps "Complex computation"
                    [A B C]
                    (+ A B C)))
(:scicloj.tableplot.v1.dag/dep-ks (meta complex-fn))
[:A :B :C]

6.4 Caching with cache

The cache namespace memoizes dependency resolution. If both B and C need A, A is computed once.

(let [call-count (atom 0)
      expensive-fn (fn [{:keys [X]}]
                     (swap! call-count inc)
                     (println "Computing...")
                     (* X X))]
  (cache/with-clean-cache
    (do
      (dotimes [_ 3]
        (dag/cached-xform-k :Y {:X 5, :Y expensive-fn}))
      @call-count)))
NoteOUT
Computing...
1

The expensive function ran once despite three lookups. with-clean-cache clears the cache before and after. Dependencies are cached automatically:

(def computation-log (atom []))
(reset! computation-log [])
[]
(cache/with-clean-cache
  (do
    (xform/xform
     {:result :C
      :aerial.hanami.templates/defaults
      {:A (fn [_]
            (swap! computation-log conj :computing-A)
            10)
       :B (dag/fn-with-deps nil [A]
                            (swap! computation-log conj :computing-B)
                            (* A 2))
       :C (dag/fn-with-deps nil [A B]
                            (swap! computation-log conj :computing-C)
                            (+ A B))}})
    @computation-log))
[:computing-A :computing-B :computing-C]

:A is needed by both :B and :C, but computed only once. More explicitly:

(def detailed-log (atom []))
(reset! detailed-log [])
[]
(cache/with-clean-cache
  (do
    (xform/xform
     {:final :D
      :aerial.hanami.templates/defaults
      {:A (fn [_]
            (swap! detailed-log conj [:computing :A])
            100)
       :B (dag/fn-with-deps nil [A]
                            (swap! detailed-log conj [:computing :B :with-A A])
                            (* A 2))
       :C (dag/fn-with-deps nil [A]
                            (swap! detailed-log conj [:computing :C :with-A A])
                            (+ A 50))
       :D (dag/fn-with-deps nil [B C]
                            (swap! detailed-log conj [:computing :D :with-B B :with-C C])
                            (+ B C))}})
    @detailed-log))
[[:computing :A]
 [:computing :B :with-A 100]
 [:computing :C :with-A 100]
 [:computing :D :with-B 200 :with-C 150]]

A computed once, B and C both receive 100, D receives 200 and 150.

The cache key includes the key name and substitution map, so different contexts get separate caches:

(cache/with-clean-cache
  [(dag/cached-xform-k :Result {:X 5, :Result (fn [{:keys [X]}] (* X X))})
   (dag/cached-xform-k :Result {:X 7, :Result (fn [{:keys [X]}] (* X X))})])
[25 49]

6.5 Examples

6.5.1 Data Visualization Pipeline

(def viz-pipeline
  {:visualization
   {:data :ProcessedData
    :layout {:title {:text :ChartTitle
                     :font {:size :TitleFontSize
                            :color :TitleColor}}
             :xaxis {:title :XAxisLabel
                     :gridcolor :GridColor}
             :yaxis {:title :YAxisLabel
                     :gridcolor :GridColor}}
    :aerial.hanami.templates/defaults
    {:RawData (tc/dataset {:x [1 2 3 4 5]
                           :y [10 15 13 17 20]})
     :FilteredData (dag/fn-with-deps
                    "Filter data based on threshold"
                    [RawData MinValue]
                    (tc/select-rows RawData #(> (:y %) MinValue)))
     :ProcessedData (dag/fn-with-deps
                     "Transform filtered data"
                     [FilteredData ScaleFactor]
                     (tc/map-columns FilteredData :y [:y] #(* % ScaleFactor)))
     :XAxisLabel (dag/fn-with-deps nil [RawData]
                                   (str "X Values (n=" (tc/row-count RawData) ")"))
     :YAxisLabel (dag/fn-with-deps nil [ScaleFactor]
                                   (str "Y Values (scaled ร—" ScaleFactor ")"))
     :ChartTitle (dag/fn-with-deps nil [MinValue ScaleFactor]
                                   (str "Filtered & Scaled Data (threshold=" MinValue ", scale=" ScaleFactor ")"))
     :TitleFontSize (dag/fn-with-deps nil [Environment]
                                      (if (= Environment "presentation") 24 16))
     :TitleColor (dag/fn-with-deps nil [Environment]
                                   (if (= Environment "dark-mode") "#ffffff" "#000000"))
     :GridColor (dag/fn-with-deps nil [Environment]
                                  (if (= Environment "dark-mode") "#444444" hc/RMV))
     :MinValue 12
     :ScaleFactor 2
     :Environment "standard"}}
   :aerial.hanami.templates/defaults {:TitleFontSize hc/RMV
                  :TitleColor hc/RMV
                  :GridColor hc/RMV}})
^:kind/pprint
(cache/with-clean-cache
  (xform/xform viz-pipeline))
{:visualization {:data _unnamed [4 2]:

| :x | :y |
|---:|---:|
|  2 | 30 |
|  3 | 26 |
|  4 | 34 |
|  5 | 40 |
,
                 :layout
                 {:title
                  {:text
                   "Filtered & Scaled Data (threshold=12, scale=2)",
                   :font {:size 16, :color "#000000"}},
                  :xaxis {:title "X Values (n=5)"},
                  :yaxis {:title "Y Values (scaled ร—2)"}}}}
^:kind/pprint
(cache/with-clean-cache
  (xform/xform viz-pipeline
               :Environment "presentation"
               :MinValue 14))
{:visualization {:data _unnamed [3 2]:

| :x | :y |
|---:|---:|
|  2 | 30 |
|  4 | 34 |
|  5 | 40 |
,
                 :layout
                 {:title
                  {:text
                   "Filtered & Scaled Data (threshold=14, scale=2)",
                   :font {:size 24, :color "#000000"}},
                  :xaxis {:title "X Values (n=5)"},
                  :yaxis {:title "Y Values (scaled ร—2)"}}}}

6.5.2 Configuration System

(def app-config
  {:Environment "production"
   :DatabaseHost "db.example.com"
   :DatabasePort 5432
   :DatabaseName "myapp"
   :DatabaseURL (dag/fn-with-deps
                 "Build database connection string"
                 [DatabaseHost DatabasePort DatabaseName]
                 (format "postgresql://%s:%d/%s"
                         DatabaseHost DatabasePort DatabaseName))
   :LogLevel (dag/fn-with-deps
              "Set log level based on environment"
              [Environment]
              (if (= Environment "production") "WARN" "DEBUG"))
   :MaxConnections (dag/fn-with-deps
                    "Set connection pool size based on environment"
                    [Environment]
                    (if (= Environment "production") 50 10))})
(xform/xform
 {:db-url :DatabaseURL
  :log-level :LogLevel
  :max-conn :MaxConnections}
 app-config)
{:db-url "postgresql://db.example.com:5432/myapp",
 :log-level "WARN",
 :max-conn 50}
(xform/xform
 {:db-url :DatabaseURL
  :log-level :LogLevel
  :max-conn :MaxConnections}
 (assoc app-config :Environment "development"))
{:db-url "postgresql://db.example.com:5432/myapp",
 :log-level "DEBUG",
 :max-conn 10}

6.5.3 Lazy Evaluation

(def lazy-counter (atom 0))
(reset! lazy-counter 0)
0
(xform/xform
 {:needed :A
  :aerial.hanami.templates/defaults
  {:A (fn [_]
        (swap! lazy-counter inc)
        "value-a")
   :B (fn [_]
        (swap! lazy-counter inc)
        "value-b")}})
{:needed "value-a"}
(deref lazy-counter)
1

Only :A computed despite :B being in the environment.

6.5.4 Tableplot Layers

(defn simplified-layer-point [dataset options]
  {:aerial.hanami.templates/defaults (merge {:=dataset dataset
                         :=mark :point
                         :=x-data (dag/fn-with-deps nil [=dataset =x]
                                                    (get =dataset =x))
                         :=y-data (dag/fn-with-deps nil [=dataset =y]
                                                    (get =dataset =y))
                         :=trace (dag/fn-with-deps nil [=x-data =y-data =mark]
                                                   {:type (name =mark)
                                                    :x =x-data
                                                    :y =y-data})}
                        options)
   :kindly/f #(xform/xform %)})

6.6 Summary

Six rewriting rules plus declared dependencies create a system for parameterized data transformations. This work builds on Jon Anthonyโ€™s Hanami, which showed how term rewriting could parameterize data structures. Tableplot adds explicit dependency declaration and caching.

For more information:

source: notebooks/tableplot_book/dataflow_walkthrough.clj