Visualizing and analyzing email trends with YAML workflows and fast Python scripts. Export Thunderbird's Gloda database to JSON for convenient slicing and filtering.
See brege.org/sanoma for in-depth analysis.
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install -e .

This installs the sanoma command globally without creating a local venv.
- Ensure your Thunderbird profile is indexed.
- Copy your Thunderbird profile to data/profiles/:

  cp -r ~/.thunderbird/*.default-release data/profiles/

  Close Thunderbird first, because it locks the database while it is running. The profile directory name looks like xyzabc12.default-release.
- Create config.yaml from config.example.yaml and add the xyzabc12.default-release profile path.
Since the tool uses direct "Gloda" (Global Database) access, JSON extraction takes roughly 2 seconds for 35K emails on a 2015 netbook.
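Once exported, the JSON can be sliced with plain Python. The sketch below is illustrative only: it assumes the export is a list of message objects with an `author` field (an email address, optionally with a display name) and an ISO 8601 `date` field. Those field names and the helper names are assumptions, not the documented sanoma schema.

```python
import json
from datetime import datetime
from email.utils import parseaddr
from fnmatch import fnmatch


def load_messages(path):
    """Load the exported JSON, assumed here to be a list of message dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def sender_domain(message):
    """Return the lowercased sender domain from the assumed 'author' field."""
    address = parseaddr(message.get("author", ""))[1]
    return address.rpartition("@")[2].lower()


def slice_messages(messages, domain_glob=None, year=None):
    """Keep messages matching an optional domain glob and an optional year."""
    kept = []
    for msg in messages:
        if domain_glob and not fnmatch(sender_domain(msg), domain_glob):
            continue
        if year and datetime.fromisoformat(msg["date"]).year != year:
            continue
        kept.append(msg)
    return kept
```

For example, `slice_messages(load_messages("data/extract/all.json"), "*.edu", 2023)` would keep only 2023 messages from `.edu` senders, provided the export really has the assumed fields.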
sanoma uses YAML workflows in workflows/ to define multi-step analysis pipelines.
The workflow runner automatically discovers and executes tools from sanoma/analysis/ and sanoma/plot/, making it easy to chain data extraction, filtering, analysis, and visualization into reproducible pipelines.
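To make the idea of chaining steps concrete, here is a minimal sketch of an action dispatcher. It is not sanoma's actual runner: the `ACTIONS` registry, the `action` decorator, and the stand-in `plot_temporal` function are hypothetical, and the workflow is written as a Python dict of the shape a YAML loader would typically produce.

```python
ACTIONS = {}


def action(name):
    """Register a function under a workflow action name (hypothetical registry)."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("plot_temporal")
def plot_temporal(**params):
    # Stand-in for a real plotting tool: it only reports what it would do.
    return f"{params['plot_type']} plot of {params['input']} -> {params['output_dir']}"


def run_workflow(workflow):
    """Run each step in order, passing its params as keyword arguments."""
    results = []
    for step in workflow["steps"]:
        tool = ACTIONS.get(step["action"])
        if tool is None:
            raise KeyError(f"unknown action: {step['action']}")
        results.append((step["name"], tool(**step.get("params", {}))))
    return results


# Same shape as a parsed workflow file with a single step.
workflow = {
    "steps": [
        {
            "name": "plot wsu timeline",
            "action": "plot_temporal",
            "params": {
                "input": "data/extract/all.json",
                "plot_type": "timeline",
                "output_dir": "data/plots/wsu",
            },
        }
    ]
}

results = run_workflow(workflow)
```

Keeping the mapping from action name to function in one place means a step with a misspelled action fails immediately with a clear error instead of being silently skipped.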
Run any workflow:
sanoma workflow workflows/spam.yaml

Generated plots are written to data/plots/*.
| Tool | Script |
|---|---|
| Workflow Snippet | workflows/wsu.yaml |
| Plot script | sanoma/plot/timeline.py |
| Analysis script | sanoma/analysis/timeline.py |
sanoma workflow workflows/wsu.yaml

Academic-year seasonality appears as repeated term-time peaks and summer drops.
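The seasonality comes down to bucketing messages by month. The following sketch shows one way to do that, assuming each message carries an ISO 8601 `date` field; the field name and both helper functions are illustrative and are not taken from sanoma/analysis/timeline.py.

```python
from collections import Counter
from datetime import datetime


def monthly_volume(messages):
    """Count messages per (year, month) using the assumed ISO 'date' field."""
    counts = Counter()
    for msg in messages:
        when = datetime.fromisoformat(msg["date"])
        counts[(when.year, when.month)] += 1
    return dict(sorted(counts.items()))


def month_profile(volume):
    """Sum the (year, month) counts across years to expose the seasonal shape."""
    profile = Counter()
    for (_year, month), count in volume.items():
        profile[month] += count
    return dict(sorted(profile.items()))
```

The first function gives the timeline view; the second folds all years together, which is where term-time peaks and summer drops would stand out.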
steps:
  - name: plot wsu timeline
    action: plot_temporal
    params:
      input: data/extract/all.json
      filter_domain: wsu.edu
      plot_type: timeline
      output_dir: data/plots/wsu
      title: WSU Email Volume Analysis
      display: save

Manual Command:
uv run sanoma/plot/timeline.py \
data/extract/all.json \
--plot-type timeline \
--output-dir data/plots/wsu \
--title "WSU Email Volume Analysis" \
--filter-domain wsu.edu \
--display save

Year-over-year bars show the same seasonal cadence when grouped by month.
steps:
  - name: plot wsu histogram
    action: plot_temporal
    params:
      input: data/extract/all.json
      filter_domain: wsu.edu
      plot_type: histogram
      output_dir: data/plots/wsu
      title: WSU Email Volume Analysis
      display: save

Manual Command:
uv run sanoma/plot/timeline.py \
data/extract/all.json \
--plot-type histogram \
--output-dir data/plots/wsu \
--title "WSU Email Volume Analysis" \
--filter-domain wsu.edu \
--display save

| Tool | Script |
|---|---|
| Workflow Snippet | workflows/spam.yaml |
| Plot script | sanoma/plot/spam.py |
| Analysis script | sanoma/analysis/spam.py |
sanoma workflow workflows/spam.yaml

Spam frequency rises sharply after enrollment years and remains persistently high.
Email spam accumulation and its increasing share of all emails over time.
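As a rough illustration of what "share of all emails" means here, the sketch below computes the fraction of each year's messages containing any of a set of keywords. It assumes hypothetical `date` and `body` fields on each message and uses simple substring matching; sanoma/analysis/spam.py may define spam differently.

```python
from collections import defaultdict
from datetime import datetime


def spam_share_by_year(messages, keywords):
    """Return {year: fraction of that year's messages containing any keyword}."""
    lowered = [keyword.lower() for keyword in keywords]
    totals = defaultdict(int)
    spam = defaultdict(int)
    for msg in messages:
        year = datetime.fromisoformat(msg["date"]).year
        totals[year] += 1
        text = msg.get("body", "").lower()
        if any(keyword in text for keyword in lowered):
            spam[year] += 1
    return {year: spam[year] / totals[year] for year in sorted(totals)}
```

Reporting a share rather than a raw count separates a real rise in marketing mail from a general increase in email volume.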
steps:
  - name: plot spam timeline
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: timeline
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save

Manual Command:
uv run sanoma/plot/spam.py \
data/analysis/spam/keywords.json \
--plot-type timeline \
--output-dir data/plots/spam \
--title "Marketing Spam Trends Analysis" \
--display save

Keyword totals show which marketing-language patterns dominate over time.
steps:
  - name: plot spam keywords
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: keywords
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save

Manual Command:
uv run sanoma/plot/spam.py \
data/analysis/spam/keywords.json \
--plot-type keywords \
--output-dir data/plots/spam \
--title "Marketing Spam Trends Analysis" \
--display save

The heatmap highlights long-running persistence of recurring keyword families by year.
steps:
  - name: plot spam heatmap
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: heatmap
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save

Manual Command:
uv run sanoma/plot/spam.py \
data/analysis/spam/keywords.json \
--plot-type heatmap \
--output-dir data/plots/spam \
--title "Marketing Spam Trends Analysis" \
--display save

Extract complete dataset from Thunderbird:

sanoma extract [--output data/extract/all.json]

Filter emails by criteria:

sanoma filter input.json output.json --domain "*.edu" --year 2023

Query emails by content pattern:

sanoma query input.json output.json --pattern "unsubscribe"

Show dataset statistics:

sanoma stats input.json

Analyze domains producing emails with specific patterns:
uv run sanoma/analysis/domains.py \
data/extract/all.json "*.edu" \
--pattern "unsubscribe" \
--threshold 0.95
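One plausible reading of the command above is: among senders matching the domain glob, report the domains where at least 95% of messages match the pattern. The sketch below implements that reading only. The meaning of `--threshold`, the `author` and `body` field names, and the function name are all assumptions rather than the documented behavior of sanoma/analysis/domains.py.

```python
import re
from collections import defaultdict
from email.utils import parseaddr
from fnmatch import fnmatch


def domains_matching(messages, domain_glob, pattern, threshold):
    """Return {domain: match ratio} for domains whose ratio meets the threshold."""
    regex = re.compile(pattern, re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for msg in messages:
        address = parseaddr(msg.get("author", ""))[1]
        domain = address.rpartition("@")[2].lower()
        if not domain or not fnmatch(domain, domain_glob):
            continue
        totals[domain] += 1
        if regex.search(msg.get("body", "")):
            hits[domain] += 1
    ratios = {domain: hits[domain] / totals[domain] for domain in sorted(totals)}
    return {domain: ratio for domain, ratio in ratios.items() if ratio >= threshold}
```

A high threshold singles out domains that send almost nothing but pattern-matching mail, such as bulk senders whose every message carries an unsubscribe link.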




