brege/sanoma

sanoma

Visualizing and analyzing email trends with YAML workflows and fast Python scripts. Export Thunderbird's Gloda database to JSON for convenient slicing and filtering.

See brege.org/sanoma for in-depth analysis.

Setup

Installation

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install -e .

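Run the second command from the repository root. It installs the sanoma command on your PATH inside an isolated, uv-managed tool environment, so no project-local venv is created. The -e flag makes the install editable, so source changes take effect without reinstalling.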

Thunderbird Profile

  1. Make sure Thunderbird has indexed your mail (the "Enable Global Search and Indexer" setting must be turned on).

    [Screenshot: Thunderbird Index Settings]
  2. Close Thunderbird, because it locks the database while it is running. Then copy your Thunderbird profile to data/profiles/:

    cp -r ~/.thunderbird/*.default-release data/profiles/

    The profile directory name looks like xyzabc12.default-release.

  3. Copy config.example.yaml to config.yaml and set the profile path to your copied xyzabc12.default-release directory.
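Thunderbird stores the Gloda index in a file named global-messages-db.sqlite inside each profile directory. Before pointing config.yaml at a copied profile, a small helper like the one below can confirm that the copy actually contains that file. The function name find_gloda_databases and the default data/profiles location are hypothetical illustrations, not part of sanoma.

```python
from pathlib import Path

# Thunderbird's Gloda database file, found inside each profile directory.
GLODA_FILENAME = "global-messages-db.sqlite"


def find_gloda_databases(profiles_dir="data/profiles"):
    """Return Gloda DB paths for every copied *.default-release profile.

    Hypothetical helper: profiles without the database file are skipped,
    which usually means indexing was disabled or the copy is incomplete.
    """
    root = Path(profiles_dir)
    return sorted(
        profile / GLODA_FILENAME
        for profile in root.glob("*.default-release")
        if (profile / GLODA_FILENAME).is_file()
    )
```

Running print(find_gloda_databases()) from the repository root lists the candidate databases; an empty list suggests the profile was not indexed or was copied to a different location.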

Performance

Because sanoma reads Thunderbird's Gloda (global database) directly, exporting 35K emails to JSON takes roughly 2 seconds on a 2015 netbook.
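To illustrate what direct database access involves, here is a minimal sketch that reads a copied global-messages-db.sqlite with Python's built-in sqlite3 module and turns rows into JSON-ready dicts. The schema it queries (a messages table with a deleted flag and a date column in microseconds since the Unix epoch, joined to messagesText on docid for subject and author) is an assumption about Gloda's layout, not taken from sanoma's source; verify it against your own database before relying on it.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Assumed Gloda schema (not confirmed from sanoma): messages.date holds
# microseconds since the Unix epoch; messagesText rows share docid = messages.id.
QUERY = """
SELECT m.id, m.date, t.subject, t.author
FROM messages AS m
JOIN messagesText AS t ON t.docid = m.id
WHERE m.deleted = 0
ORDER BY m.date
"""


def open_gloda_readonly(path):
    """Open a copied Gloda database in read-only mode so it is never modified."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def export_messages(conn):
    """Return one JSON-serializable dict per non-deleted message, oldest first."""
    records = []
    for msg_id, date_us, subject, author in conn.execute(QUERY):
        sent = datetime.fromtimestamp(date_us / 1_000_000, tz=timezone.utc)
        records.append(
            {"id": msg_id, "date": sent.isoformat(), "subject": subject, "author": author}
        )
    return records


def write_json(records, out_path):
    """Write exported records as pretty-printed UTF-8 JSON."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

A single SQL query over an indexed SQLite file avoids parsing every mail folder, which is consistent with the fast extraction time quoted above.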

Workflows

sanoma uses YAML workflows in workflows/ to define multi-step analysis pipelines.

The workflow runner automatically discovers and executes tools from sanoma/analysis/ and sanoma/plot/, making it easy to chain data extraction, filtering, analysis, and visualization into reproducible pipelines.
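One common way to build this kind of runner is a registry that maps each step's action name to a function and calls it with the step's params. The sketch below shows that pattern on an already-parsed workflow dict; the ACTIONS registry, the action decorator, and the placeholder plot_temporal body are hypothetical and are not sanoma's actual discovery mechanism.

```python
from typing import Callable

# Hypothetical registry mapping a workflow `action` name to a tool function.
ACTIONS: dict[str, Callable[..., object]] = {}


def action(name):
    """Decorator that registers a tool function under a workflow action name."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


def run_workflow(workflow):
    """Run each step of a parsed workflow dict in order, collecting results."""
    results = []
    for step in workflow.get("steps", []):
        func = ACTIONS.get(step["action"])
        if func is None:
            raise KeyError(f"unknown action {step['action']!r} in step {step.get('name')!r}")
        # YAML `params` keys become keyword arguments of the tool function.
        results.append(func(**step.get("params", {})))
    return results


@action("plot_temporal")
def plot_temporal(input, plot_type, output_dir, **options):
    # Placeholder: a real tool would load `input` and save a plot to output_dir.
    return f"{plot_type} of {input} -> {output_dir}"
```

With PyYAML installed, run_workflow(yaml.safe_load(open("workflows/wsu.yaml"))) would execute a file like the examples below, provided every action it names has been registered.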

Run any workflow:

sanoma workflow workflows/spam.yaml

Generated plots are written under data/plots/, in the output_dir set by each step (for example data/plots/wsu or data/plots/spam).

Examples

1) University Email Reference

Tool              Script
Workflow snippet  workflows/wsu.yaml
Plot script       sanoma/plot/timeline.py
Analysis script   sanoma/analysis/timeline.py

sanoma workflow workflows/wsu.yaml

1.1) University Email Seasonality

Academic-year seasonality appears as repeated term-time peaks and summer drops.

[Figure: Grad-school Emails (monthly)]

Workflow

steps:
  - name: plot wsu timeline
    action: plot_temporal
    params:
      input: data/extract/all.json
      filter_domain: wsu.edu
      plot_type: timeline
      output_dir: data/plots/wsu
      title: WSU Email Volume Analysis
      display: save
Manual Command
uv run sanoma/plot/timeline.py \
  data/extract/all.json \
  --plot-type timeline \
  --output-dir data/plots/wsu \
  --title "WSU Email Volume Analysis" \
  --filter-domain wsu.edu \
  --display save
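The timeline plot boils down to counting messages per month after filtering by sender domain. A minimal sketch of that aggregation follows, assuming each record in data/extract/all.json is a dict with an ISO 8601 "date" and a sender under "author" (hypothetical field names; sanoma's JSON keys may differ). Subdomains such as math.wsu.edu are counted toward wsu.edu here, which is a design choice of this sketch.

```python
from collections import Counter
from datetime import datetime
from email.utils import parseaddr


def sender_domain(author):
    """Return the lowercase domain of a sender like 'Name <x@wsu.edu>' ('' if none)."""
    _, addr = parseaddr(author)
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def matches_domain(author, domain):
    """True for the exact domain or any of its subdomains."""
    d = sender_domain(author)
    return d == domain or d.endswith("." + domain)


def monthly_counts(records, domain):
    """Count messages per (year, month) from senders in `domain`, in date order."""
    counts = Counter()
    for rec in records:
        if matches_domain(rec.get("author", ""), domain):
            when = datetime.fromisoformat(rec["date"])
            counts[(when.year, when.month)] += 1
    return dict(sorted(counts.items()))
```

Plotting the resulting values in key order gives a monthly series in which term-time peaks and summer drops become visible.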

1.2) University Email Volume

Year-Month Histogram

Year-over-year bars show the same seasonal cadence when grouped by month.

[Figure: Grad-school Emails (yearly)]

Workflow

steps:
  - name: plot wsu histogram
    action: plot_temporal
    params:
      input: data/extract/all.json
      filter_domain: wsu.edu
      plot_type: histogram
      output_dir: data/plots/wsu
      title: WSU Email Volume Analysis
      display: save
Manual Command
uv run sanoma/plot/timeline.py \
  data/extract/all.json \
  --plot-type histogram \
  --output-dir data/plots/wsu \
  --title "WSU Email Volume Analysis" \
  --filter-domain wsu.edu \
  --display save
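Grouping the bars by month means pivoting the counts so that each calendar month holds one count per year. A sketch of that pivot, assuming the input is a list of ISO 8601 date strings already filtered to the domain of interest (the function name year_month_grid is hypothetical):

```python
from collections import defaultdict
from datetime import datetime


def year_month_grid(dates):
    """Pivot ISO dates into {month: {year: count}} for grouped bar plots."""
    grid = defaultdict(lambda: defaultdict(int))
    for iso in dates:
        when = datetime.fromisoformat(iso)
        grid[when.month][when.year] += 1
    # Convert to plain dicts with sorted keys so the output order is stable.
    return {month: dict(sorted(years.items())) for month, years in sorted(grid.items())}
```

Each outer key becomes one group on the x-axis and each inner year becomes one bar in that group, so a repeating seasonal shape shows up as similar bar heights within a month across years.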

2) Spam Email Reference

Tool              Script
Workflow snippet  workflows/spam.yaml
Plot script       sanoma/plot/spam.py
Analysis script   sanoma/analysis/spam.py

sanoma workflow workflows/spam.yaml

2.1) Spam Timeline

Spam frequency rises sharply after enrollment years and remains persistently high.

[Figure: Marketing Spam Trends]

Email spam accumulation and its increasing share of all emails over time.

[Figure: Spam Timeline]

Workflow

steps:
  - name: plot spam timeline
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: timeline
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save
Manual Command
uv run sanoma/plot/spam.py \
  data/analysis/spam/keywords.json \
  --plot-type timeline \
  --output-dir data/plots/spam \
  --title "Marketing Spam Trends Analysis" \
  --display save
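The "increasing share of all emails" is a ratio of keyword-matching messages to all messages per year. A minimal sketch using a single case-insensitive regex built from a keyword list; the KEYWORDS list and the "subject"/"body" field names are hypothetical stand-ins, not sanoma's actual keywords.json contents.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical marketing keywords; sanoma's real keyword list may differ.
KEYWORDS = ["unsubscribe", "limited time", "survey", "% off"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def spam_share_by_year(records):
    """Return {year: (spam_count, total, share)} using keyword matching."""
    totals, spam = Counter(), Counter()
    for rec in records:
        year = datetime.fromisoformat(rec["date"]).year
        totals[year] += 1
        text = f"{rec.get('subject', '')} {rec.get('body', '')}"
        if PATTERN.search(text):
            spam[year] += 1
    return {y: (spam[y], totals[y], spam[y] / totals[y]) for y in sorted(totals)}
```

Plotting the cumulative spam count alongside the per-year share separates two effects: more spam arriving, and spam crowding out other mail.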

2.2) Spam Type Distribution

Keyword totals show which marketing-language patterns dominate over time.

Spam Keywords

Workflow

steps:
  - name: plot spam keywords
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: keywords
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save
Manual Command
uv run sanoma/plot/spam.py \
  data/analysis/spam/keywords.json \
  --plot-type keywords \
  --output-dir data/plots/spam \
  --title "Marketing Spam Trends Analysis" \
  --display save
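Keyword totals can be computed by grouping phrases into families and counting each message at most once per family. In the sketch below, the FAMILIES mapping and the function name keyword_totals are hypothetical examples and do not reproduce sanoma's categories.

```python
from collections import Counter

# Hypothetical keyword families; not sanoma's actual categories.
FAMILIES = {
    "survey": ["survey", "feedback", "how did we do"],
    "discount": ["% off", "sale", "discount"],
    "urgency": ["act now", "last chance", "limited time"],
}


def keyword_totals(texts):
    """Count messages matching each family, at most once per message per family."""
    totals = Counter()
    for text in texts:
        lowered = text.lower()
        for family, phrases in FAMILIES.items():
            if any(phrase in lowered for phrase in phrases):
                totals[family] += 1
    return totals.most_common()
```

Counting per message rather than per phrase occurrence keeps one long promotional email from dominating a family's total.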

2.3) Spam Heatmap: The Curse of Satisfaction Surveys

The heatmap shows how recurring keyword families persist year after year.

[Figure: Spam Heatmap]

Workflow

steps:
  - name: plot spam heatmap
    action: plot_spam_trends
    params:
      input: data/analysis/spam/keywords.json
      plot_type: heatmap
      output_dir: data/plots/spam
      title: Marketing Spam Trends Analysis
      display: save
Manual Command
uv run sanoma/plot/spam.py \
  data/analysis/spam/keywords.json \
  --plot-type heatmap \
  --output-dir data/plots/spam \
  --title "Marketing Spam Trends Analysis" \
  --display save
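A heatmap needs a dense matrix: one row per year, one column per keyword family, with zeros where nothing matched. A sketch of building that matrix, assuming records carry an ISO 8601 "date" and a "subject" and that families are passed in as a {name: [phrases]} dict (all hypothetical names, not sanoma's internals):

```python
from collections import Counter
from datetime import datetime


def heatmap_matrix(records, families):
    """Return (years, family_names, rows), where rows[i][j] counts family j in year i."""
    counts = Counter()
    years = set()
    for rec in records:
        year = datetime.fromisoformat(rec["date"]).year
        years.add(year)
        text = rec.get("subject", "").lower()
        for family, phrases in families.items():
            if any(phrase in text for phrase in phrases):
                counts[(year, family)] += 1
    year_axis = sorted(years)
    family_axis = list(families)
    # Counter returns 0 for missing (year, family) pairs, so the grid is dense.
    rows = [[counts[(y, f)] for f in family_axis] for y in year_axis]
    return year_axis, family_axis, rows
```

The rows list can be passed straight to matplotlib's imshow, with year_axis and family_axis used as tick labels.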

Usage

Command Line Interface

Extract the complete dataset from Thunderbird:

sanoma extract [--output data/extract/all.json]

Filter emails by criteria:

sanoma filter input.json output.json --domain "*.edu" --year 2023
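A glob such as "*.edu" matches any sender domain ending in .edu, including subdomains. The sketch below shows one way to apply a domain glob and a year filter with the standard fnmatch module; it assumes records carry "author" and ISO 8601 "date" fields, and filter_emails and filter_file are hypothetical names rather than sanoma's implementation.

```python
import json
from datetime import datetime
from email.utils import parseaddr
from fnmatch import fnmatchcase


def filter_emails(records, domain=None, year=None):
    """Keep records whose sender domain matches a glob and/or whose year matches."""
    kept = []
    for rec in records:
        _, addr = parseaddr(rec.get("author", ""))
        sender = addr.rpartition("@")[2].lower() if "@" in addr else ""
        if domain and not fnmatchcase(sender, domain.lower()):
            continue
        if year is not None and datetime.fromisoformat(rec["date"]).year != year:
            continue
        kept.append(rec)
    return kept


def filter_file(src, dst, **criteria):
    """Read records from src, filter them, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(filter_emails(records, **criteria), f, indent=2)
```

For example, filter_file("input.json", "output.json", domain="*.edu", year=2023) mirrors the command above under these assumptions.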

Query emails by content pattern:

sanoma query input.json output.json --pattern "unsubscribe"
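Conceptually, a content query scans text fields for a pattern. The sketch below treats the pattern as a case-insensitive regular expression over "subject" and "body"; both the regex interpretation and the field names are assumptions, since the source does not say how sanoma matches --pattern.

```python
import re


def query_emails(records, pattern, fields=("subject", "body")):
    """Return records where any of `fields` matches `pattern` (case-insensitive regex)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        rec
        for rec in records
        # `or ""` guards against missing or null fields in the export.
        if any(regex.search(rec.get(field) or "") for field in fields)
    ]
```

Compiling the pattern once and reusing it keeps the scan cheap even for tens of thousands of messages.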

Show dataset statistics:

sanoma stats input.json
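Typical dataset statistics include the message count, the date span, and the busiest sender domains. A minimal sketch of such a summary, assuming "date" and "author" fields on each record (hypothetical; the statistics sanoma actually reports may differ):

```python
from collections import Counter
from datetime import datetime
from email.utils import parseaddr


def dataset_stats(records, top=3):
    """Summarize message count, date span, and most frequent sender domains."""
    dates = sorted(datetime.fromisoformat(r["date"]) for r in records)
    domains = Counter()
    for rec in records:
        _, addr = parseaddr(rec.get("author", ""))
        if "@" in addr:
            domains[addr.rpartition("@")[2].lower()] += 1
    return {
        "messages": len(records),
        "first": dates[0].date().isoformat() if dates else None,
        "last": dates[-1].date().isoformat() if dates else None,
        "top_domains": domains.most_common(top),
    }
```

Running a quick summary like this before plotting is a cheap way to confirm that the export covers the expected years.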

Domain Analysis

Analyze domains producing emails with specific patterns:

uv run sanoma/analysis/domains.py \
  data/extract/all.json "*.edu" \
  --pattern "unsubscribe" \
  --threshold 0.95
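One plausible reading of --threshold 0.95 is "report domains where at least 95% of messages match the pattern". The sketch below implements that interpretation; it is an assumption, as are the "author"/"body" field names and the function name pattern_heavy_domains.

```python
import re
from collections import Counter
from email.utils import parseaddr
from fnmatch import fnmatchcase


def pattern_heavy_domains(records, domain_glob, pattern, threshold=0.95):
    """Return (domain, share, total) for glob-matching domains with share >= threshold."""
    regex = re.compile(pattern, re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for rec in records:
        _, addr = parseaddr(rec.get("author", ""))
        if "@" not in addr:
            continue
        domain = addr.rpartition("@")[2].lower()
        if not fnmatchcase(domain, domain_glob):
            continue
        totals[domain] += 1
        if regex.search(rec.get("body") or ""):
            hits[domain] += 1
    rows = [(d, hits[d] / totals[d], totals[d]) for d in totals]
    # Highest share first; ties broken alphabetically for stable output.
    return sorted((r for r in rows if r[1] >= threshold), key=lambda r: (-r[1], r[0]))
```

Under this reading, a high threshold isolates senders that attach an unsubscribe footer to nearly everything, which is a strong marker of bulk mail.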

License

GPLv3

About

A data exporter and visualizer of emails from Thunderbird's Gloda DB