A Python implementation of Token-Oriented Object Notation (TOON) with novel algorithmic extensions that outperform the official specification on real-world heterogeneous API datasets.
TOON (Token-Oriented Object Notation) is a data serialization format designed to minimize the number of BPE tokens consumed when structured data is sent to Large Language Models (LLMs). It uses:
- Tabular Arrays — CSV-style header + rows instead of repeating keys per object
- YAML-style indentation — for nested non-tabular structures
- No redundant quotes — primitive values are unquoted where safe
This implementation extends the official TOON specification with three architectural innovations:
Auto-Flattening: nested dictionaries inside arrays are recursively flattened to dot-notation keys before tabular encoding:
```
commit.author.name,commit.author.email,...
Alice,[email protected],...
```
This prevents the YAML fallback that makes the official implementation worse than JSON Compact on deeply nested datasets like github_commits.json.
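The flattening step can be sketched as follows (the helper name `flatten_keys` is hypothetical; the actual logic lives in `encoder.py` and may differ in detail):

```python
def flatten_keys(obj, prefix=""):
    """Recursively flatten nested dicts into dot-notation keys.

    Hypothetical sketch of the Auto-Flattening idea; not the
    library's actual API.
    """
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_keys(value, path))
        else:
            flat[path] = value
    return flat

row = flatten_keys({"commit": {"author": {"name": "Alice", "email": "[email protected]"}}})
# row == {"commit.author.name": "Alice", "commit.author.email": "[email protected]"}
```

Once every object in the array is flat, the usual tabular header + rows encoding applies, and the keys are restored on decode by the matching unflatten step.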
Sparse Tabular Arrays: arrays whose objects have heterogeneous key sets are encoded sparsely, using empty unquoted cells for missing keys. The official spec aborts tabular encoding entirely when any object is missing a field; this implementation does not:
```
id,title,brand,price:
1,Mascara,Essence,9.99
16,Apple,,1.99        ← 'brand' is absent, encoded as empty cell
6,CK One,Calvin Klein,49.99
```
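A minimal sketch of the sparse encoding (the helper `sparse_rows` is hypothetical and ignores quoting rules, which the real encoder handles separately):

```python
def sparse_rows(objects):
    """Encode heterogeneous objects as a header plus rows, with
    empty cells for missing keys. Illustrative sketch only; not
    the library's actual API."""
    # Union of keys across all objects, preserving first-seen order.
    header = []
    for obj in objects:
        for key in obj:
            if key not in header:
                header.append(key)
    rows = [",".join(str(obj.get(k, "")) for k in header) for obj in objects]
    return ",".join(header), rows

header, rows = sparse_rows([
    {"id": 1, "title": "Mascara", "brand": "Essence", "price": 9.99},
    {"id": 16, "title": "Apple", "price": 1.99},  # no 'brand' key
])
# header == "id,title,brand,price"
# rows[1] == "16,Apple,,1.99"
```

On decode, an empty cell simply means the key is omitted from the reconstructed object, so the round trip preserves the original heterogeneous shapes.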
Trampoline Parser: the decoder uses an iterative, explicit-stack trampoline instead of Python recursion, making it immune to RecursionError on arbitrarily deep JSON structures.
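The explicit-stack technique can be illustrated with a depth calculation (a sketch of the idea, not the actual `decoder.py` code):

```python
def max_depth(value):
    """Compute nesting depth iteratively with an explicit stack,
    so arbitrarily deep structures never hit RecursionError.
    Illustrative sketch of the trampoline technique."""
    stack = [(value, 1)]
    deepest = 0
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            stack.extend((v, depth + 1) for v in node.values())
        elif isinstance(node, list):
            stack.extend((v, depth + 1) for v in node)
    return deepest

# A list nested 50,000 levels deep: trivial for the explicit stack,
# fatal for naive recursion (CPython's default limit is 1,000 frames).
deep = []
node = deep
for _ in range(50_000):
    node.append([])
    node = node[0]
print(max_depth(deep))  # 50001
```

The decoder applies the same pattern while parsing: pending containers live on a heap-allocated Python list rather than the interpreter's call stack.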
Measured on real API payloads using cl100k_base BPE (GPT-4o-mini tokenizer):
| Dataset | Structure Type | JSON Pretty | JSON Compact | TOON (Ours) | vs Pretty | vs Compact |
|---|---|---|---|---|---|---|
| flat_employees.json | Uniform flat array | 509 | 299 | 187 | −63.3% | −37.5% |
| products.json | Heterogeneous objects | 17,916 | 13,125 | 10,041 | −44.0% | −23.5% |
| weather.json | Numeric time-series | 3,325 | 2,607 | 2,611 | −21.5% | ≈ 0% |
| github_commits.json | Deep nested tree | 12,530 | 11,433 | 10,620 | −15.2% | −7.1% |
Note on weather.json: TOON and JSON Compact are essentially tied on pure numeric arrays (2,611 vs 2,607 tokens), since there are few repeated keys to eliminate. TOON still saves 21.5% vs JSON Pretty by stripping whitespace.
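The percentage columns follow directly from the raw token counts; a quick sanity check (hypothetical helper, not part of the library):

```python
def savings(ours, baseline):
    """Percent token reduction of TOON relative to a baseline encoding."""
    return (1 - ours / baseline) * 100

# flat_employees.json row from the table above:
print(f"{savings(187, 509):.1f}")  # vs JSON Pretty  -> 63.3
print(f"{savings(187, 299):.1f}")  # vs JSON Compact -> 37.5
```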
```
json_to_toon_tool/
├── project/
│   ├── mini_toon/                 # Core TOON library
│   │   ├── encoder.py             # JSON → TOON (Auto-Flattening + Sparse Arrays)
│   │   ├── decoder.py             # TOON → JSON (Trampoline parser + unflatten)
│   │   ├── strings.py             # BPE-aware string quoting rules
│   │   └── types.py               # DecodeError, Line
│   ├── tests/
│   │   ├── test_encoder.py
│   │   ├── test_decoder.py
│   │   ├── test_flatten.py        # Auto-Flattening round-trip tests
│   │   └── test_sparse_arrays.py  # Sparse Tabular Array tests
│   ├── real_samples/              # Real-world API JSON payloads
│   ├── samples/                   # Flat sample data
│   ├── web/                       # Web UI (HTML/CSS/JS)
│   └── server.py                  # HTTP server for the Web UI
└── research_paper/
    └── research_paper_draft.md
```
Install dependencies:

```shell
pip install tiktoken pyyaml pytest
```

Start the web UI:

```shell
python project/server.py
# Open http://localhost:8000
```

Run the test suite:

```shell
cd project
python -m pytest tests/ -v
```

Measure token savings from Python:

```python
import json
import sys

sys.path.insert(0, 'project')

import tiktoken
from mini_toon.encoder import encode

enc = tiktoken.get_encoding('cl100k_base')

with open('project/real_samples/products.json') as f:
    data = json.load(f)

compact = json.dumps(data, separators=(',', ':'))
toon = encode(data)

savings = (1 - len(enc.encode(toon)) / len(enc.encode(compact))) * 100
print(f"Token savings vs JSON Compact: {savings:.1f}%")
```

- Live conversion — paste any JSON, see TOON output instantly
- Dual comparison stats — token savings vs both JSON Pretty and JSON Compact
- Visual progress bar — see where TOON sits between Pretty and Compact
- Built-in sample datasets — load prebuilt examples from the dropdown
- Bidirectional — encode JSON→TOON or decode TOON→JSON
A full academic analysis is available in research_paper/research_paper_draft.md, covering:
- BPE tokenization mathematics
- Big-O parser complexity analysis
- Enterprise RAG cost-benefit modeling (1 Billion API calls)
- Zero-shot LLM comprehension benchmarks
- Novel Contributions: Auto-Flattening, Sparse Tabular Arrays, Trampoline Parser