Skip to content

Latest commit

 

History

History
212 lines (175 loc) · 18.7 KB

File metadata and controls

212 lines (175 loc) · 18.7 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Core Protocol

  • Follow COLLABORATION.md protocol for multi-agent coordination
  • Update KANBAN.md, STATUS.md, DEVLOG.md, RUNLOG.md after each significant action
  • Update dataset_review.md after every training or evaluation run — document dataset composition changes, new/removed datasets, and data quality observations
  • Always include commit hash and direct GitHub file link in Discord updates
  • For dissertation or writing-file updates, the link must be a real clickable GitHub blob URL, not a bare local path or repo-relative filename
  • For dissertation collaboration with Petrarch, Quimbot should support: obvious copyedits only when violations are clear; database-backed retrieval of student voices/evidence chains; scouting for paragraph-level expansion opportunities; and reciprocal review with Petrarch before proposing end-of-night additions
  • Never commit datasets or large artifacts to Git

Key Decisions Log (2026-03-30)

  • Morning review 3/30 09:00: 16 commits overnight (9 Quimbot, 7 creative-clawing). Style system expansion dominant: em dash paired-aside rule, nominalized subjects ban, What/That opener ban, This+noun rule, pseudo-cleft ban, colon pivots, consecutive clauses, sentence ceiling, macro ID taxonomy with scope profiles. Cloze paper v42 synced to HTML site (Fillenbaum→Rayner, PROP-01/02) + argument fix (pretraining ≠ slow reading). creative-clawing: microblog style sweep, entry-34 Ising closer rewrite, Petrarch audio reader on entry-33 (Daniel/British), crystal+voronoi safe-area insets. Data static (superset13 at 86,419). OpenRouter 402 day 33.

Key Decisions Log (2026-03-29)

  • Evening review 3/29 21:00: 16 commits full day (4 Quimbot, 12 creative-clawing). Gallery: Mohr hypercube (4D tesseract, double stereographic projection, four rotation modes) + microblog #35 ("A Figure That Requires Four Axes"). Entry retitling pass across 4 entries. Mobile responsive fixes on mohr/ifs/astar. Newton back-btn href normalization across 16 artifacts. Writing: cloze paper genealogy framing pass. Petrarch contributed 3 STYLE_GUIDE rules. Artifact count: 91 gallery + 33 microblogs. Data static (superset13 at 86,419). OpenRouter 402 day 32. Sunday pace: steady curation day.
  • Morning review 3/29 09:00: 7 commits overnight (2 Quimbot, 5 creative-clawing). Quiet Sunday after 115-commit Saturday. Quimbot: microblog #25 (spirograph hypotrochoids), STYLE_GUIDE update from Petrarch (earned assertion gate, same-question test, single separator). creative-clawing: microblog #34 (Ising model), iframe sandbox hardened to allow-scripts only (dropped allow-same-origin), newton back-btn href normalization. Data static (superset13 at 86,419). OpenRouter 402 day 32. Tasked Petrarch: Section IV draft, superset13 spot-check, billing fix push.

Key Decisions Log (2026-03-28)

  • Evening review 3/28 21:00: 115 commits (33 Quimbot, 82 creative-clawing). Biggest commit day on record. Gallery: full HiDPI + iframe audit. Root causes: (1) HiDPI drawImage bug — ctx.setTransform(dpr,...) active during drawImage() draws at 1/DPR² size; fix: draw to physical dims with identity transform. (2) Sync resize at script end reads 0/stale dimensions; fix: requestAnimationFrame(()=>{resize();setTimeout(resize,200);init();loop();}). (3) innerWidth/innerHeight unreliable in iframes; use canvas.offsetWidth||innerWidth. (4) Undefined function calls (resize() vs handleResize()/render()) killed animations. (5) MAX_LIVE_IFRAMES cap too low for homepage/gallery; raised + drainPending() on load events. Infrastructure: QUALITY.md pre-ship checklist, tests/lint_gallery.py automated lint, lint-gallery.yml CI. hatmonotile + snowflake removed (quality gate). Artifact count: 89. Cloze paper: v42 with PASSAGE A (occlusion/closure/Kanizsa), intro edits, editor JS extraction + block delete. Style: submarine false-dilemma ban, vacuous openers ban. Data lane static (superset13 at 86,419). OpenRouter 402 day 31.

Key Decisions Log (2026-03-27)

  • Evening review 3/27 21:00: 36 commits full day (10 Quimbot, 26 creative-clawing). Data: off-day generation produced 10,413 new TOEFL entries across 7 iterative scripts (17 new error categories: BU-BZ, CD-CN; new domains: medical, legal, business, tech, school-age); superset12 at 76,419. Quimbot: CUNY Commons featured sites (20 candidates, 10 confirmed live), CALI script rewrites finalized, cloze paper genealogy pass. creative-clawing: technical debt sweep day (100dvh on 73 files, manifest agent casing normalization with hard validation gate, IFS artifact untracked fix, sand.html iframe deferred init, iframe load listener restore, SW cache bust to cc-v2, mobile audit pass, thomas attractor scale correction, back button href sweep, submit-artifact.yml YAML repair, CSS semicolon repair on 14 files). Dataset generation saturation noted: 7 iterative passes needed to clear 10k; future off-days will need new grammatical structures and domains.
  • Morning review 3/27 09:00: 2 commits overnight (CALI script rewrites — three length variants 250/500/750w, AmigAI arc reframe per milwrite brief). Writing lane active; data/training lane static. OpenRouter 402 day 30. Superset11 at 66,006.

Key Decisions Log (2026-03-26)

  • Evening review 3/26 21:00: 17 commits full day (10 Quimbot, 7 creative-clawing). Writing: cloze paper genealogy framing pass + narrator line restore + Veldre et al. cut per milwrite ruling + v38 closing rejection. Pages build: 3 commits fixing nested-repo dirs, dangling submodule, knowledge-collections-repo. creative-clawing: homepage perf (shared rAF, iframe cap 32, unified lifecycle), dadras mobile scope fix, astar iframe timing + service worker offline cache. OpenRouter 402 day 29. Superset11 at 66,006.
  • Morning review 3/26 09:00: 5 commits overnight. creative-clawing: PageRank (power iteration + random surfer) + Snowflake (Reiter hex CA) artifacts, static thumbnail placeholders for homepage/gallery cards, rossler/schelling 100dvh normalize. Quimbot: gyro parallax starfield smoothing + flowfield fade fix. OpenRouter 402 day 29. Superset11 at 66,006.

Key Decisions Log (2026-03-25)

  • Night review 3/25 21:00: ~23 commits full day. Writing: cloze paper v37→v38 + STYLE_GUIDE LLM-as-judge gate. creative-clawing: 3 new artifacts (Dadras, Sprott, KH instability) + mobile fixes on 25 files + iframe black-card bug fixes. Kalshi config v2. Data: 8,177/10k clean from toefl_batch_20260325. OpenRouter 402 day 28.
  • Morning review 3/25: ~18 commits overnight across 3 repos
  • creative-clawing iframe + mobile sweep: 19 artifacts had .panel/.controls hidden for clean iframe embeds; hatmonotile/gradient/astar got mobile responsive fixes; astar auto-demo loop added; nav submenu overflow:hidden removed; lotkavolterra mobile layout fixed
  • Microblogs #30 + #31: Lotka-Volterra (phase orbits, Adriatic fish) + Mandelbrot (smooth coloring, Julia/Fatou/Mandelbrot history)
  • Kalshi config finalized: $5 fixed bets, 10 trades/run, 10% edge threshold (a655047)
  • OpenRouter 402 day 28

Key Decisions Log (2026-03-24)

  • Superset10 merged (59,509 rows): new high-water mark; 8 new error categories from batch_20260323 (subjunctive, cleft sentences, ellipsis/substitution)
  • Kalshi NO-only pivot: milwrite clarified via voice note: trade NO when threshold is absurdly out of reach; removed YES branch from evaluate_signal(); added price_tracker.py + historical CSV export (adfcd1b)
  • Cloze reader editor PAT restore: GitHub blocked hardcoded token; now prompts once, caches in localStorage (c3f2b172)
  • Cloze paper citation fix: replaced generic PMC/arXiv refs with proper author citations (Veldre et al., Jacobs et al.) (f140cba0)
  • creative-clawing iframe fix: CSS-first control hiding on 9 artifacts; stops flash in homepage card previews (55c64b3)
  • Microblog #29: Truchet tiles published (394a6e5)
  • OpenRouter 402 day 27

Key Decisions Log (2026-03-23)

  • toefl_batch_20260323 generated: 11,240 rows via 3 local generation scripts (gen_20260323.py, v2, supplement)
  • Data audit: superset3_cleaned.jsonl rows lack system prompts (all start with user role); needs fix before merge
  • Superset9 (45,555 rows) confirmed missing from local disk — only superset3_cleaned + today's batch on Legion
  • Microblog #22 shipped: Schotter (Georg Nees, 1968) — gravel and the gradient of chaos (e58a17a6)
  • Cloze paper prose fix: overcommitted divergence paragraph replaced with review-frame sentence (3ffa2f2a)
  • OpenRouter 402 day 26 — still blocking cloud generation

Key Decisions Log (2026-03-22)

  • Cloze paper post-v39 fixes — 'cannot read slowly' contrast restored (with humans, not total absence); hanging Firth quote fixed; bad style rule reverted; 2 new style rules added (03df4b73, ac987799)
  • Site synced to v37 with cross-links between site deployment and draft.md (1d5773a2, 25e84602)
  • Overnight v36→v39 — paragraph bridging, colon sweep (11 eliminated), verb audit, genealogy condensed
  • Writing system expanded — CHECKLIST_COPY, CHECKLIST_REVISE, PROCESS_GUIDE added for phase-decomposed style with conditional routing
  • Single draft.md canonical — versioned filenames eliminated, git handles history; writing/ directory reorganized with subdirectories

Agent Documentation

Agent coordination files are in the project root:

  • COLLABORATION.md - Multi-agent workflow protocol
  • KANBAN.md - Project board and stand-up notes
  • STATUS.md - Real-time status updates
  • DEVLOG.md - Timestamped work log
  • RUNLOG.md - Training run history
  • NEXT-ACTIONS.md - Prioritized action items

Project Structure

molt/
├── README.md              # Project overview
├── CLAUDE.md              # This file (agent instructions)
├── dataset_review.md      # Dataset review notes (UPDATE AFTER EVERY RUN)
├── COLLABORATION.md       # Multi-agent workflow protocol
├── KANBAN.md              # Project board
├── STATUS.md              # Real-time status
├── DEVLOG.md              # Timestamped work log
├── RUNLOG.md              # Training run history
├── NEXT-ACTIONS.md        # Prioritized action items
├── LoRA-ROADMAP.md        # LoRA training roadmap
├── evaluation/            # Model evaluation framework
├── fine-tuning/           # Training scripts, workflows, and working JSONL data
├── datasets/              # Training data JSONL files (not committed)
├── checkpoints/           # Model checkpoints (not committed)
└── research/              # Research notes and plans

Commands

Evaluation (recommended: v2)

pip3 install -r evaluation/requirements-eval.txt
python3 evaluation/qwen-eval-v2.py --models qwen2.5:8b --verbose
python3 evaluation/qwen-eval-v2.py --config evaluation/eval-config-example.yaml --workers 8 --cache

Fine-tuning (requires Python 3.11+)

python3.11 -m pip install -r fine-tuning/requirements.txt --user
python3.11 fine-tuning/prepare_stage1.py
python3.11 fine-tuning/run_tinker_lora.py --data datasets/stage1_mix_200k.jsonl --batch 64 --rank 16 --save-every 50
python3.11 fine-tuning/test_lora_model.py --lora-weights checkpoints/lora_*/final_lora_weights --compare-base

A2A Bridge (Node.js, zero dependencies)

node a2a-bridge.mjs                    # listens on 0.0.0.0:18800
A2A_PORT=9000 node a2a-bridge.mjs      # custom port

Architecture

  • fine-tuning/run_tinker_lora.py — primary training script; uses tinker.ServiceClient, converts JSONL ChatML → token streams with loss masking on assistant tokens only, saves checkpoints to tinker:// URIs
  • evaluation/qwen_eval/ — evaluation package: core.py (engine + parallel workers + caching), metrics.py (15+ metrics), test_suites.py (4 built-in suites), reporters.py (JSON/Markdown/Comparison)
  • a2a-bridge.mjs — standalone HTTP server exposing OpenClaw via Google A2A protocol (JSON-RPC: tasks/send, tasks/get, tasks/cancel), proxies to http://127.0.0.1:18789/v1/chat/completions

Data & Environment

  • Training data is JSONL in ChatML format (OpenAI-style messages). Datasets (.jsonl, ~4.5GB) and model files (.gguf) are gitignored.
  • Synth followups (concatenated outputs):
    • fine-tuning/data/toefl_synth_followups_concat_20260212.jsonl (5742 lines)
    • fine-tuning/data/pilot_concat_20260212.jsonl (1610 lines)
    • These are convenience concat files under gitignored fine-tuning/data/.
  • QA/Audit helpers (reproducible):
    • fine-tuning/qa_followups_jsonl.py (fast schema/sanity scan)
    • fine-tuning/consolidate_followups.py (merge/normalize multiple followups JSONLs)
    • fine-tuning/audit_toefl_followups.py (deeper audit; issue + dupe counts)
  • Required env vars for training: TINKER_API_KEY, TINKER_API_BASE, HF_TOKEN

2026-02-16 notes (recent decisions / state)

  • Synth followups audit triage (see agents/KANBAN.md): TOEFL concat issues appear filterable:
    • 30 rows with empty assistant content
    • 2 role alternation violations
    • JSON parse errors = 0
  • OpenRouter generation scaling currently blocked by HTTP 402; do not assume generation scripts will run until billing/key routing is fixed.
  • Stage 1 mix decision pending: whether to hard-dedup synth followups or keep duplicates as implicit weighting.
  • Repo hygiene: sidequest microlearning scripts exist under sidequests/microlearning/; do not commit sidequests/microlearning/data/ artifacts (treat like datasets/output).

2026-02-24 notes (nightly review)

  • Nightly stocktake completed and committed (4efefe28): added fine-tuning/data/INVENTORY.md with per-file row counts and dedup status.
  • Dataset state re-confirmed at 33,834 rows across 7 tracked JSONL files in fine-tuning/data/.
  • Evaluation scripts were syntax-validated (evaluation/qwen-eval.py, evaluation/qwen-eval-v2.py) and are execution-ready once weights land.
  • New generation utility scaffold added: fine-tuning/scripts/generation/generate_toefl_ollama_10k.py for local Ollama batch synthesis.
  • Hard blocker remains unchanged: Stage 1 Run 4 adapter weights are not on local disk (day 2 blocked).
  • Secondary blocker remains: OpenRouter HTTP 402 prevents scale-out synthetic generation.

2026-02-25 notes (morning review)

  • Overnight local generation added three fresh outputs with +156 rows total (92 + 51 + 13).
  • Quick validation pass confirmed toefl_ollama_batch_20260224_2130_clean.jsonl at 21/21 valid rows.
  • Working total reported in stand-up context is now 34,011 rows before dedup merge finalization.
  • Immediate next data task: merge new outputs into a staging JSONL, dedup against current superset, then refresh fine-tuning/data/INVENTORY.md.
  • Highest-priority execution path is unchanged: Run 4 checkpoint eval starts as soon as adapter weights (step 350 + final) are available locally.

2026-02-25 notes (evening review)

  • No additional commits landed during daytime; progress concentrated on coordination, status hygiene, and handoff prep.
  • Nightly trend artifact added at reports/nightly/stocktake-2026-02-25.md (branch, delta, dataset location snapshot).
  • The execution queue for next cycle is explicit: merge + dedup + recount first, then immediate eval trigger once weights are provided.
  • Blocking conditions are unchanged: missing local adapter weights for Run 4 eval and OpenRouter HTTP 402 for scale-out generation.

2026-03-18 notes (morning review)

  • Cron jobs back online after 3-day pause (milwrite halted all 9 jobs on 3/15, Wednesday lift now in effect).
  • 2 overnight commits: cloze reader citation chain upgrades — Bommasani, Oller, Bachman, Peters, Gao/Pile, Carlini, Vygotsky/Wood/Pea (4f672690, 2e65b8ed).
  • OpenRouter 402 now day 21 — still blocking cloud generation.
  • Superset7 at 39,133 rows, verification pending.
  • Petrarch push auth still blocked (zmuhls lacks write on milwrite/quimbot).

2026-02-26 notes (morning review)

  • Morning commit stream is active: stand-up sync + doc/microblog tightening + gallery visualization additions (d14cb717 latest).
  • Site generation artifacts updated today under docs/gallery/ and docs/index.html; this lane is shipping while training/eval remains blocked.
  • fine-tuning/prospects/cron.log shows repeated Discord notifier failures caused by shell call to missing openclaw binary (/bin/sh: 1: openclaw: not found).
  • Keep OpenClaw-native messaging for status posts (message tool / API path), do not rely on local CLI availability in cron environments.
  • Core blockers are still the same: missing Run 4 adapter weights locally and OpenRouter HTTP 402.

Archived Decisions

Entries older than 2 weeks, archived 2026-04-05.

Key Decisions Log (2026-03-21)

  • Cloze paper at v35 — 4 draft versions in one day; first-person narrator grafted at 8 points (v33); logical prepositions pass (v34); Harris/Firth/Mikolov/Peters/Devlin genealogy expanded and thesis restored to intro (v35)
  • Style system disaggregated — monolithic STYLE_GUIDE.md split into 6 modular files (anti-patterns, diction, paragraphs, sentences, structure, voice) with SKILL.md router in style/
  • Superset9 merged at 45,555 rows — row count lower than superset8 (46,943) due to dedup/filtering variance
  • Pages build fix — broken a11y-checker submodule ref removed and gitignored

Key Decisions Log (2026-03-20)

  • Superset8 merged at 46,943 rows — new dataset high-water mark
  • Writing hub page shipped/writing/ with card layout + commit metadata on creative-clawing
  • Cloze reader browser editor added — password-gated, highlight + commit to GitHub (major tooling for milwrite)
  • AI detection essay updated to v4 on live site
  • Session 2b verification results documented in SESSION_STATE

Key Decisions Log (2026-03-19)

  • Paper title finalized: "Fill in the Blank: Cloze Reader and the Twin Histories of Occlusion"
  • Colon/semicolon audit rule added to STYLE_GUIDE (Petrarch's connector-surfacing principle)
  • 3 more style rules: trailing participle phrases ban, anaphora abuse ban, tricolon abuse ban
  • Draft at v31: ~2,800+ words, Zhang & Hashimoto 2021 + Ondov 2024 in bibliography
  • Closing sentence inverts training signal framing — "inductive bias" pushed to body

Key Decisions Log (2026-03-18)

  • Cloze reader live draft is additive-only — never rewrite existing v14 prose; only insert new sentences or extend. milwrite preserves voice.
  • JOURNAL.md is the canonical shared record for cloze-reader paper (both bots maintain it)
  • Style guide now has 3 new rules: bridging constructions ban, nominalization ban, em-dash enclosure rule
  • Kalshi sidequest: weather + CPI only (trimmed from broader scope)
  • Session 2 requires formal S1→S2 token from milwrite before scope opens