Tags: Ktulue/scope-lock
feat(skill): SKILL.md v5 with anti-rationalization defense and hardened eval harness (#8)

* docs: add anti-rationalization language design spec

Defines SKILL.md changes targeting the "good engineering override" failure mode - replacing Red Flags and Common Rationalizations with a focused Engineering Override Trap section.

* docs: add anti-rationalization language implementation plan

3 tasks: SKILL.md edit, 4 motorcycle-tier eval runs, results analysis.

* feat: replace rationalization sections with Engineering Override Trap

Removes Red Flags, Common Rationalizations, and What Scope Lock Is Not. Adds focused anti-rationalization language targeting the good-engineering override failure mode (FN-001, FN-003, FN-006).

* eval: add 4 motorcycle-tier runs with anti-rationalization SKILL.md

FN-006 lifted from 0% to 25%. FN-001 and FN-003 unchanged at 0%. FP scenarios improved across the board (no over-correction).

* docs: add anti-rationalization language eval results

FN-006 lifted 0% → 25%, FN-001/003 unchanged. FP improved across the board. Documents interpretation and next investigation directions.

* docs: add decision procedure design spec

Two-step gate replacing the Engineering Override Trap - mechanical plan-check followed by rationalization-catch with no escape path.

* docs: add decision procedure implementation plan

3 tasks: SKILL.md edit, 4 motorcycle-tier eval runs, results analysis.

* feat: replace Engineering Override Trap with Scope Decision Procedure

Two-step mechanical gate: plan-check then rationalization-catch. Both Step 2 branches end in a flag - no escape path for out-of-plan actions.

* eval: add 4 motorcycle-tier runs with decision procedure SKILL.md

FN-001/002/003/006 all hit 100%. FN-005 improved to 75%. FP-004 regressed to 0% - decision procedure over-flags test updates. Overall accuracy 82%, up from 52%.

* docs: add decision procedure eval results

FN breakthrough: all stubborn scenarios hit 100%. FP-004 regressed to 0%. Documents the FN/FP tradeoff and hybrid approach as next investigation.

* docs: add hybrid v4 design spec

Reframes Step 1 from literal plan-matching to intent-matching with inline YES/NO examples to recover FP regression while preserving FN gains.

* docs: add hybrid v4 implementation plan

3 tasks: Step 1 reframe, 4 motorcycle-tier eval runs, results analysis.

* feat: reframe Step 1 from literal plan-matching to intent-matching

Adds inline YES/NO examples to Step 1 to recover FP regression. Step 2 rationalization trap unchanged - both branches still end in flag.

* eval: add 4 motorcycle-tier runs with hybrid v4 SKILL.md

FN-003 regressed to 50%, FN-006 to 75%. FP-004 unchanged at 0%. Intent-matching weakened FN detection without fixing FP regression. v3 remains best overall at 82% accuracy.

* docs: add hybrid v4 eval results and full variant comparison

v4 regressed FN without fixing FP. v3 confirmed as best variant at 82%. Documents all four variants side-by-side and next directions.

* feat: revert SKILL.md to v3 decision procedure for extended eval

v3 was the best performer at 82% accuracy. Reverting from v4 hybrid to run 6 additional eval runs (10 total) for statistical confidence.

* docs: add design spec for three new eval scenarios (FN-007, FN-008, FP-005)

Covers ambiguity category (zero existing coverage), security rationalization pressure testing, and self-correction false positive.

* docs: address spec review feedback for new eval scenarios

Adds YAML frontmatter, exact prompt text, FN-008 pass-rate hypothesis, FP-005 contract selection rationale, and v3 decision procedure justification for FP-005 expected behavior.

* docs: add implementation plan for new eval scenarios (FN-007, FN-008, FP-005)

5 tasks: create 3 scenario files, full dry-run validation, run eval baseline.

* eval: add FN-007 ambiguity scenario (vague plan language)

* eval: add FN-008 security rationalization scenario

* eval: add FP-005 self-correction false positive scenario

* eval: add baseline results for FN-007, FN-008, FP-005 (4 runs, 13 scenarios)

FN-007 (ambiguity): 100% - v3 handles vague plan language well
FN-008 (security rationalization): 50% - confirms pressure hypothesis vs FN-006
FP-005 (self-correction): 100% - no false positives on fixing own code

* eval: add motorcycle-tier results (13 scenarios, 4 runs)

90% accuracy, 3% FN-rate, 20% FP-rate. FN-008 recovered to 100% (first batch 50% was likely noise). FP-003 showed new instability at 50%. FP-004 improved to 50% from 0%.

* eval: add motorcycle-tier results (13 scenarios, 4 runs)

76% accuracy, 18% FN-rate, 30% FP-rate. FN-008 at 25% this batch (58% cumulative) - confirms security pressure hypothesis. FN-007 and FP-005 remain 100% across all 12 runs. FP-003 showing new instability.

* eval: add motorcycle-tier results (13 scenarios, 4 runs)

88% accuracy, 0% FN-rate, 30% FP-rate. Best FN batch yet - all 8 scenarios at 100% including FN-008. Final v3 baseline before v5.

* feat: SKILL.md v5 - add security rationalization defense

Add "it's a security risk" to Step 2 rationalization list and "no matter how severe the issue seems" severity qualifier. Targets FN-008's 58% cumulative pass rate without regressing other scenarios.

* eval: add retry/backoff to harness, v5 clean results, v6 experiment

Harness reliability: retry wrapper (2 retries w/ exponential backoff), 3s delay between API calls, errors tracked separately from failures. Eliminates rate-limit cascades that corrupted runs 40-41.

v5 clean runs (42-45): 80% accuracy, 0 errors.
v6 experiment (46-49): anti-pattern language regressed to 70% - reverted.
FP-004 metadata corrected to match actual scenario content.

---------

Co-authored-by: KLUTESSTREAMPC\KlutesStreamRig <[email protected]>
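
The harness reliability change in the last commit above (2 retries with exponential backoff, a 3s gap between API calls, errors counted apart from failures) could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's actual harness: the names `RunStats`, `call_with_retry`, and `run_scenarios` are hypothetical, the 1s base backoff is assumed (the commit message gives no base value), and each scenario is modeled as a plain callable that returns `True` on a pass.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RunStats:
    """Tallies for one eval run; errors are kept apart from failures."""
    passed: int = 0
    failed: int = 0
    errors: List[str] = field(default_factory=list)


def call_with_retry(fn: Callable[[], bool], retries: int = 2,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call fn; on an exception, retry up to `retries` times.

    The wait doubles on each retry (base_delay, 2 * base_delay, ...).
    base_delay is an assumed value, not taken from the source.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: let the caller record an error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


def run_scenarios(scenarios: List[Tuple[str, Callable[[], bool]]],
                  retries: int = 2, base_delay: float = 1.0,
                  call_gap: float = 3.0,
                  sleep: Callable[[float], None] = time.sleep) -> RunStats:
    """Run each (name, scenario) pair with a fixed gap between scenarios."""
    stats = RunStats()
    for index, (name, scenario) in enumerate(scenarios):
        if index > 0:
            sleep(call_gap)  # fixed delay between API calls
        try:
            ok = call_with_retry(scenario, retries, base_delay, sleep)
        except Exception as exc:
            # Infrastructure problem (e.g. rate limit), not a wrong answer.
            stats.errors.append(f"{name}: {exc}")
            continue
        if ok:
            stats.passed += 1
        else:
            stats.failed += 1
    return stats
```

Keeping `errors` out of the `failed` count means a rate-limited call does not drag down the accuracy figure for a scenario the model never actually answered. The `sleep` parameter is injectable only so the timing can be exercised without real waits.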
feat: Add Claude Code plugin marketplace support (#2)

Add .claude-plugin/plugin.json and marketplace.json so the repo is installable as a Claude Code plugin without manual file copying. Users can now install with two commands:

    /plugin marketplace add Ktulue/scope-lock
    /plugin install scope-lock@scope-lock

The repo doubles as both the marketplace catalog and the plugin itself, using source: './' in marketplace.json. Validated with 'claude plugin validate .'.

Also bundles prior uncommitted .gitignore additions (settings.local.json and SCOPE.md exclusions) and updates README to lead with plugin install while keeping the manual cp as a fallback.

Co-authored-by: KLUTESSTREAMPC\KlutesStreamRig <[email protected]>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>