Skip to content

fix(agents): drop Anthropic thinking blocks on replay#44843

Merged
frankekn merged 6 commits intomainfrom
frank/44387-anthropic-thinking-replay-replacement
Mar 13, 2026
Merged

fix(agents): drop Anthropic thinking blocks on replay#44843
frankekn merged 6 commits intomainfrom
frank/44387-anthropic-thinking-replay-replacement

Conversation

@frankekn
Copy link
Contributor

Summary

Credit

  • original implementation and primary fix logic by @jmcte
  • this replacement PR keeps the original author commit and adds maintainer follow-up test/changelog prep only

Testing

  • pnpm build
  • pnpm check
  • pnpm test

Fixes #44387

@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels Mar 13, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR extends the existing thinking-block-drop mechanism (previously only for github-copilot) to native anthropic and amazon-bedrock Claude providers, fixing a bug where persisted thinking blocks in session history caused API rejections on follow-up turns. The change is minimal and well-targeted: two lines added to PROVIDER_CAPABILITIES, a comment update in two files, and corresponding test updates.

  • provider-capabilities.ts: Adds dropThinkingBlockModelHints: ["claude"] to anthropic and amazon-bedrock entries, consistent with the existing github-copilot entry.
  • transcript-policy.ts / attempt.ts: Comment-only updates to broaden the described scope from "Copilot/Claude" to "Anthropic Claude endpoints."
  • provider-capabilities.test.ts: Adds snapshot assertions for anthropic and amazon-bedrock capability objects verifying the new hints are present.
  • pi-embedded-runner.sanitize-session-history.test.ts: Replaces the previous negative test (asserting thinking blocks are not dropped for anthropic) with a positive test confirming they are dropped, using a new sanitizeAnthropicHistory helper. The helper supports a provider override but no test exercises amazon-bedrock at the session-sanitization level — a minor coverage gap given the PR claims Bedrock coverage. The existing shouldDropThinkingBlocksForModel assertion in provider-capabilities.test.ts is also only checked for github-copilot, not the newly updated providers.

Confidence Score: 4/5

  • This PR is safe to merge — the implementation is correct and consistent with the existing pattern, with only minor test coverage gaps.
  • The core logic change (adding dropThinkingBlockModelHints: ["claude"] to anthropic and amazon-bedrock providers) is straightforward, minimal, and mirrors the existing github-copilot entry exactly. No behavioral regressions were found. The score is 4 rather than 5 due to the absence of an amazon-bedrock end-to-end sanitization test and the lack of direct shouldDropThinkingBlocksForModel assertions for the newly covered providers — both of which the PR description implies are present.
  • src/agents/pi-embedded-runner.sanitize-session-history.test.ts — missing a Bedrock-provider test case to validate the claimed coverage.

Comments Outside Diff (1)

  1. src/agents/provider-capabilities.test.ts, line 94-103 (link)

    No direct shouldDropThinkingBlocksForModel assertion for new providers

    The existing test only calls shouldDropThinkingBlocksForModel for "github-copilot". Now that "anthropic" and "amazon-bedrock" both set dropThinkingBlockModelHints: ["claude"], direct behavioral assertions would make the intent explicit and guard against future regressions:

    expect(
      shouldDropThinkingBlocksForModel({
        provider: "anthropic",
        modelId: "claude-opus-4-6",
      }),
    ).toBe(true);
    expect(
      shouldDropThinkingBlocksForModel({
        provider: "amazon-bedrock",
        modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
      }),
    ).toBe(true);

    The indirect coverage via the resolveProviderCapabilities snapshot checks dropThinkingBlockModelHints is present, but a direct behavioral test is clearer.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/agents/provider-capabilities.test.ts
    Line: 94-103
    
    Comment:
    **No direct `shouldDropThinkingBlocksForModel` assertion for new providers**
    
    The existing test only calls `shouldDropThinkingBlocksForModel` for `"github-copilot"`. Now that `"anthropic"` and `"amazon-bedrock"` both set `dropThinkingBlockModelHints: ["claude"]`, direct behavioral assertions would make the intent explicit and guard against future regressions:
    
    ```typescript
    expect(
      shouldDropThinkingBlocksForModel({
        provider: "anthropic",
        modelId: "claude-opus-4-6",
      }),
    ).toBe(true);
    expect(
      shouldDropThinkingBlocksForModel({
        provider: "amazon-bedrock",
        modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
      }),
    ).toBe(true);
    ```
    
    The indirect coverage via the `resolveProviderCapabilities` snapshot checks `dropThinkingBlockModelHints` is present, but a direct behavioral test is clearer.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner.sanitize-session-history.test.ts
Line: 777-786

Comment:
**Missing Bedrock-specific sanitization test**

The `sanitizeAnthropicHistory` helper was designed with an optional `provider` parameter precisely to allow testing both `"anthropic"` and `"amazon-bedrock"`, but no test actually passes `provider: "amazon-bedrock"`. The PR description says "add Bedrock coverage," but at the session-sanitization integration level, Bedrock is only covered by the capability-registry snapshot in `provider-capabilities.test.ts`.

Given that Bedrock uses a different `modelApi` (`"bedrock-converse-stream"` vs `"anthropic-messages"`), a dedicated test would confirm that thinking blocks are also dropped for that code path:

```typescript
it("drops assistant thinking blocks for amazon-bedrock replay", async () => {
  setNonGoogleModelApi();

  const messages = makeThinkingAndTextAssistantMessages();

  const result = await sanitizeAnthropicHistory({
    messages,
    provider: "amazon-bedrock",
  });

  const assistant = getAssistantMessage(result);
  expect(assistant.content).toEqual([{ type: "text", text: "hi" }]);
});
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/provider-capabilities.test.ts
Line: 94-103

Comment:
**No direct `shouldDropThinkingBlocksForModel` assertion for new providers**

The existing test only calls `shouldDropThinkingBlocksForModel` for `"github-copilot"`. Now that `"anthropic"` and `"amazon-bedrock"` both set `dropThinkingBlockModelHints: ["claude"]`, direct behavioral assertions would make the intent explicit and guard against future regressions:

```typescript
expect(
  shouldDropThinkingBlocksForModel({
    provider: "anthropic",
    modelId: "claude-opus-4-6",
  }),
).toBe(true);
expect(
  shouldDropThinkingBlocksForModel({
    provider: "amazon-bedrock",
    modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  }),
).toBe(true);
```

The indirect coverage via the `resolveProviderCapabilities` snapshot checks `dropThinkingBlockModelHints` is present, but a direct behavioral test is clearer.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: d47d5ae

@frankekn frankekn merged commit 5ca0233 into main Mar 13, 2026
8 checks passed
@frankekn frankekn deleted the frank/44387-anthropic-thinking-replay-replacement branch March 13, 2026 08:57
@frankekn
Copy link
Contributor Author

Landed. Thanks @jmcte for the original implementation and fix direction.

  • Landed in: 5ca0233
  • Original author commit preserved in replacement flow: d997a3a

hougangdev pushed a commit to hougangdev/clawdbot that referenced this pull request Mar 14, 2026
* agents: drop Anthropic thinking blocks on replay

* fix: extend anthropic replay sanitization openclaw#44429 thanks @jmcte

* fix: extend anthropic replay sanitization openclaw#44843 thanks @jmcte

* test: add bedrock replay sanitization coverage openclaw#44843

* test: cover anthropic provider drop-thinking hints openclaw#44843

---------

Co-authored-by: johnmteneyckjr <[email protected]>
ecochran76 pushed a commit to ecochran76/openclaw that referenced this pull request Mar 14, 2026
* agents: drop Anthropic thinking blocks on replay

* fix: extend anthropic replay sanitization openclaw#44429 thanks @jmcte

* fix: extend anthropic replay sanitization openclaw#44843 thanks @jmcte

* test: add bedrock replay sanitization coverage openclaw#44843

* test: cover anthropic provider drop-thinking hints openclaw#44843

---------

Co-authored-by: johnmteneyckjr <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: persisted WhatsApp Anthropic sessions can fail on replay due to thinking blocks

2 participants