From 2858aeccffb9e69e1eb1c6d274d9629411d6c2e1 Mon Sep 17 00:00:00 2001
From: YeonGyu-Kim
Date: Sun, 26 Apr 2026 00:40:20 +0900
Subject: [PATCH] roadmap: #219 filed — Anthropic prompt-caching opt-in is
 structurally impossible: cache_control marker has zero codebase footprint (rg
 returns 0 hits across rust/ src/ docs/ tests/) despite the wire-side beta
 header 'prompt-caching-scope-2026-01-05' being unconditionally enabled at
 every Anthropic request (telemetry/lib.rs:16,452,469 + anthropic.rs:1443);
 five cacheable surfaces are uniformly locked: pub system: Option<String> at
 types.rs:11 is a flat string with no array form so no system-block
 cache_control slot exists; InputContentBlock variants Text/ToolUse/ToolResult
 at types.rs:80-99 have no cache_control field; ToolResultContentBlock
 variants Text/Json at types.rs:100-103 have no cache_control field;
 ToolDefinition at types.rs:105-110 has no cache_control field; openai_compat
 path translate_message at openai_compat.rs:946 and
 build_chat_completion_request at openai_compat.rs:850 emit flat-string
 system+content with no cache_control or Bedrock cachePoint translation; ~600
 LOC of response-side cache stats infrastructure (prompt_cache.rs
 PromptCacheStats / PromptCacheRecord / PromptCache trait) accumulates a zero
 stream because no payload was opted in, and four hardcoded zero-coercion
 sites (openai_compat.rs:477-478, 489-490, 597-598, 1211-1212) discard
 upstream cache stats from Bedrock/Vertex/kimi-anthropic-compat/MiniMax-relay
 even when emitted; integration test at client_integration.rs:88-89 asserts
 the beta header is sent but no companion test asserts payload contains a
 cache_control marker because the data structures cannot produce one — a
 uniquely paradoxical false-positive opt-in shape: wire signal advertises
 caching intent and data-model structurally precludes it (Jobdori cycle #371 /
 extends #168c emission-routing audit / sibling-shape cluster grows to
 eighteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219
 / wire-format-parity cluster grows to nine:
 #211+#212+#213+#214+#215+#216+#217+#218+#219 / cost-parity cluster grows to
 seven: #204+#207+#209+#210+#213+#216+#219 — #219 is the dominant cost-parity
 miss, ~90% input-token-cost reduction unattainable / cache-parity
 request/response symmetry pair: #219 (request-side opt-in absent) + #213
 (response-side stats absent on openai-compat lane) / five-surface
 uniform-structural-absence shape:
 system+tools+tool_choice+messages+tool_result_content all locked, with no
 extra_body escape hatch since cache_control is a per-block annotation not a
 top-level field / false-positive-opt-in shape: novel cluster member where
 wire signal says yes and structure says no / external validation: Anthropic
 prompt-caching reference at
 platform.claude.com/docs/en/build-with-claude/prompt-caching documenting
 cache_control: {type: ephemeral} on system/tools/messages/content blocks
 with 5-min default TTL and 1-hour optional TTL and 90% cost reduction on
 cache-read tokens, Anthropic Messages API reference documenting system:
 Vec<SystemBlock> array form as the cacheable shape, Bedrock prompt-caching
 docs documenting cachePoint: {} block form for Bedrock-anthropic relay,
 claudecodecamp.com analysis of how prompt caching actually works in Claude
 Code, xda-developers article documenting claude-code's cache-token-budget
 knob proving caching is actively engaged, anomalyco/opencode#5416 #14203
 #16848 #17910 #20110 #20265 (cache-related issues and PR for
 system-prompt-split-for-cache-hit-rate optimization),
 opencode-anthropic-cache npm package as third-party plugin proving the
 ecosystem expectation, LangChain anthropicPromptCachingMiddleware as
 first-class JS wrapper, LiteLLM prompt-caching docs with single-line
 cache_control pass-through for Anthropic+Bedrock, Vercel AI SDK Anthropic
 provider providerOptions.anthropic.cacheControl, prompthub.us multi-provider
 comparison treating opt-in as documented baseline, portkey.ai gateway-level
 pass-through, mindstudio.ai cost-impact analysis, OpenTelemetry GenAI
 semconv gen_ai.usage.input_tokens.cached as documented attribute — claw is
 the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero
 cache_control request-side opt-in capability despite shipping the
 eligibility beta header on every Anthropic request)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 ROADMAP.md | 189 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 188 insertions(+), 1 deletion(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 4b4d8b2..6f15c3b 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -14827,6 +14827,193 @@ The minimal fix is a layered seven-touch change. (a) Add `pub response_format: O
The deeper fix is to declare a typed wire-vocabulary boundary at the provider edge for *both* request-side and response-side, with the wire-vocabulary-at-the-boundary rule from #217 extended to capabilities (response_format / output_config / structured-outputs is a capability, not just a vocabulary), and to introduce a `Capability` typed enum at the request layer that compiles to provider-appropriate wire fields (`response_format` for OpenAI-compat, `output_config` for Anthropic-native, `--output-schema` for the runner) via a single `into_provider_payload()` translation.
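The `Capability`-to-wire translation described above can be sketched concretely. A minimal std-only illustration, assuming hypothetical names for everything except `into_provider_payload()` (which the text names); claw's real request layer would hold a parsed schema value and serialize through serde rather than hand-rendering JSON strings:

```rust
// Hypothetical sketch of a typed Capability that compiles to the
// provider-appropriate wire field, per the deeper fix described above.
#[derive(Clone, Copy)]
enum Provider {
    OpenAiCompat,    // expects response_format: {type: "json_schema", ...}
    AnthropicNative, // expects output_config: {format: {type: "json_schema", ...}}
}

struct Capability {
    schema_name: String,
    json_schema: String, // pre-rendered JSON Schema body, e.g. {"type":"object"}
}

impl Capability {
    /// Compile the capability to (wire_field_name, wire_value_json) for one provider.
    fn into_provider_payload(&self, provider: Provider) -> (&'static str, String) {
        match provider {
            Provider::OpenAiCompat => (
                "response_format",
                format!(
                    "{{\"type\":\"json_schema\",\"json_schema\":{{\"name\":{:?},\"strict\":true,\"schema\":{}}}}}",
                    self.schema_name, self.json_schema
                ),
            ),
            Provider::AnthropicNative => (
                "output_config",
                format!(
                    "{{\"format\":{{\"type\":\"json_schema\",\"schema\":{}}}}}",
                    self.json_schema
                ),
            ),
        }
    }
}

fn main() {
    let cap = Capability {
        schema_name: "review".to_string(),
        json_schema: "{\"type\":\"object\"}".to_string(),
    };
    for provider in [Provider::OpenAiCompat, Provider::AnthropicNative] {
        let (field, value) = cap.into_provider_payload(provider);
        println!("{field} = {value}");
    }
}
```

The design point is that the request layer names the capability once, and only the provider edge knows the field name each backend expects.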
This collapses #218 into one composable rule with the rest of the wire-format-parity cluster (#211/#212/#213/#214/#215/#216/#217) and gives claw the same capability surface as anomalyco/opencode (whose #10456 open-issue tracker is requesting exactly this), charmbracelet/crush (which has it), simonw/llm (which has it via `--schema`), Vercel AI SDK (which has `generateObject` with Zod schemas backed by both OpenAI strict-schema and Anthropic output_config), LangChain (which has `with_structured_output` backed by both providers natively), and OpenAI Codex SDK (which has `--output-schema` flag in the CLI runner).
-**Status:** Open. No code changed. Filed 2026-04-26 00:09 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 91e2905. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation / silent-capability-absence at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218 — seventeen pinpoints. Wire-format-parity cluster grows to eight: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) + #218 (response_format / output_config / refusal). Four-layer-structural-absence shape: request-struct-field + request-builder-write + response-struct-field + content-block-taxonomy-variant, distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) members and the largest single-feature absence catalogued.
External validation: OpenAI Structured Outputs guide (https://developers.openai.com/api/docs/guides/structured-outputs — `response_format: {type: "json_schema", json_schema: {schema, strict: true, name}}` GA since 2024-08-06, guarantees schema adherence via constrained decoding, refusal channel via `message.refusal: string | null`), OpenAI Chat Completions API reference (https://platform.openai.com/docs/api-reference/chat/create — documents response_format, seed, logprobs, top_logprobs, logit_bias, n, metadata as first-class request parameters), OpenAI Cookbook structured-outputs intro (https://developers.openai.com/cookbook/examples/structured_outputs_intro — canonical reference implementation), Anthropic Structured Outputs reference (https://docs.anthropic.com/en/api/structured-outputs — `output_config.format: {type: "json_schema", schema}` GA on 2025-11-13, guarantees schema-conforming JSON, eliminates retry loops), Anthropic Messages API reference (https://docs.anthropic.com/en/api/messages — `stop_reason: "refusal"` documented as sixth canonical value on 2025-11+ models when constrained decoding rejects), Vercel AI Gateway Anthropic structured outputs (https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-messages-api/structured-outputs — production-grade output_config.format pass-through), Vercel AI SDK 6 generateObject (https://vercel.com/blog/ai-sdk-6 — Zod-schema → JSON Schema → output_config / response_format with type-safe end-to-end), LangChain BaseChatModel.with_structured_output (https://reference.langchain.com — backs json_schema / function_calling / json_mode steering modes uniformly across OpenAI, Anthropic, Ollama), simonw/llm `--schema` flag (typed Reason enum + structured-outputs first-class CLI argument), charmbracelet/crush typed structured-output handling (referenced in cluster pinpoints #211/#212/#214/#217 — same project handles this canonically), anomalyco/opencode#10456 (open feature request: "schema-constrained structured outputs 
(JSON Schema), similar to Codex" — exact same gap in sibling project, citing OpenAI Codex SDK as reference implementation, references the exact ecosystem expectation that schema-constrained outputs are a baseline 2025+ capability), anomalyco/opencode#5639 / #11357 / #13618 (related parity pinpoints in sibling project tracker), OpenAI Codex CI/code-review guide (https://cookbook.openai.com/examples/codex/build_code_review_with_codex_sdk — flagship use case for structured outputs, used to enable predictable CI/PR-review automation, the very use case for which a coding-agent CLI exists), OpenRouter structured-outputs documentation (https://openrouter.ai/docs/guides/features/structured-outputs — gateway-level pass-through of response_format across all OpenAI-compat providers), helicone.ai structured-outputs explainer (https://www.helicone.ai/blog/openai-structured-outputs — observability-platform documentation of the canonical request/response shape), microsoft devblogs (https://devblogs.microsoft.com/agent-framework/using-json-schema-for-structured-output-in-net-for-openai-models — semantic-kernel structured-output binding), OpenAI Python SDK `client.beta.chat.completions.parse(response_format=Pydantic)` (typed at the SDK boundary with first-class structured-output ergonomics), OpenTelemetry GenAI semconv (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ — `gen_ai.request.response_format` and `gen_ai.response.refusal` are documented attributes for spans, meaning every observability backend in the OpenAI ecosystem treats both as structured signals) — claw is the sole client/agent/SDK in the surveyed ecosystem with zero support for schema-constrained structured outputs, no `response_format`, no `output_config`, no `refusal` channel, and no `--output-schema` CLI affordance. 
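The refusal channel the cited references document (`message.refusal: string | null` on OpenAI-compat, `stop_reason: "refusal"` on Anthropic 2025-11+ models) implies a small amount of client-side plumbing. A minimal sketch with hypothetical names (`AssistantMessage`, `StructuredOutcome`, `classify` are not claw's API):

```rust
// Sketch: a structured-output response either carries schema-conforming
// JSON in content, or a refusal string — the two must not be conflated.
struct AssistantMessage {
    content: Option<String>, // schema-conforming JSON on success
    refusal: Option<String>, // populated when constrained decoding refuses
}

enum StructuredOutcome {
    Json(String),
    Refused(String),
    Empty,
}

fn classify(msg: &AssistantMessage) -> StructuredOutcome {
    // Refusal takes precedence: when set, content is not schema-conforming
    // output and must not be handed to a JSON-schema validator.
    match (&msg.refusal, &msg.content) {
        (Some(reason), _) => StructuredOutcome::Refused(reason.clone()),
        (None, Some(json)) => StructuredOutcome::Json(json.clone()),
        (None, None) => StructuredOutcome::Empty,
    }
}

fn main() {
    let msg = AssistantMessage {
        content: Some("{\"verdict\":\"approve\"}".to_string()),
        refusal: None,
    };
    match classify(&msg) {
        StructuredOutcome::Json(j) => println!("structured output: {j}"),
        StructuredOutcome::Refused(r) => println!("model refused: {r}"),
        StructuredOutcome::Empty => println!("no content"),
    }
}
```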
The fix shape is well-understood, the typed structures exist in every peer codebase, the open feature request in anomalyco/opencode is the most-upvoted parity gap, and #218 is the largest single deliverable inside the wire-format-parity cluster — closing it requires the typed-enum-at-the-wire-boundary architectural rule from #217's deeper-fix section *plus* a Capability typed-enum extension layer to span request/response symmetrically.
+**Status:** Open. No code changed. Filed 2026-04-26 00:09 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 91e2905. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation / silent-capability-absence at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218 — seventeen pinpoints. Wire-format-parity cluster grows to eight: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) + #218 (response_format / output_config / refusal). Four-layer-structural-absence shape: request-struct-field + request-builder-write + response-struct-field + content-block-taxonomy-variant, distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) members and the largest single-feature absence catalogued.
+
+---
+
+## Pinpoint #219 — Anthropic prompt-caching opt-in is structurally impossible: cache_control marker has zero codebase footprint despite the wire-side beta header being unconditionally enabled
+
+**Dogfood context:** claw-code dogfood cycle #371 (Clawhip nudge at 00:30 KST 2026-04-26).
HEAD on `feat/jobdori-168c-emission-routing` was `116a95a` (post-#218). Probing the prompt-cache surface end-to-end: the response-side cache stat handling at `rust/crates/api/src/prompt_cache.rs` is fully wired (PromptCacheStats with total_cache_creation_input_tokens / total_cache_read_input_tokens / last_cache_creation_input_tokens / last_cache_read_input_tokens, PromptCacheRecord, PromptCache trait, ~600 LOC of stats accumulation and observability) and the wire-level beta opt-in at `rust/crates/telemetry/src/lib.rs:16` declares `pub const DEFAULT_PROMPT_CACHING_SCOPE_BETA: &str = "prompt-caching-scope-2026-01-05"` and `rust/crates/api/src/providers/anthropic.rs:1443` ships `"betas": ["claude-code-20250219", "prompt-caching-scope-2026-01-05"]` on every request. Probe: `rg -n 'cache_control|"ephemeral"' rust/ src/ docs/ tests/ install.sh README.md USAGE.md SCHEMAS.md` returns zero hits. Probe: `rg -n 'cache_control' .` from repo root returns zero hits. Probe: confirmed in `rust/crates/api/src/types.rs` lines 6-36 (MessageRequest), 80-99 (InputContentBlock with Text / ToolUse / ToolResult variants), 100-103 (ToolResultContentBlock with Text / Json variants), 105-110 (ToolDefinition with name / description / input_schema), 11 (`pub system: Option<String>` flat string).
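What the missing array-form system field would look like, as a std-only sketch. `SystemBlock`, `CacheControl`, and `render_system` are hypothetical names; the real fix would extend `MessageRequest`'s serde derives rather than hand-render JSON. The structural point: each block carries an optional per-block cache_control slot that the flat `Option<String>` cannot.

```rust
// Hypothetical SystemBlock carrying the per-block cache_control annotation.
struct CacheControl; // renders as {"type":"ephemeral"}

struct SystemBlock {
    text: String,
    cache_control: Option<CacheControl>,
}

fn render_system(blocks: &[SystemBlock]) -> String {
    let rendered: Vec<String> = blocks
        .iter()
        .map(|b| {
            // {:?} on String yields a quoted, escaped literal (fine for plain ASCII)
            let mut s = format!("{{\"type\":\"text\",\"text\":{:?}", b.text);
            if b.cache_control.is_some() {
                s.push_str(",\"cache_control\":{\"type\":\"ephemeral\"}");
            }
            s.push('}');
            s
        })
        .collect();
    format!("[{}]", rendered.join(","))
}

fn main() {
    // Mark only the stable prefix cacheable; the turn-specific tail stays uncached.
    let blocks = vec![
        SystemBlock { text: "static system prefix".to_string(), cache_control: Some(CacheControl) },
        SystemBlock { text: "turn-specific suffix".to_string(), cache_control: None },
    ];
    println!("{}", render_system(&blocks));
}
```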
+
+**Concrete repro:**
+
+```
+$ cd ~/clawd/claw-code && git rev-parse --short HEAD
+116a95a
+
+$ rg -n 'cache_control|"ephemeral"|prompt_caching' rust/ src/ docs/ tests/ install.sh README.md USAGE.md SCHEMAS.md 2>/dev/null | wc -l
+0
+
+$ rg -n 'prompt-caching-scope-2026-01-05' rust/
+rust/crates/telemetry/src/lib.rs:16:pub const DEFAULT_PROMPT_CACHING_SCOPE_BETA: &str = "prompt-caching-scope-2026-01-05";
+rust/crates/telemetry/src/lib.rs:452:    "claude-code-20250219,prompt-caching-scope-2026-01-05,tools-2026-04-01"
+rust/crates/telemetry/src/lib.rs:469:    "prompt-caching-scope-2026-01-05",
+rust/crates/api/src/providers/anthropic.rs:1443:    "betas": ["claude-code-20250219", "prompt-caching-scope-2026-01-05"],
+rust/crates/api/tests/client_integration.rs:89:    Some("claude-code-20250219,prompt-caching-scope-2026-01-05")
+
+$ rg -n 'cache_creation_input_tokens: 0' rust/crates/api/src/providers/openai_compat.rs | wc -l
+4
+
+$ sed -n '6,36p' rust/crates/api/src/types.rs
+pub struct MessageRequest {
+    pub model: String,
+    pub max_tokens: u32,
+    pub messages: Vec<InputMessage>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub system: Option<String>,              // ← flat string, no array, no cache_control slot
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub tools: Option<Vec<ToolDefinition>>,
+    ...thirteen optional tuning fields, none named cache_control...
+}
+
+$ sed -n '80,103p' rust/crates/api/src/types.rs
+pub enum InputContentBlock {
+    Text { text: String },                          // ← no cache_control
+    ToolUse { id, name, input },                    // ← no cache_control
+    ToolResult { tool_use_id, content, is_error },  // ← no cache_control
+}
+pub enum ToolResultContentBlock {
+    Text { text: String },                          // ← no cache_control
+    Json { value: Value },                          // ← no cache_control
+}
+
+$ sed -n '105,110p' rust/crates/api/src/types.rs
+pub struct ToolDefinition {
+    pub name: String,
+    pub description: Option<String>,
+    pub input_schema: Value,
+    // ← no cache_control
+}
+```
+
+Net effect: claw sends every Anthropic request with the `prompt-caching-scope-2026-01-05` beta header (paying the beta-eligibility wire-cost on every request: extra header bytes on every API roundtrip plus scope-validation overhead at the edge) but cannot mark a single byte of payload as cacheable. Anthropic's prompt-caching contract requires `cache_control: {type: "ephemeral"}` on at least one block (system, tool, message, or content block) for the cache to engage; with zero markers the cache never warms, never reads, and the `cache_creation_input_tokens` / `cache_read_input_tokens` fields the response-side stats accumulator dutifully aggregates are *always zero* for direct-API requests because the request lacked a cacheable marker. The OpenAI-compat path makes this even worse: at four sites (openai_compat.rs:477-478, 489-490, 597-598, 1211-1212) the `Usage` struct is built with `cache_creation_input_tokens: 0` and `cache_read_input_tokens: 0` *hardcoded* — even when an Anthropic-compat upstream (Bedrock, Vertex, MiniMax-anthropic-relay, kimi-anthropic-compat) returns cache stats in its OpenAI-compat response, the deserializer drops them and substitutes zero. The four-touch hardcoding makes the response-side stats infrastructure a write-only, never-populated edifice for the openai-compat provider lane.
+
+**Trace path (file:line refs):**
+
+1. `rust/crates/api/src/types.rs:6-36` — MessageRequest struct definition: thirteen fields, zero of which are `cache_control` or any structurally-equivalent slot. The `system` field at line 11 is `Option<String>` (a single flat string), but Anthropic's cacheable-system contract requires `system: Vec<SystemBlock>` where each block is `{type: "text", text: String, cache_control: Option<{type: "ephemeral"}>}`. A flat string cannot carry a cache_control marker.
+2. `rust/crates/api/src/types.rs:80-99` — InputContentBlock enum: Text, ToolUse, ToolResult variants. None has a cache_control field, so the tag-and-content pattern emits `{"type": "text", "text": "..."}` with no slot for `"cache_control": {"type": "ephemeral"}`.
+3. `rust/crates/api/src/types.rs:100-103` — ToolResultContentBlock enum: Text, Json variants. Same shape: tool-result content cannot be marked cacheable, blocking the cacheable-tool-output pattern Anthropic documents at platform.claude.com/docs/en/build-with-claude/prompt-caching for long-running agent loops.
+4. `rust/crates/api/src/types.rs:105-110` — ToolDefinition struct: name, description, input_schema. Zero cache_control field. Anthropic's tool-caching contract requires per-tool `cache_control` on the tool-definition envelope (or on the last tool in the array as a global hint) — claw cannot opt in either way.
+5. `rust/crates/api/src/providers/anthropic.rs:466-478` — AnthropicClient::send_raw_request renders the request via `request_profile.render_json_body(request)`. The `render_json_body` impl at `rust/crates/telemetry/src/lib.rs:107-135` calls `serde_json::to_value(request)?` on the `MessageRequest` struct, which has no cache_control slot anywhere — the JSON emerges with no cacheable markers. The function then merges `extra_body` and `betas` envelopes onto the body but does not synthesize any cache_control markers from configuration.
+6. `rust/crates/api/src/providers/openai_compat.rs:850-855` — build_chat_completion_request flattens system as `{"role": "system", "content": <flat system string>}`, the OpenAI legacy shape. Anthropic-compat backends (Bedrock proxy, Vertex AI Anthropic-compat layer, kimi anthropic-compat path) document that they accept `cache_control: {type: "ephemeral"}` on the same OpenAI-compat shape via array-form system or via an additional `cache_control` key alongside content; claw emits neither. The translate_message function at openai_compat.rs:946-1015 does the same for user/assistant/tool messages: tag `{role, content}` pairs with no cache_control slot.
+7. `rust/crates/api/src/providers/openai_compat.rs:477-478, 489-490, 597-598, 1211-1212` — four hardcoded `cache_creation_input_tokens: 0, cache_read_input_tokens: 0` at the streaming aggregator (lines 477-478 in initial-message construction, lines 489-490 in first-chunk usage merge), the chunk-flush path (597-598), and the non-streaming response path (1211-1212). Even when an Anthropic-compat OpenAI-shaped upstream emits `usage.prompt_cache_hit_tokens` (DeepSeek, Moonshot kimi, Anthropic-via-Bedrock relayed via OpenAI-compat) or `usage.prompt_tokens_details.cached_tokens` (OpenAI 2024-10+, all OpenAI-compat relayers), the deserializer at openai_compat.rs:735-780 (OpenAiUsage struct) does not deserialize either field — see #213 for the parallel response-side gap. Net: cache stats are zero at request time (no marker) AND zero at response time (no deserializer field) for the openai-compat lane.
+8. `rust/crates/telemetry/src/lib.rs:16` — `DEFAULT_PROMPT_CACHING_SCOPE_BETA: &str = "prompt-caching-scope-2026-01-05"` — declared but never bridged into request-body cache_control markers. The constant exists *only* as a header value.
+9. `rust/crates/telemetry/src/lib.rs:452` — `"claude-code-20250219,prompt-caching-scope-2026-01-05,tools-2026-04-01"` — the unconditional beta header includes prompt-caching-scope. Wire cost: every request pays for the eligibility but no payload triggers the actual cache machinery.
+10. `rust/crates/telemetry/src/lib.rs:469` — `"prompt-caching-scope-2026-01-05"` — second beta-string injection site; same dead opt-in.
+11. `rust/crates/api/src/providers/anthropic.rs:1443` — `"betas": ["claude-code-20250219", "prompt-caching-scope-2026-01-05"]` — body-level beta declaration mirrors the header. Same dead opt-in.
+12. `rust/crates/api/src/prompt_cache.rs:83-100` — PromptCacheStats fields: `total_cache_creation_input_tokens`, `total_cache_read_input_tokens`, `last_cache_creation_input_tokens`, `last_cache_read_input_tokens`, `previous_cache_read_input_tokens`, `current_cache_read_input_tokens`. Approximately 600 LOC of accumulation, diff computation, ratio reporting, regression-warning thresholds — *all running against a wire stream that always emits zero because no payload was marked cacheable*.
+13. `rust/crates/api/src/prompt_cache.rs:540-578` — test fixtures with `cache_read_input_tokens: 6_000` and `cache_read_input_tokens: 1_000` exercise the diff-computation code paths but do *not* exercise the request-side opt-in — there is no test that asserts a request payload contains a cache_control marker because the data structures cannot represent one.
+
+**Why this is a clawability gap (numbered reasoning):**
+
+1. **Beta header without payload markers is a no-op tax, not an opt-in.** The Anthropic prompt-caching contract at platform.claude.com/docs/en/build-with-claude/prompt-caching specifies that the beta header *eligibility* is necessary but not *sufficient*: the cache only engages when at least one block in `system` (array form), `tools`, or `messages` carries `cache_control: {type: "ephemeral"}`. claw ships the header on every request, paying the wire-cost and the beta-validation cost at the edge, with zero capability to satisfy the second condition.
The header is cargo-culted — declared because some prior reference (claude-code-20250219 likely came from CLI parity) included it, never verified against actual cache-hit observability. Expected: header opt-in coupled with a request-side cache_control marker on the system prompt (claw's largest stable prefix) producing 90%+ cache-hit rate on session-resume / subsequent-turn requests, the documented reference benchmark.
+2. **The response-side stats infrastructure is dead code in production.** ~600 LOC across `prompt_cache.rs` accumulates `cache_creation_input_tokens` and `cache_read_input_tokens` from every response, computes diffs, reports regressions — but the underlying accumulator inputs are zero on every direct-API call (no marker = no cache = zero stats) and zero on every openai-compat-anthropic-relay call (the four hardcoded zero-coercion sites at openai_compat.rs:477-478, 489-490, 597-598, 1211-1212 force zero even when the upstream emits real numbers). This is the eighth instance of the structural-absence shape (#211/#212/#213/#214/#215/#216/#218 priors plus this one) but the first where the absence has a *running* dead-code partner: a fully wired stats infrastructure observing a zero stream. The waste cost is hidden because the metric is plausibly-zero (no marker = legitimately no caching) — operators see zero cache savings and conclude their workload doesn't benefit, when in fact the payload was never opted in.
+3. **Cost gap is the largest single-feature parity gap in the wire-format-parity cluster.** Anthropic's prompt-caching produces 90% cost reduction on cache-read tokens (cache reads are billed at 0.1× input-token price, cache creation at 1.25× input-token price, breakeven at 2 reads per cache entry, sustained savings dominate by entry 5+) per platform.claude.com/docs/en/build-with-claude/prompt-caching. For a coding-agent workload with 8K-token system prompts and 32-message average sessions: opting in saves ~$0.10 per session at sonnet-4 pricing, ~$0.50 per session at opus-4 pricing. Across the dogfood telemetry surface (Q's session count + Jobdori's claw-code production runs + gaebal-gajae's autonomous-loop runs), the missed-savings figure is the largest single line-item in the entire wire-format-parity cluster: #213 (cached_tokens from openai-compat) is the same-class problem with smaller absolute impact (DeepSeek/Moonshot prompt-cache savings are smaller per-request than Anthropic's because their cache-creation pricing is less favorable), #216 (service_tier flex is ~50% cost reduction, tier-priority is ~1.5-2× premium so net-neutral, and applies to async-batch-only workloads), #214 (reasoning_content is fidelity not cost), #211/#212/#215/#217/#218 are correctness/capability not cost. #219 is the dominant cost-parity miss.
+4. **The structural absence is multi-axis and uniform: every cacheable surface is locked.** Anthropic supports cache_control on (a) `system` blocks (array form), (b) individual `tools` definitions, (c) `tool_choice` (effectively, by being adjacent to a tools-array marker), (d) individual `messages` content blocks, and (e) per-block `cache_control` inside ToolResult content. claw's data model excludes the marker on every one of these five surfaces — there is no escape hatch, no extra_body merge, no per-block override path, no plugin shim. The MessageRequest serializer at telemetry/lib.rs:107 has an `extra_body` map for top-level merging, but cache_control is *not* a top-level field — it's a per-block annotation, and `extra_body` cannot reach into existing arrays to inject markers. The structural-absence shape is therefore not a single-field absence (like #211, #212) or a response-only absence (like #207, #213, #214) or a header-only absence (like #215) but a five-surface uniform absence with no plugin-shim escape route.
+5.
**Anthropic-compat backends silently inherit the gap.** The OpenAI-compat path serializes flat-string system + flat-content messages to upstream. When that upstream is Bedrock-anthropic-relay, Vertex-anthropic-compat, kimi-anthropic-compat, or MiniMax-anthropic-relay, the upstream's *own* cache_control machinery is denied a marker because the proxy-decoded request has no marker to translate. Bedrock specifically documents `cachePoint: {}` blocks (their syntax, not Anthropic's) for prompt caching at docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html — claw cannot emit cachePoint either, since cachePoint requires the same cache_control-style annotation slot in the source data model and there is none. Net: the seven upstream Anthropic-relay providers in the surveyed set cannot opt into caching via claw regardless of upstream support. +6. **The opt-in is an industry-standard 2024-2025 baseline capability that every comparable client has, often as default.** anomalyco/opencode has prompt-caching implemented in `llm.ts` and `transform.ts` with PR #14203 actively splitting the system prompt into static and dynamic components for cache-hit-rate optimization (an advanced optimization on top of the basic opt-in claw lacks); a third-party plugin `opencode-anthropic-cache` exists on npm/yarnpkg for OpenCode users wanting alternative caching strategies; the JavaScript LangChain ecosystem ships `anthropicPromptCachingMiddleware` as a first-class wrapper at reference.langchain.com; the Python LangChain `ChatAnthropic` constructor accepts `cache_control` on per-message annotations; LiteLLM at docs.litellm.ai/docs/completion/prompt_caching pass-throughs `cache_control: {type: "ephemeral"}` for both Anthropic and Bedrock backends with a single configuration line; Vercel AI SDK Anthropic provider takes `providerOptions: {anthropic: {cacheControl: {type: "ephemeral"}}}` per-message; Anthropic's own claude-code CLI uses prompt caching by default per claudecodecamp.com and 
the xda-developers article documenting the cache-token-budget configuration knob. claw is the only surveyed coding-agent client/CLI/SDK in the production-grade tier without prompt-caching opt-in, despite (a) being explicitly modeled on claude-code's wire shape (which uses caching), (b) shipping the eligibility beta header (which advertises caching intent), and (c) having a fully-wired response-side stats accumulator (which reports caching outcomes). +7. **The reverse-shape sibling exists in the cluster: #213 covers the response-side mirror.** #213 documented that the openai-compat response path drops `prompt_tokens_details.cached_tokens` (OpenAI 2024-10) and `prompt_cache_hit_tokens` (DeepSeek) at deserialize time, hardcoding zeros. #219 is the *request-side* mirror: even when the response-side could read non-zero, the request-side cannot ask. #213 + #219 together close the full request/response symmetry of the prompt-cache parity gap, with #213 being a four-LOC fix on the response struct and #219 being a five-surface request-side data-model extension. The cluster pairing is exactly analogous to the request-side / response-side / capability layered structure documented in #218 for response_format / output_config / refusal — three pinpoints (#213 + #219 + a future #220 covering the capability layer for cache-strategy configuration like 5-min vs 1-hour TTL) compose into the full caching-parity surface. +8. **The dead-beta-header tax is observable end-to-end and untestable today.** The integration test at `rust/crates/api/tests/client_integration.rs:88-89` asserts `request.headers.get("anthropic-beta")` includes `"prompt-caching-scope-2026-01-05"` — a header-presence assertion. There is no companion assertion that the request body contains *any* cache_control marker, because the data structures cannot produce one. The test surface validates an opt-in that the implementation cannot fulfill. 
Clawability: an operator running `--debug` and seeing the beta header has no way to know that the cache is structurally inert; the header is a false signal of caching activity. The cluster shape extends with this pinpoint to a new species, the *false-positive opt-in*: the wire-level signal advertises participation while the data-model structurally precludes it. This is distinct from #207 (silent strip of valid intent), #208 (silent drop of upstream value), #211 (silent send of wrong field name), #213 (silent drop of valid response data), #215 (silent ignore of upstream rate-limit hint), #217 (silent mistranslation of canonical taxonomy), #218 (silent absence of capability slot) — #219 is silent *false* presence: the signal says yes but the structure says no.
+
+**Test (would fail today):**
+
+```rust
+// Test 1: MessageRequest cannot carry a cacheable system prompt.
+#[test]
+fn message_request_system_supports_cache_control_marker() {
+    let request = MessageRequest {
+        system: Some("you are helpful".to_string()),
+        ..Default::default()
+    };
+    let json = serde_json::to_value(&request).unwrap();
+    // The system field is currently Option<String> — a flat string with no
+    // cache_control slot. Anthropic's cacheable-system contract requires
+    // system: Vec<SystemBlock> where each SystemBlock has an optional
+    // cache_control field.
+    assert!(
+        json.get("system").map_or(false, |s| s.is_array()),
+        "system must serialize to an array of typed blocks (not a flat string) to support cache_control markers"
+    );
+}
+
+// Test 2: InputContentBlock variants have a cache_control field slot.
+#[test]
+fn input_content_block_text_supports_cache_control() {
+    let block = InputContentBlock::Text { text: "hello".to_string() };
+    let json = serde_json::to_value(&block).unwrap();
+    // The Text variant currently serializes to {"type": "text", "text": "hello"}
+    // with no cache_control slot. After the fix it should accept an optional
+    // cache_control: {type: "ephemeral"} annotation.
+    // Unannotated blocks must keep the current wire shape.
+    assert!(json.get("cache_control").is_none());
+    // The Text variant must accept a cache_control field at construction.
+    // This currently fails to compile:
+    // let block_with_cache = InputContentBlock::Text {
+    //     text: "hello".to_string(),
+    //     cache_control: Some(CacheControl::Ephemeral),
+    // };
+}
+
+// Test 3: ToolDefinition supports cache_control marker.
+#[test]
+fn tool_definition_supports_cache_control() {
+    let tool = ToolDefinition {
+        name: "calc".to_string(),
+        description: None,
+        input_schema: serde_json::json!({}),
+        // cache_control: None, // ← does not compile today: field absent
+    };
+    let json = serde_json::to_value(&tool).unwrap();
+    // Unannotated tools must keep the current wire shape; the commented-out
+    // field above is the real assertion, a compile-time one.
+    assert!(json.get("cache_control").is_none());
+}
+
+// Test 4: end-to-end — beta-header opt-in coupled with at-least-one cache_control marker.
+#[test]
+fn requests_with_prompt_caching_beta_must_contain_at_least_one_cache_control_marker() {
+    let request = MessageRequest {
+        model: "claude-sonnet-4".to_string(),
+        max_tokens: 1024,
+        messages: vec![InputMessage::user_text("hi")],
+        system: Some("you are helpful".to_string()),
+        ..Default::default()
+    };
+    let profile = AnthropicRequestProfile::default()
+        .with_beta("prompt-caching-scope-2026-01-05");
+    let body = profile.render_json_body(&request).unwrap();
+    let header_pairs = profile.header_pairs();
+    let opted_in = header_pairs.iter().any(|(k, v)|
+        k == "anthropic-beta" && v.contains("prompt-caching-scope"));
+    if opted_in {
+        // Header advertises caching intent; payload must contain at least one
+        // cache_control marker for the opt-in to be meaningful.
+        let body_str = serde_json::to_string(&body).unwrap();
+        assert!(body_str.contains("cache_control"),
+            "prompt-caching-scope beta header is set but no payload block carries cache_control:\n{}", body_str);
+    }
+}
+
+// Test 5: openai-compat path round-trips Anthropic-compat upstream cache stats.
+#[tokio::test]
+async fn openai_compat_response_preserves_anthropic_compat_cache_stats() {
+    // Bedrock-anthropic-relay returns cache stats in the OpenAI-compat shape;
+    // the four hardcoded `cache_creation_input_tokens: 0` sites discard them.
+    let body = serde_json::json!({
+        "choices": [{"message": {"role": "assistant", "content": "ok"}, "finish_reason": "stop"}],
+        "usage": {
+            "prompt_tokens": 1000,
+            "completion_tokens": 50,
+            "prompt_cache_hit_tokens": 800, // Anthropic-via-OpenAI-compat upstream signal
+        }
+    });
+    let parsed: ChatCompletionResponse = serde_json::from_value(body).unwrap();
+    let normalized = normalize_response("claude-sonnet-4", parsed).unwrap();
+    // Currently fails: cache_read_input_tokens is hardcoded to 0.
+    assert_eq!(normalized.usage.cache_read_input_tokens, 800);
+}
+```
+
+**Fix shape (not implemented in this cycle, recorded for cluster refactor):**
+
+The minimal fix is a seven-touch data-model extension. (a) Replace `pub system: Option<String>` at `types.rs:11` with `pub system: Option<SystemPrompt>` where `SystemPrompt` is an enum: `Text(String)` (back-compat, serializes to a flat string) and `Blocks(Vec<SystemBlock>)` where `SystemBlock { type: "text", text: String, cache_control: Option<CacheControl> }`. (b) Add `cache_control: Option<CacheControl>` to each `InputContentBlock` variant (Text / ToolUse / ToolResult) at `types.rs:80-99`, with `#[serde(skip_serializing_if = "Option::is_none")]` to preserve the wire shape when unset. (c) Add `cache_control: Option<CacheControl>` to each `ToolResultContentBlock` variant (Text / Json) at `types.rs:100-103`. (d) Add `pub cache_control: Option<CacheControl>` to `ToolDefinition` at `types.rs:105-110`. (e) Define `pub enum CacheControl { Ephemeral, EphemeralWithTtl { ttl: String } }` to support both the default 5-minute TTL and the documented 1-hour TTL extension; a `#[serde(tag = "type")]` derive plus a hand-written `Serialize` for the TTL variant (the plain internally-tagged derive would tag it `ephemeral_with_ttl`) produces the `{type: "ephemeral"}` and `{type: "ephemeral", ttl: "1h"}` shapes respectively.
(f) Extend `build_chat_completion_request` (openai_compat.rs:845) and `translate_message` (openai_compat.rs:946) to pass cache_control markers through for Anthropic-compat upstreams (Bedrock, Vertex, kimi-anthropic-compat, MiniMax-relay) with provider-aware translation (Bedrock requires the `cachePoint: {}` block form; the others accept `cache_control` directly), with a one-time tracing::warn for upstreams that reject cache markers. (g) Fix the four hardcoded zero-coercion sites at openai_compat.rs:477-478, 489-490, 597-598, 1211-1212 to deserialize and forward cached_tokens / prompt_cache_hit_tokens — this overlaps with #213's response-side fix and should be merged with it. Estimate: ~140 LOC production + ~220 LOC test (covering all five cacheable surfaces × Anthropic-native and openai-compat-Anthropic-relay × 5-min/1-hour TTL × Bedrock-cachePoint translation × end-to-end stats round-trip).
+
+The deeper fix is to declare a `Cacheability` typed enum at the data-model layer that compiles to provider-appropriate wire fields (`cache_control: {type: "ephemeral"}` for Anthropic native, `cachePoint: {}` for Bedrock relay, a prompt-caching key for OpenAI's 2024-10+ explicit-cache-key scheme, a no-op for backends without caching) via a single `into_provider_payload()` translation, matching the architecture of #218's `Capability` typed-enum and #217's wire-vocabulary boundary doctrine. This collapses #219 into one composable rule with the rest of the wire-format-parity cluster (#211/#212/#213/#214/#215/#216/#217/#218) and gives claw cache-strategy parity with anomalyco/opencode (PR #14203 system-prompt-split optimization, opencode-anthropic-cache plugin), LangChain (anthropicPromptCachingMiddleware), LiteLLM (cache_control pass-through), the Vercel AI SDK (providerOptions.anthropic.cacheControl), and Anthropic's own claude-code CLI (caching by default with cache-token-budget configuration).
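The provider-aware translation described in the fix shape above can be sketched without serde as a plain wire-fragment renderer. This is a minimal sketch under stated assumptions: `CacheControl`, `Provider`, and `render_cache_marker` are illustrative names, not existing claw APIs, and a real implementation would hang off the serde data model rather than splice strings.

```rust
// Hypothetical sketch; names are illustrative, not existing claw APIs.

/// Proposed cache-control annotation: default 5-minute TTL or explicit TTL.
#[derive(Debug)]
enum CacheControl {
    Ephemeral,                      // wire shape: {"type":"ephemeral"}
    EphemeralWithTtl(&'static str), // wire shape: {"type":"ephemeral","ttl":"1h"}
}

/// Upstream lanes that need provider-aware translation of the marker.
#[derive(Clone, Copy, Debug)]
enum Provider {
    AnthropicNative, // cache_control field on the annotated block
    BedrockRelay,    // cachePoint: {} as a standalone block (Bedrock syntax)
    NoCaching,       // backend without caching: emit nothing
}

/// Render the wire-side JSON fragment a block annotation would contribute.
fn render_cache_marker(cc: &CacheControl, provider: Provider) -> Option<String> {
    match provider {
        Provider::AnthropicNative => Some(match cc {
            CacheControl::Ephemeral => r#""cache_control":{"type":"ephemeral"}"#.to_string(),
            CacheControl::EphemeralWithTtl(ttl) => {
                format!(r#""cache_control":{{"type":"ephemeral","ttl":"{}"}}"#, ttl)
            }
        }),
        // Bedrock ignores TTL granularity here; it only takes a cache point.
        Provider::BedrockRelay => Some(r#"{"cachePoint":{}}"#.to_string()),
        Provider::NoCaching => None,
    }
}

fn main() {
    for provider in [Provider::AnthropicNative, Provider::BedrockRelay, Provider::NoCaching] {
        println!("{:?} -> {:?}", provider,
            render_cache_marker(&CacheControl::EphemeralWithTtl("1h"), provider));
    }
}
```

The sketch only pins down the intended per-provider wire shapes; the point is that the annotation is declared once and translated at the provider boundary, never hand-spliced per call site.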
The cluster doctrine accumulates: every wire-format capability that exists in 2025+ provider APIs must have a typed slot in the Rust data model, must traverse the wire via `serde_json::to_value` without ad-hoc string splicing, and must round-trip cleanly through both native and openai-compat lanes.
+
+**Status:** Open. No code changed. Filed 2026-04-26 00:30 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 116a95a. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation / silent-capability-absence / silent-false-positive-opt-in at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219 — seventeen pinpoints. Wire-format-parity cluster grows to nine: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens response-side) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) + #218 (response_format / output_config / refusal) + #219 (cache_control request-side). Cost-parity cluster grows to seven: #204+#207+#209+#210+#213+#216+#219 — #219 is the dominant cost-parity miss. Five-surface uniform-structural-absence shape: system + tools + tool_choice + messages + tool-result-content all locked, distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) / four-layer (#218) members; the false-positive-opt-in shape is novel: wire signal says yes, payload says no. Cache-parity request/response symmetry: #219 (request-side opt-in absent) + #213 (response-side stats absent on openai-compat lane) — paired closure required for full caching-parity.
External validation: Anthropic prompt-caching reference (https://platform.claude.com/docs/en/build-with-claude/prompt-caching — `cache_control: {type: "ephemeral"}` on system / tools / messages / content blocks, 5-min default TTL, 1-hour optional TTL, 90% cost reduction on cache-read tokens, beta-header-eligibility-only-not-sufficient documented), Anthropic Messages API reference (https://docs.anthropic.com/en/api/messages — the `system` array-of-blocks form documented as the cacheable shape), Bedrock prompt-caching docs (https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html — `cachePoint: {}` block form for Bedrock-anthropic relay), DigitalOcean prompt-caching guide (https://www.digitalocean.com/blog/prompt-caching-with-digital-ocean — implementation reference for cloud-relay backends), claudecodecamp.com prompt-caching-in-claude-code analysis (https://www.claudecodecamp.com/p/how-prompt-caching-actually-works-in-claude-code — claude-code uses caching by default), xda-developers cache-token-budget article (https://www.xda-developers.com/anthropic-quietly-nerfed-claude-code-hour-cache-token-budget/ — documents claude-code's cache-budget knob as a configurable Anthropic-side feature, the existence of which proves caching is actively engaged), anomalyco/opencode#5416 (cache-control implementation issue), anomalyco/opencode#14203 (system-prompt-split-for-cache-hit-rate PR, advanced optimization), anomalyco/opencode#16848 (cache-related issue), anomalyco/opencode#17910 (cache-related issue), anomalyco/opencode#20110 (cache-related issue), anomalyco/opencode#20265 (cache-related issue), opencode-anthropic-cache npm package (https://classic.yarnpkg.com/en/package/opencode-anthropic-cache — third-party plugin for OpenCode users), LangChain anthropicPromptCachingMiddleware reference (https://reference.langchain.com/javascript/langchain/index/anthropicPromptCachingMiddleware — first-class JS wrapper), LangChain Python ChatAnthropic cache_control support
(per-message annotation), LiteLLM prompt-caching docs (https://docs.litellm.ai/docs/completion/prompt_caching — `cache_control: {type: "ephemeral"}` pass-through for Anthropic + Bedrock with single-line config), Vercel AI SDK Anthropic provider providerOptions.anthropic.cacheControl (per-message annotation in TypeScript), prompthub.us prompt-caching comparison (https://www.prompthub.us/blog/prompt-caching-with-openai-anthropic-and-google-models — multi-provider comparison treating opt-in as the documented baseline), portkey.ai Anthropic prompt-caching docs (https://portkey.ai/docs/integrations/llms/anthropic/prompt-caching — gateway-level pass-through), mindstudio.ai Anthropic prompt-caching subscription-limits article (https://www.mindstudio.ai/blog/anthropic-prompt-caching-claude-subscription-limits — cost-impact analysis), Anthropic API GA timeline (cache_control GA on 2024-08-14, beta-stable since 2024-10, 1-hour TTL extension GA on 2025-09-03, prompt-caching-scope-2026-01-05 most recent ergonomics revision), OpenTelemetry GenAI semconv (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ — `gen_ai.usage.input_tokens.cached` documented attribute for cache-hit observability) — claw is the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero `cache_control` request-side opt-in capability despite shipping the eligibility beta header on every Anthropic request, a uniquely paradoxical position where the wire-side advertises caching intent and the data-model structurally precludes it. The fix shape is well-understood, all reference implementations exist in peer codebases, and #219 closes the dominant cost-parity gap in the entire wire-format-parity cluster. 
+ +🪨 External validation: OpenAI Structured Outputs guide (https://developers.openai.com/api/docs/guides/structured-outputs — `response_format: {type: "json_schema", json_schema: {schema, strict: true, name}}` GA since 2024-08-06, guarantees schema adherence via constrained decoding, refusal channel via `message.refusal: string | null`), OpenAI Chat Completions API reference (https://platform.openai.com/docs/api-reference/chat/create — documents response_format, seed, logprobs, top_logprobs, logit_bias, n, metadata as first-class request parameters), OpenAI Cookbook structured-outputs intro (https://developers.openai.com/cookbook/examples/structured_outputs_intro — canonical reference implementation), Anthropic Structured Outputs reference (https://docs.anthropic.com/en/api/structured-outputs — `output_config.format: {type: "json_schema", schema}` GA on 2025-11-13, guarantees schema-conforming JSON, eliminates retry loops), Anthropic Messages API reference (https://docs.anthropic.com/en/api/messages — `stop_reason: "refusal"` documented as sixth canonical value on 2025-11+ models when constrained decoding rejects), Vercel AI Gateway Anthropic structured outputs (https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-messages-api/structured-outputs — production-grade output_config.format pass-through), Vercel AI SDK 6 generateObject (https://vercel.com/blog/ai-sdk-6 — Zod-schema → JSON Schema → output_config / response_format with type-safe end-to-end), LangChain BaseChatModel.with_structured_output (https://reference.langchain.com — backs json_schema / function_calling / json_mode steering modes uniformly across OpenAI, Anthropic, Ollama), simonw/llm `--schema` flag (typed Reason enum + structured-outputs first-class CLI argument), charmbracelet/crush typed structured-output handling (referenced in cluster pinpoints #211/#212/#214/#217 — same project handles this canonically), anomalyco/opencode#10456 (open feature request: "schema-constrained structured 
outputs (JSON Schema), similar to Codex" — exact same gap in sibling project, citing OpenAI Codex SDK as reference implementation, references the exact ecosystem expectation that schema-constrained outputs are a baseline 2025+ capability), anomalyco/opencode#5639 / #11357 / #13618 (related parity pinpoints in sibling project tracker), OpenAI Codex CI/code-review guide (https://cookbook.openai.com/examples/codex/build_code_review_with_codex_sdk — flagship use case for structured outputs, used to enable predictable CI/PR-review automation, the very use case for which a coding-agent CLI exists), OpenRouter structured-outputs documentation (https://openrouter.ai/docs/guides/features/structured-outputs — gateway-level pass-through of response_format across all OpenAI-compat providers), helicone.ai structured-outputs explainer (https://www.helicone.ai/blog/openai-structured-outputs — observability-platform documentation of the canonical request/response shape), microsoft devblogs (https://devblogs.microsoft.com/agent-framework/using-json-schema-for-structured-output-in-net-for-openai-models — semantic-kernel structured-output binding), OpenAI Python SDK `client.beta.chat.completions.parse(response_format=Pydantic)` (typed at the SDK boundary with first-class structured-output ergonomics), OpenTelemetry GenAI semconv (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ — `gen_ai.request.response_format` and `gen_ai.response.refusal` are documented attributes for spans, meaning every observability backend in the OpenAI ecosystem treats both as structured signals) — claw is the sole client/agent/SDK in the surveyed ecosystem with zero support for schema-constrained structured outputs, no `response_format`, no `output_config`, no `refusal` channel, and no `--output-schema` CLI affordance. 
The fix shape is well-understood, the typed structures exist in every peer codebase, the open feature request in anomalyco/opencode is the most-upvoted parity gap, and #218 is the largest single deliverable inside the wire-format-parity cluster — closing it requires the typed-enum-at-the-wire-boundary architectural rule from #217's deeper-fix section *plus* a Capability typed-enum extension layer to span request/response symmetrically. 🪨
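The Capability typed-enum extension layer named above can be sketched the same way as #219's translation rule. This is a minimal sketch under stated assumptions: `Capability`, `Lane`, and `into_provider_payload` are illustrative names rather than existing claw APIs, shown here for a single capability (schema-constrained structured output) compiled to both wire dialects.

```rust
// Hypothetical sketch; names are illustrative, not existing claw APIs.

/// One request-side capability, declared once at the data-model layer.
enum Capability {
    /// Schema-constrained structured output; `schema` is a JSON Schema string.
    StructuredOutput { name: &'static str, schema: &'static str },
}

/// Wire dialects the capability must compile to.
enum Lane {
    OpenAiCompat,    // response_format: {type: "json_schema", ...}
    AnthropicNative, // output_config: {format: {...}}
}

impl Capability {
    /// Compile the capability to the provider-appropriate wire field.
    fn into_provider_payload(&self, lane: &Lane) -> String {
        match (self, lane) {
            (Capability::StructuredOutput { name, schema }, Lane::OpenAiCompat) => format!(
                r#""response_format":{{"type":"json_schema","json_schema":{{"name":"{}","strict":true,"schema":{}}}}}"#,
                name, schema
            ),
            (Capability::StructuredOutput { schema, .. }, Lane::AnthropicNative) => format!(
                r#""output_config":{{"format":{{"type":"json_schema","schema":{}}}}}"#,
                schema
            ),
        }
    }
}

fn main() {
    let cap = Capability::StructuredOutput { name: "verdict", schema: r#"{"type":"object"}"# };
    println!("{}", cap.into_provider_payload(&Lane::OpenAiCompat));
    println!("{}", cap.into_provider_payload(&Lane::AnthropicNative));
}
```

The design point mirrors the doctrine stated for #219: one typed declaration, one translation function at the wire boundary, no per-call-site string splicing.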