diff --git a/ROADMAP.md b/ROADMAP.md
index 50f418d..4b4d8b2 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -14725,3 +14725,108 @@ The deeper fix is to declare a typed wire-vocabulary boundary at the provider ed
**Status:** Open. No code changed. Filed 2026-04-25 23:30 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: ceb092a. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217 — fifteen pinpoints. Wire-format-parity cluster: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) — seven pinpoints, every member is "claw and the wire format disagree on a documented field." Classifier-leakage shape: response-side string mistranslation that flows three layers deep into a runtime classifier that misclassifies provider failures as session successes, distinct from prior structural-absence members.
External validation: OpenAI Chat Completions API reference (https://platform.openai.com/docs/api-reference/chat/object — `finish_reason` documented as one of `stop` / `length` / `tool_calls` / `content_filter` / `function_call`), Anthropic Messages API reference (https://docs.anthropic.com/en/api/messages — `stop_reason` documented as one of `end_turn` / `max_tokens` / `stop_sequence` / `tool_use` / `pause_turn`, plus `refusal` on 2025+ models), OpenAI deprecation notice for `function_call` (https://platform.openai.com/docs/api-reference/chat/create#chat-create-function_call — deprecated in favor of `tool_calls`/`tool_choice`, but still emitted as `finish_reason: "function_call"` by older deployments and several compat shims), Azure OpenAI Chat Completions reference (https://learn.microsoft.com/en-us/azure/ai-services/openai/reference — confirms `function_call` still emitted by deployment versions ≤ 2024-02-15-preview), DeepSeek API reference (https://api-docs.deepseek.com/api/create-chat-completion — emits all five OpenAI finish reasons), Moonshot kimi API reference (https://platform.moonshot.cn/docs/api/chat — emits `length` and `content_filter` with documented identical semantics to OpenAI), Alibaba DashScope API reference (https://help.aliyun.com/zh/model-studio/use-qwen-by-calling-api — emits `length` for max-token truncation), anomalyco/opencode#19842 (active issue tracking finish_reason='length' silently treated as success in worker classifier — exact same bug shape, same cluster, in a sibling project), charmbracelet/crush (handles `length`/`content_filter` distinctly via typed enum at the wire boundary), simonw/llm (typed Reason enum with `Stop`/`Length`/`ContentFilter`/`ToolCall` variants, exhaustive match at consumer), Vercel AI SDK `FinishReason` typed union with seven variants including `length` and `content-filter`, LangChain `BaseChatModel.generate` runs through `_create_chat_result` which preserves all five OpenAI finish_reasons and routes 
content_filter to a separate `LengthFinishReasonError` / `ContentFilterFinishReasonError` exception path, semantic-kernel `ChatCompletion.FinishReason` enum, OpenAI Python SDK `ChatCompletion.choices[0].finish_reason: Literal['stop','length','tool_calls','content_filter','function_call']` (typed at the SDK boundary), OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ — `gen_ai.response.finish_reasons` is a typed array attribute with the same five-value vocabulary, meaning every observability backend in the OpenAI ecosystem treats this as a structured enum) — claw is the sole client/agent/SDK in the surveyed ecosystem that drops three of five OpenAI finish reasons through a string fallthrough into a stringly-typed Rust field that is then read by a runtime classifier with two-literal-compare coverage. The fix shape is well-understood, the typed enum exists in every peer codebase, and the bug is a 4-line patch in the normalizer plus a 30-line refactor of the classifier — but it requires the typed-enum-at-the-wire-boundary architectural rule from the deeper-fix section to land cleanly, otherwise it is just another partial mapping bug waiting for the next OpenAI spec addition. 
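The typed-enum fix shape named here recurs across the peer list above; the following is a minimal self-contained sketch of the rule, where all names (`FinishReason`, `Outcome`, `classify`) are illustrative rather than claw's actual identifiers:

```rust
// Hypothetical sketch of the typed-enum-at-the-wire-boundary rule for
// finish_reason. Unrecognized wire values are preserved, not dropped.
#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
    Length,
    ToolCalls,
    ContentFilter,
    FunctionCall, // deprecated by OpenAI but still emitted by older deployments
    Unknown(String),
}

impl FinishReason {
    fn from_wire(raw: &str) -> Self {
        match raw {
            "stop" => Self::Stop,
            "length" => Self::Length,
            "tool_calls" => Self::ToolCalls,
            "content_filter" => Self::ContentFilter,
            "function_call" => Self::FunctionCall,
            other => Self::Unknown(other.to_string()),
        }
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Truncated,
    Filtered,
    NeedsToolRun,
    Indeterminate,
}

// Exhaustive match: adding a FinishReason variant becomes a compile error
// here, instead of a silent fallthrough in a two-literal string compare.
fn classify(reason: &FinishReason) -> Outcome {
    match reason {
        FinishReason::Stop => Outcome::Success,
        FinishReason::Length => Outcome::Truncated,
        FinishReason::ContentFilter => Outcome::Filtered,
        FinishReason::ToolCalls | FinishReason::FunctionCall => Outcome::NeedsToolRun,
        FinishReason::Unknown(_) => Outcome::Indeterminate,
    }
}

fn main() {
    assert_eq!(classify(&FinishReason::from_wire("length")), Outcome::Truncated);
    assert_eq!(classify(&FinishReason::from_wire("content_filter")), Outcome::Filtered);
    assert_eq!(classify(&FinishReason::from_wire("weird")), Outcome::Indeterminate);
    println!("ok");
}
```

With this shape, `length` and `content_filter` can no longer masquerade as success: the classifier must name an outcome for every documented wire value.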
🪨

---

## Pinpoint #218 — Structured-Outputs Cluster: `response_format` / `output_config` / `refusal` Are Architecturally Absent

**Filed:** 2026-04-26 00:09 KST (Jobdori cycle #370 / extends #168c emission-routing audit / wire-format-parity cluster grows to eight: #211+#212+#213+#214+#215+#216+#217+#218 / sibling-shape cluster grows to sixteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218)
**Branch:** feat/jobdori-168c-emission-routing
**HEAD:** 91e2905 (#217)

**Summary.** claw has zero support for the modern schema-constrained output surface that became a baseline ecosystem feature in 2024–2025: OpenAI `response_format: {type: "json_schema", json_schema: {name, schema, strict: true}}` (introduced 2024-08-06, GA on gpt-4o-mini-2024-07-18 / gpt-4o-2024-08-06 / gpt-5 / o1 / o3 / o4-mini), Anthropic `output_config.format: {type: "json_schema", schema: {...}}` (GA on 2025-11-13, preceded by the `structured-outputs-2025-11-13` beta on claude-3-7-sonnet / claude-sonnet-4 / claude-opus-4 / claude-opus-4.6), and the response-side `message.refusal` channel that OpenAI emits when constrained decoding rejects a generation on safety grounds. None of these fields exists anywhere in the codebase. The gap is structural across four layers — (a) the request struct has no field to write, (b) the request builder has no branch to emit it, (c) the response struct has no field to deserialize, (d) the content-block taxonomy has no variant to surface refusals — making this the largest single-feature absence catalogued in the cluster to date and a hard parity floor against anomalyco/opencode#10456 (open feature request for the same capability in the sibling project, citing OpenAI Codex as the reference implementation).
**Concrete locations and shape.**

**(1) Request-side absence — write path is structurally closed.** `rust/crates/api/src/types.rs:6-36` defines `MessageRequest` with thirteen fields (model, max_tokens, messages, system, tools, tool_choice, stream, temperature, top_p, frequency_penalty, presence_penalty, stop, reasoning_effort). No `response_format`. No `output_config`. No `output_schema`. No `json_schema`. No `seed`. No `logprobs`. No `top_logprobs`. No `logit_bias`. No `n`. No `metadata`. `grep -rn "response_format\|json_schema\|output_config\|logprobs\|logit_bias" rust/crates/api/src/` returns zero hits. The struct itself is `#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]` — caller code cannot opt into structured outputs even by setting the field, because the field does not exist; even monkey-patching at the JSON layer is impossible, because the request envelope is constructed via this typed struct, never via raw JSON. `rust/crates/api/src/providers/openai_compat.rs:845-928` (`build_chat_completion_request`) writes thirteen optional fields onto the wire payload, mirroring the struct one-to-one: `model`, `max_tokens` or `max_completion_tokens`, `messages`, `stream`, optional `stream_options`, `tools`, `tool_choice`, `temperature`, `top_p`, `frequency_penalty`, `presence_penalty`, `stop`, `reasoning_effort`. Zero hits for `response_format` in this builder. `rust/crates/api/src/providers/anthropic.rs` and `rust/crates/telemetry/src/lib.rs:107` (`AnthropicRequestProfile::render_json_body`) render the same `MessageRequest` struct via serde to wire JSON — zero hits for `output_config` either. The Anthropic-native render path has no `extra_body` escape hatch (asymmetric, same shape as #216 service_tier), so even raw-JSON injection is unavailable.
claw is structurally incapable of:

- enabling OpenAI strict-schema constrained decoding (developers.openai.com/api/docs/guides/structured-outputs — guarantees that output adheres to a user-supplied JSON Schema or surfaces a typed `refusal`);
- enabling Anthropic native structured outputs (docs.anthropic.com/en/api/structured-outputs — generally available since 2025-11-13, schema-conforming JSON guaranteed at decode time, eliminating retry loops);
- enabling JSON-mode-only compatibility for older models (`response_format: {type: "json_object"}` — the pre-2024-08 OpenAI shape, still required for gpt-4-turbo / gpt-3.5-turbo / DeepSeek deepseek-chat / Moonshot kimi-default);
- supplying a `seed` for reproducible sampling (OpenAI Chat Completions API parameter, documented since 2024-04, used by every reproducibility-pinning workflow in the ecosystem);
- requesting `logprobs`/`top_logprobs` (OpenAI/DeepSeek/Moonshot all expose these — used by model evaluators, uncertainty-quantification tools, active-learning loops, and speculative-decoding shims);
- biasing token probabilities via `logit_bias` (OpenAI parameter, used by bias-mitigation and forced-vocabulary tooling);
- requesting multiple completions via `n` (OpenAI parameter, used by best-of-n / self-consistency / majority-vote sampling loops).

**(2) Response-side absence — read path is structurally closed.** `rust/crates/api/src/providers/openai_compat.rs:672-688` defines `ChatCompletionResponse` with four fields (id, model, choices, usage) and `ChatMessage` with three fields (role, content, tool_calls). No `refusal` field. No `parsed` field (the OpenAI Python SDK populates this when structured outputs are used — claw cannot expose it because the deserialize path never sees it). `ChatCompletionChunk` (line 717) has the same shape gap.
`ChunkDelta` (line 735) has two fields (content, tool_calls) — no `refusal` delta channel either, despite the streaming aggregator at line 1781 explicitly *writing a test that includes `"refusal": null` in the test payload* (the field is acknowledged at the test-data layer but never deserialized anywhere). The serde-deserialize layer drops `refusal` silently before any handler sees it. `rust/crates/api/src/types.rs:121-135` defines `MessageResponse` with no `parsed` field for structured-output payloads. `rust/crates/api/src/types.rs:147-167` defines `OutputContentBlock` with four variants (Text, ToolUse, Thinking, RedactedThinking) — no `Refusal` variant. The Anthropic-native path has the symmetric gap: `MessageResponse.stop_reason: Option<String>` has no slot for the new Anthropic `refusal` stop_reason value (the Anthropic Messages API reference, 2025-11+, documents `stop_reason: "refusal"` as a sixth canonical value, emitted when constrained decoding rejects a generation). When OpenAI gpt-5 generates a structured-outputs refusal, the wire shape is `{message: {role: "assistant", refusal: "I cannot help with that", content: null, tool_calls: null}, finish_reason: "stop"}` — claw deserializes this as a `ChatMessage` with role=assistant, content=None, tool_calls=[], drops the refusal string at the serde layer, returns a `MessageResponse` with empty content and `stop_reason: "end_turn"` (via #217's `normalize_finish_reason` mapping `stop → end_turn`), and the worker classifier at `worker_boot.rs:558` sees `finish='end_turn'`, `tokens_output=0`, does not trip the (unknown && zero-output) test, and emits `WorkerStatus::Finished` with `last_error: None`. Net effect: a model refusal becomes a silent, successful, empty assistant turn — no UX channel for the operator, no `WorkerEventKind::Refused`, no audit trail.
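The misclassification chain just described can be reduced to a few lines. This is a hypothetical reduction, not claw's code: the two compared literals and the `WorkerStatus` shape are guesses that mirror the prose, and `classify_with_refusal` shows the contrasting behavior once a refusal channel survives deserialization.

```rust
#[derive(Debug, PartialEq)]
enum WorkerStatus {
    Finished,
    Failed { reason: String },
}

// Today's shape: refusal was already dropped at deserialize, so the
// classifier only sees the normalized finish string and a token count.
// The two literals compared here are illustrative guesses.
fn classify_today(finish: &str, tokens_output: u64) -> WorkerStatus {
    let known = finish == "end_turn" || finish == "max_tokens";
    if !known && tokens_output == 0 {
        WorkerStatus::Failed { reason: format!("unknown finish '{finish}' with no output") }
    } else {
        // An OpenAI refusal lands here: finish="end_turn" (via stop -> end_turn),
        // tokens_output=0, and the (unknown && zero-output) test never fires.
        WorkerStatus::Finished
    }
}

// With a refusal channel preserved end-to-end, the same turn is routed
// to a distinguishable failure instead of a silent empty success.
fn classify_with_refusal(finish: &str, tokens_output: u64, refusal: Option<&str>) -> WorkerStatus {
    if let Some(text) = refusal {
        return WorkerStatus::Failed { reason: format!("model refused: {text}") };
    }
    classify_today(finish, tokens_output)
}

fn main() {
    // The bug: a refusal classifies as a clean success today.
    assert_eq!(classify_today("end_turn", 0), WorkerStatus::Finished);
    // The fix shape: the refusal string survives and flips the outcome.
    assert!(matches!(
        classify_with_refusal("end_turn", 0, Some("I cannot help with that")),
        WorkerStatus::Failed { .. }
    ));
    println!("ok");
}
```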
**(3) Anthropic native path is also closed.** `rust/crates/api/src/providers/anthropic.rs:466` (`AnthropicClient::send_raw_request`) renders `MessageRequest` via `render_json_body` (`telemetry/lib.rs:107`) — same struct, same fields, no `output_config`. Anthropic's GA structured outputs (2025-11-13) require a top-level `output_config: {format: {type: "json_schema", schema: {...}}}` request field. claw cannot opt in. Even the prior beta path (`anthropic-beta: structured-outputs-2025-11-13` header) is unreachable, because the per-client header injection in `AnthropicClient::with_headers` does not include this beta flag and the constrained-decoding decoder on Anthropic's side rejects the request without it. Furthermore, the Anthropic-native response path (`OutputContentBlock`) has no variant for the structured-output result envelope (which Anthropic surfaces as a special content block with the canonical schema-conforming JSON and a separate validation-status field).

**(4) Cluster-shape kinship and novelty.** Same family as the wire-format-parity cluster (#211/#212/#213/#214/#215/#216/#217), but the failure mode is the largest single feature absence yet catalogued. Prior members were single-field gaps: max_completion_tokens key name (#211), parallel_tool_calls modifier (#212), cached_tokens deserialization (#213), reasoning_content delta (#214), Retry-After header (#215), service_tier + system_fingerprint pair (#216), finish_reason taxonomy (#217). #218 is **a four-layer feature absence**: a coherent capability (constrained decoding + refusal channel) that requires synchronized changes to (a) the request struct, (b) the request builder, (c) the response struct, and (d) the content-block taxonomy.
It is not a single missing field but a missing architectural seam: the boundary between "prose-mode generation" and "schema-constrained generation" exists in every peer codebase as a typed branch in the request builder and a typed branch in the response parser, and is entirely absent from claw. Distinct from #216's three-dimensional structural absence (request-side write + response-side read + reproducibility marker for a single feature) by adding a fourth dimension: a content-block-taxonomy variant for the refusal channel.

**Reproduction sketch:**

```rust
// Test 1: MessageRequest cannot represent OpenAI structured outputs.
#[test]
fn message_request_lacks_response_format_field() {
    let request = MessageRequest::default();
    let json = serde_json::to_value(&request).unwrap();
    // Observable: no response_format key in the serialized payload, and no
    // typed field on the struct to populate.
    assert!(json.get("response_format").is_none());
    // Compile-time observable: request.response_format does not compile.
    // let _ = request.response_format;
}

// Test 2: OpenAI refusal is silently dropped at deserialize.
#[test]
fn openai_compat_drops_refusal_field() {
    let body = json!({
        "id": "chatcmpl-1",
        "model": "gpt-5",
        "choices": [{
            "message": {
                "role": "assistant",
                "content": null,
                "refusal": "I cannot help with that request.",
                "tool_calls": null
            },
            "finish_reason": "stop"
        }],
        "usage": {"prompt_tokens": 12, "completion_tokens": 0}
    });
    let parsed: ChatCompletionResponse = serde_json::from_value(body).unwrap();
    // Bug: refusal is unrecoverable from the parsed struct.
    // The string was discarded at serde-deserialize time.
    let message = &parsed.choices[0].message;
    // No way to access refusal; the field does not exist on ChatMessage.
    assert!(message.content.is_none());
    assert!(message.tool_calls.is_empty());
    // Operator has no signal that this was a refusal.
}

// Test 3: end-to-end — structured-outputs refusal becomes a silent success.
#[tokio::test]
async fn refusal_classifies_as_finished_success() {
    // A response with content=null and refusal="..." should classify as
    // Failed/Refused, not as Finished/success.
    let registry = WorkerRegistry::new();
    let id = registry.spawn("w1").unwrap().id;
    // Simulate the value that flows through normalize_finish_reason today
    // for an OpenAI refusal: finish='end_turn' (mapped from "stop"), tokens=0.
    let worker = registry.observe_completion(&id, "end_turn", 0).unwrap();
    // currently: WorkerStatus::Finished with last_error=None — bug.
    // expected: WorkerStatus::Failed with WorkerFailureKind::Refused and
    // the refusal string surfaced as last_error.message.
    assert_eq!(worker.status, WorkerStatus::Failed);
}

// Test 4: Anthropic native path cannot opt into output_config.
#[test]
fn message_request_lacks_output_config_field() {
    let request = MessageRequest::default();
    let json = serde_json::to_value(&request).unwrap();
    assert!(json.get("output_config").is_none());
}
```

**Fix shape (not implemented in this cycle, recorded for cluster refactor):**

The minimal fix is a layered seven-touch change. (a) Add `pub response_format: Option<ResponseFormat>` to `MessageRequest` (types.rs:6), where `ResponseFormat` is a typed enum with three variants: `Text` (default, omitted from the wire), `JsonObject` (legacy JSON mode), and `JsonSchema { schema: Value, strict: bool, name: String }` (modern strict-schema mode). (b) Add `pub output_config: Option<OutputConfig>` to `MessageRequest` for Anthropic GA structured outputs, with `OutputConfig::JsonSchema { schema: Value }` as the documented shape. (c) Add `pub seed: Option<u64>`, `pub logprobs: Option<bool>`, `pub top_logprobs: Option<u8>`, `pub logit_bias: Option<HashMap<String, f32>>`, `pub n: Option<u8>`, and `pub metadata: Option<HashMap<String, String>>` to `MessageRequest` for parameter parity.
(d) Extend `build_chat_completion_request` (openai_compat.rs:845) to emit each on the wire when set, with provider-aware mapping (e.g., DashScope rejects `logit_bias`, kimi rejects `seed` on some endpoints — strip with a one-time `tracing::warn` instead of a silent drop). (e) Add `refusal: Option<String>` to `ChatMessage` (openai_compat.rs:688) and `ChunkDelta` (openai_compat.rs:735), bridging both into a new `OutputContentBlock::Refusal { text: String }` variant (types.rs:147) and a new `ContentBlockDelta::RefusalDelta { text: String }` variant in the streaming aggregator. (f) Add `StopReason::Refusal` to the typed enum from #217's deeper-fix section (or extend the existing `Option<String>` taxonomy with a documented "refusal" string until that lands). (g) Extend `WorkerRegistry::observe_completion` (worker_boot.rs:558) to route `StopReason::Refusal` and `OutputContentBlock::Refusal` payloads into a new `WorkerFailureKind::Refused { refusal_text: String }` so operators can distinguish refusals from truncation and from provider errors. Estimate: ~180 LOC production + ~280 LOC test (covering all five capabilities × OpenAI/Anthropic native × streaming/non-streaming × kimi/DeepSeek/DashScope variant rejections × refusal bridging end-to-end through the worker classifier).

The deeper fix is to declare a typed wire-vocabulary boundary at the provider edge for *both* the request side and the response side, extending the wire-vocabulary-at-the-boundary rule from #217 to capabilities (response_format / output_config / structured outputs is a capability, not just a vocabulary), and to introduce a `Capability` typed enum at the request layer that compiles to provider-appropriate wire fields (`response_format` for OpenAI-compat, `output_config` for Anthropic-native, `--output-schema` for the runner) via a single `into_provider_payload()` translation.
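Under the assumptions of fix-shape items (a)+(b) and the `into_provider_payload()` rule, the capability-to-wire-field translation can be sketched as follows; every name and the hand-rolled JSON assembly are illustrative (real code would go through serde, not string formatting):

```rust
// Hypothetical sketch: one typed request-side value compiled to
// provider-appropriate wire fields. The schema rides along as a raw
// JSON string to keep the sketch dependency-free.
#[derive(Debug, Clone)]
enum ResponseFormat {
    Text,       // default: omitted from the wire
    JsonObject, // legacy JSON mode
    JsonSchema { name: String, schema_json: String, strict: bool },
}

#[derive(Debug, Clone, Copy)]
enum Provider {
    OpenAiCompat,    // -> response_format
    AnthropicNative, // -> output_config
}

// Returns the (key, value-fragment) pair to splice into the wire payload,
// or None when the capability is the default — never a silent drop of a
// non-default value (the cluster rule).
fn wire_field(fmt: &ResponseFormat, provider: Provider) -> Option<(&'static str, String)> {
    match (fmt, provider) {
        (ResponseFormat::Text, _) => None,
        (ResponseFormat::JsonObject, Provider::OpenAiCompat) => {
            Some(("response_format", r#"{"type":"json_object"}"#.to_string()))
        }
        (ResponseFormat::JsonSchema { name, schema_json, strict }, Provider::OpenAiCompat) => Some((
            "response_format",
            format!(r#"{{"type":"json_schema","json_schema":{{"name":"{name}","strict":{strict},"schema":{schema_json}}}}}"#),
        )),
        (ResponseFormat::JsonSchema { schema_json, .. }, Provider::AnthropicNative) => Some((
            "output_config",
            format!(r#"{{"format":{{"type":"json_schema","schema":{schema_json}}}}}"#),
        )),
        (ResponseFormat::JsonObject, Provider::AnthropicNative) => {
            // No Anthropic equivalent assumed here: surface the gap loudly
            // instead of silently stripping it.
            eprintln!("warn: json_object mode unsupported on anthropic-native; omitting");
            None
        }
    }
}

fn main() {
    let fmt = ResponseFormat::JsonSchema {
        name: "report".into(),
        schema_json: r#"{"type":"object"}"#.into(),
        strict: true,
    };
    // Same typed value, two provider-specific wire keys.
    assert_eq!(wire_field(&fmt, Provider::OpenAiCompat).unwrap().0, "response_format");
    assert_eq!(wire_field(&fmt, Provider::AnthropicNative).unwrap().0, "output_config");
    println!("ok");
}
```

The design point is that the provider branch lives in exactly one function, so adding a runner-side `--output-schema` target is a new match arm rather than a new field audit.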
This collapses #218 into one composable rule with the rest of the wire-format-parity cluster (#211/#212/#213/#214/#215/#216/#217) and gives claw the same capability surface as anomalyco/opencode (whose open issue #10456 requests exactly this), charmbracelet/crush (which has it), simonw/llm (which has it via `--schema`), the Vercel AI SDK (whose `generateObject` takes Zod schemas backed by both OpenAI strict-schema and Anthropic output_config), LangChain (whose `with_structured_output` is backed by both providers natively), and the OpenAI Codex SDK (whose CLI runner has an `--output-schema` flag).

**Status:** Open. No code changed. Filed 2026-04-26 00:09 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 91e2905. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation / silent-capability-absence at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218 — sixteen pinpoints. Wire-format-parity cluster grows to eight: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) + #218 (response_format / output_config / refusal). Four-layer-structural-absence shape: request-struct-field + request-builder-write + response-struct-field + content-block-taxonomy-variant, distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) members, and the largest single-feature absence catalogued.
External validation: OpenAI Structured Outputs guide (https://developers.openai.com/api/docs/guides/structured-outputs — `response_format: {type: "json_schema", json_schema: {schema, strict: true, name}}` GA since 2024-08-06, guarantees schema adherence via constrained decoding, refusal channel via `message.refusal: string | null`), OpenAI Chat Completions API reference (https://platform.openai.com/docs/api-reference/chat/create — documents response_format, seed, logprobs, top_logprobs, logit_bias, n, metadata as first-class request parameters), OpenAI Cookbook structured-outputs intro (https://developers.openai.com/cookbook/examples/structured_outputs_intro — canonical reference implementation), Anthropic Structured Outputs reference (https://docs.anthropic.com/en/api/structured-outputs — `output_config.format: {type: "json_schema", schema}` GA on 2025-11-13, guarantees schema-conforming JSON, eliminates retry loops), Anthropic Messages API reference (https://docs.anthropic.com/en/api/messages — `stop_reason: "refusal"` documented as sixth canonical value on 2025-11+ models when constrained decoding rejects), Vercel AI Gateway Anthropic structured outputs (https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-messages-api/structured-outputs — production-grade output_config.format pass-through), Vercel AI SDK 6 generateObject (https://vercel.com/blog/ai-sdk-6 — Zod-schema → JSON Schema → output_config / response_format with type-safe end-to-end), LangChain BaseChatModel.with_structured_output (https://reference.langchain.com — backs json_schema / function_calling / json_mode steering modes uniformly across OpenAI, Anthropic, Ollama), simonw/llm `--schema` flag (typed Reason enum + structured-outputs first-class CLI argument), charmbracelet/crush typed structured-output handling (referenced in cluster pinpoints #211/#212/#214/#217 — same project handles this canonically), anomalyco/opencode#10456 (open feature request: "schema-constrained structured outputs 
(JSON Schema), similar to Codex" — exact same gap in sibling project, citing OpenAI Codex SDK as reference implementation, references the exact ecosystem expectation that schema-constrained outputs are a baseline 2025+ capability), anomalyco/opencode#5639 / #11357 / #13618 (related parity pinpoints in sibling project tracker), OpenAI Codex CI/code-review guide (https://cookbook.openai.com/examples/codex/build_code_review_with_codex_sdk — flagship use case for structured outputs, used to enable predictable CI/PR-review automation, the very use case for which a coding-agent CLI exists), OpenRouter structured-outputs documentation (https://openrouter.ai/docs/guides/features/structured-outputs — gateway-level pass-through of response_format across all OpenAI-compat providers), helicone.ai structured-outputs explainer (https://www.helicone.ai/blog/openai-structured-outputs — observability-platform documentation of the canonical request/response shape), microsoft devblogs (https://devblogs.microsoft.com/agent-framework/using-json-schema-for-structured-output-in-net-for-openai-models — semantic-kernel structured-output binding), OpenAI Python SDK `client.beta.chat.completions.parse(response_format=Pydantic)` (typed at the SDK boundary with first-class structured-output ergonomics), OpenTelemetry GenAI semconv (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ — `gen_ai.request.response_format` and `gen_ai.response.refusal` are documented attributes for spans, meaning every observability backend in the OpenAI ecosystem treats both as structured signals) — claw is the sole client/agent/SDK in the surveyed ecosystem with zero support for schema-constrained structured outputs, no `response_format`, no `output_config`, no `refusal` channel, and no `--output-schema` CLI affordance. 
The fix shape is well-understood, the typed structures exist in every peer codebase, the open feature request in anomalyco/opencode is the most-upvoted parity gap, and #218 is the largest single deliverable inside the wire-format-parity cluster — closing it requires the typed-enum-at-the-wire-boundary architectural rule from #217's deeper-fix section *plus* a Capability typed-enum extension layer to span request/response symmetrically.

🪨