From 959bdf849151d8fd4dca7d6d798d23d7c0f05a04 Mon Sep 17 00:00:00 2001 From: YeonGyu-Kim Date: Sat, 25 Apr 2026 22:16:02 +0900 Subject: [PATCH] =?UTF-8?q?roadmap:=20#214=20filed=20=E2=80=94=20ChunkDelt?= =?UTF-8?q?a=20and=20ChatMessage=20in=20openai=5Fcompat.rs=20deserialize?= =?UTF-8?q?=20only=20content/tool=5Fcalls;=20delta.reasoning=5Fcontent=20(?= =?UTF-8?q?sibling=20to=20delta.content,=20the=20canonical=20wire=20field?= =?UTF-8?q?=20for=20DeepSeek=20deepseek-reasoner=20/=20Alibaba=20Qwen3-Thi?= =?UTF-8?q?nking=20/=20QwQ=20/=20vLLM=20reasoning-parser=20backends)=20is?= =?UTF-8?q?=20silently=20discarded=20at=20serde-deserialize=20time=20befor?= =?UTF-8?q?e=20any=20handler=20sees=20it;=20non-streaming=20ChatMessage=20?= =?UTF-8?q?has=20the=20same=20gap;=20is=5Freasoning=5Fmodel=20classifier?= =?UTF-8?q?=20already=20returns=20true=20for=20o1/o3/o4/grok-3-mini/qwen-q?= =?UTF-8?q?wq/qwq/*thinking*=20and=20is=20consulted=20at=20line=20901=20to?= =?UTF-8?q?=20strip=20request-side=20tuning=20params=20but=20never=20on=20?= =?UTF-8?q?the=20response=20side=20to=20opt=20into=20reasoning=5Fcontent?= =?UTF-8?q?=20extraction;=20local=20taxonomy=20already=20declares=20Output?= =?UTF-8?q?ContentBlock::Thinking=20and=20ContentBlockDelta::ThinkingDelta?= =?UTF-8?q?=20and=20the=20Anthropic=20native=20path=20correctly=20emits=20?= =?UTF-8?q?both=20with=20full=20test=20coverage=20at=20sse.rs:260,288=20?= =?UTF-8?q?=E2=80=94=20the=20openai-compat=20translator=20has=20the=20dest?= =?UTF-8?q?ination=20types=20one=20import=20away=20and=20never=20bridges?= =?UTF-8?q?=20to=20them=20(Jobdori=20cycle=20#366=20/=20extends=20#168c=20?= =?UTF-8?q?emission-routing=20audit=20/=20sibling-shape=20cluster=20grows?= =?UTF-8?q?=20to=20thirteen:=20#201/#202/#203/#206/#207/#208/#209/#210/#21?= =?UTF-8?q?1/#212/#213/#214=20/=20reasoning-fidelity=20trio:=20#207+#211+#?= =?UTF-8?q?214=20/=20wire-format-parity=20cluster:=20#211+#212+#213+#214?= =?UTF-8?q?=20/=20external=20validation:=20DeepSeek=20API=20docs,=20vLLM?= =?UTF-8?q?=20reasoning-outputs,=20anomalyco/opencode#24124,=20charmbracel?= =?UTF-8?q?et/crush,=20simonw/llm,=20Vercel=20AI=20SDK,=20LangChain=20Base?= =?UTF-8?q?ChatOpenAI,=20LiteLLM,=20continue.dev#9245)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ROADMAP.md | 259 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 259 insertions(+) diff --git a/ROADMAP.md b/ROADMAP.md index 16040f0..79ea122 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -13960,3 +13960,262 @@ The deeper fix is a unified-registry refactor (`MODEL_PARAM_REQUIREMENTS` table **Status:** Open. No code changed. Filed 2026-04-25 21:30 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: c009818. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213 — twelve pinpoints, one unified-registry refactor (`MODEL_PARAM_REQUIREMENTS` with five columns: `tuning_params_strip`, `max_output_tokens`, `max_tokens_param_name`, `default_parallel_tool_calls`, `cache_token_wire_shape`) closes them all. Cost-parity cluster: #204 (token emission) + #207 (token preservation) + #209 (cost estimation) + #210 (max_tokens registry parity) + #213 (cache token visibility) — five pinpoints, all openai-compat boundary. 
Wire-format-parity cluster: #211 (max_tokens parameter name) + #212 (parallel_tool_calls / disable_parallel_tool_use) + #213 (cached_tokens / prompt_cache_hit_tokens). External validation: OpenAI prompt caching docs (https://platform.openai.com/docs/guides/prompt-caching), DeepSeek pricing docs (https://api-docs.deepseek.com/quick_start/pricing), anomalyco/opencode#17223/#17121/#17056/#11995 (active issues on identical pattern), Vercel AI SDK `LanguageModelV1Usage.cachedInputTokens`, charmbracelet/crush usage telemetry, simonw/llm `--show-cached-tokens`, ddhigh.com (2026-03-26) third-party proxy fix with 87% per-request cost reduction — same control surface available across the entire ecosystem, absent only in claw-code. 🪨 + +## Pinpoint #214 — `ChunkDelta` deserializes only `content` and `tool_calls`; the openai-compat streaming path drops `delta.reasoning_content` entirely, silently discarding chain-of-thought text from DeepSeek `deepseek-reasoner`, Alibaba Qwen3-Thinking, QwQ, and any vLLM-served reasoning backend even though `is_reasoning_model()` already returns true for those families and the local `OutputContentBlock::Thinking`/`ContentBlockDelta::ThinkingDelta` taxonomy fully exists for the Anthropic native path (Jobdori, cycle #366 / extends #168c emission-routing audit / sibling-shape cluster grows to thirteen / completes the openai-compat reasoning-fidelity trio with #211 + #207) + +**Observed:** In `rust/crates/api/src/providers/openai_compat.rs:735-741`, the streaming-chunk delta deserialization struct captures exactly two fields: + +```rust +#[derive(Debug, Default, Deserialize)] +struct ChunkDelta { + #[serde(default)] + content: Option<String>, + #[serde(default, deserialize_with = "deserialize_null_as_empty_vec")] + tool_calls: Vec<ToolCallDelta>, +} +``` + +There is no `reasoning_content`, no `reasoning`, no `thinking`, no `chain_of_thought` field, no fallback accessor, no `serde(flatten)` capture into a side-channel `extra: HashMap<String, serde_json::Value>`. The wire field that DeepSeek's reasoning-model API places at `choices[].delta.reasoning_content` (sibling to `content`, not nested) and that vLLM emits at the same path for any reasoning-tuned backend is silently dropped at serde-deserialize time, before any handler sees it. + +The non-streaming response shape has the same gap. `ChatMessage` (lines 686-693) deserializes only `role`, `content`, and `tool_calls`: + +```rust +#[derive(Debug, Deserialize)] +struct ChatMessage { + role: String, + #[serde(default)] + content: Option<String>, + #[serde(default)] + tool_calls: Vec<ToolCall>, +} +``` + +No `reasoning_content` here either. A non-streaming `claw prompt --no-stream --model deepseek/deepseek-reasoner "explain X"` returns `MessageResponse.content` with the final answer only — the entire CoT is invisible.
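+
+The drop mechanism is ordinary serde behavior, not an explicit discard: without `deny_unknown_fields`, unknown sibling keys are skipped with no error, no log line, and no residue a downstream handler could inspect. A minimal, self-contained sketch of that silence (standalone mirror struct and a made-up wire payload, not claw-code's actual types):
+
+```rust
+use serde::Deserialize;
+
+// Standalone mirror of the ChunkDelta shape above — illustrative only.
+#[derive(Debug, Default, Deserialize)]
+struct MirrorChunkDelta {
+    #[serde(default)]
+    content: Option<String>,
+    // tool_calls elided for brevity
+}
+
+fn main() {
+    // DeepSeek-style wire delta: reasoning_content is a sibling of content.
+    let wire = r#"{
+        "content": null,
+        "reasoning_content": "Let me think step by step..."
+    }"#;
+    let delta: MirrorChunkDelta = serde_json::from_str(wire).unwrap();
+    // Deserialization succeeds; the unknown key is skipped silently.
+    // The chain-of-thought text is unrecoverable from this point on.
+    assert_eq!(delta.content, None);
+    println!("{:?}", delta); // MirrorChunkDelta { content: None }
+}
+```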
+ +**Repository surface (verified 2026-04-25 21:55 KST):** + +```bash +$ cd ~/clawd/claw-code +$ grep -rn "reasoning_content\|reasoning:" rust/ src/ tests/ docs/ 2>/dev/null +# (empty — zero hits anywhere in the codebase) + +$ grep -rn "completion_tokens_details\|reasoning_tokens" rust/crates/api/src/ 2>/dev/null +# (empty) + +$ grep -n "Thinking\|ThinkingDelta" rust/crates/api/src/providers/openai_compat.rs +# rust/crates/api/src/providers/openai_compat.rs:790: // Alibaba DashScope reasoning variants (QwQ + Qwen3-Thinking family) +# (one comment line; zero code paths) + +$ grep -rn "ContentBlock.*Thinking\|OutputContentBlock::Thinking\|ThinkingDelta" rust/crates/api/src/ +# rust/crates/api/src/types.rs:156: Thinking { +# rust/crates/api/src/types.rs:162: RedactedThinking { +# rust/crates/api/src/types.rs:245: ThinkingDelta { thinking: String }, +# rust/crates/api/src/sse.rs:260: content_block: OutputContentBlock::Thinking { +# rust/crates/api/src/sse.rs:288: delta: ContentBlockDelta::ThinkingDelta { + +# Result: the local taxonomy has Thinking content blocks and ThinkingDelta variants, +# the Anthropic SSE parser at sse.rs:260/288 emits both with test coverage, +# and the openai-compat path has neither a reader for the wire field nor an emitter +# for the local event variant. The lane is half-built: declared in types.rs, +# emitted by anthropic.rs, and structurally absent from openai_compat.rs. + +$ grep -n "is_reasoning_model" rust/crates/api/src/providers/openai_compat.rs +# rust/crates/api/src/providers/openai_compat.rs:780:pub fn is_reasoning_model(model: &str) -> bool { +# rust/crates/api/src/providers/openai_compat.rs:901: if !is_reasoning_model(&request.model) { + +# is_reasoning_model already classifies o1/o3/o4/grok-3-mini/qwen-qwq/qwq/*thinking* +# at line 780. The classifier is used at line 901 to strip tuning params for the +# request side — but the same classifier is never consulted on the response side +# to opt into reasoning_content extraction. Half-applied taxonomy, same shape as #211. +``` + +**Blast radius (verified by `grep -rn "OpenAiCompatClient\|openai_compat::" rust/crates/`):** + +- Every `claw prompt` against any of these wire model ids streams reasoning content into `/dev/null`: + - DeepSeek: `deepseek-reasoner`, `deepseek-chat` with `thinking=true`, `deepseek-v3.2-pro` thinking mode + - Alibaba DashScope: `qwen-qwq-32b`, `qwq-plus`, `qwen3-30b-a3b-thinking`, `qwen3-72b-thinking`, `qwen3-235b-a22b-thinking` + - vLLM-hosted: any model started with `--enable-reasoning --reasoning-parser deepseek_r1` + - SiliconFlow / OpenRouter / Together-passed reasoning models that follow the DeepSeek wire convention + - Future OpenAI o-series if/when OpenAI surfaces CoT through `delta.reasoning_content` (the public Responses API already exposes `reasoning.summary`; the path is the same shape) + +- Every claw consuming the SSE stream sees an empty `content_block_delta` window for the reasoning portion. A long DeepSeek-reasoner answer with 5 minutes of CoT and a 100-token final answer streams as: `MessageStart → ContentBlockStart(Text) → ContentBlockDelta(TextDelta {final answer text}) → ContentBlockStop → MessageDelta`. No `Thinking` block, no `ThinkingDelta`, no signal that the model spent five minutes reasoning.
The output ledger shows `output_tokens` matching the final-answer length, even though billed `completion_tokens` from the upstream is 50× larger because reasoning tokens are billed too (the disconnect that #207 catches at the counter layer; #214 is the same disconnect at the content layer). + +- Hooks that would render a reasoning panel (sibling tool surface to claw-code's existing `--show-thinking` UX for Anthropic extended thinking) cannot fire on OpenAI-compat sessions. The TUI has no source for the data. There is no event for it. + +- Multi-turn conversations against `deepseek-reasoner` happen to comply on the input side — DeepSeek docs explicitly require dropping `reasoning_content` from history to avoid 400 errors (https://api-docs.deepseek.com/guides/reasoning_model) — but only because claw-code never captured the field in the first place; the compliance is accidental, not designed. **However:** newer reasoning models (DeepSeek V4-Pro thinking mode, future OpenAI Responses API turns) require the *opposite* — `reasoning_content` MUST be passed back across turns when a tool was called in the previous turn. Without a parser, claw-code structurally cannot comply with the newer contract; multi-turn tool-call sessions against V4-Pro will return 400 with no path to remediation short of upstream-version locking. + +- The Anthropic native path has full content-side parity. `sse.rs:260,288` parses `content_block_start` with `type: "thinking"` and `content_block_delta` with `delta.type: "thinking_delta"` into `OutputContentBlock::Thinking { thinking, signature }` and `ContentBlockDelta::ThinkingDelta { thinking }`. Tests at `sse.rs:243-296` assert both directions. The capability exists end-to-end on Anthropic. The OpenAI-compat translator has the destination types one import away and never bridges to them. + +**Gap:** + +1. **One canonical wire shape, three upstream spellings, zero claw-code reader.** DeepSeek emits `choices[0].delta.reasoning_content: "step text"` (sibling to `content`, since R1 release Jan 2026 and continuing through V3.2/V4-Pro). vLLM with `deepseek_r1` parser emits the same. SiliconFlow follows DeepSeek. OpenRouter wraps both. Some downstream proxies (LiteLLM, Helicone, Portkey) re-shape this into `delta.reasoning: { summary: "..."}` or `delta.thinking: "..."` to align with OpenAI's draft Responses API extension. Three wire spellings, no claw-code reader for any of them. Adding a top-level `reasoning_content: Option<String>` field to `ChunkDelta` covers the first two; absorbing the third requires `serde(alias = "reasoning")` plus an enum discriminator (see the sketch after item 2 below). Same one-shape-per-provider asymmetry as #213 (cached_tokens vs prompt_cache_hit_tokens) and #211 (max_tokens vs max_completion_tokens vs max_output_tokens). + +2. **Half-applied taxonomy within a single 30-line span.** `is_reasoning_model(model)` at line 780 already returns `true` for o1/o3/o4/grok-3-mini/qwen-qwq*/qwq*/*thinking*. At line 901, `build_chat_completion_request` consults this classifier to strip tuning params on the *request* side. The mirror call site for the *response* side — "if `is_reasoning_model(model)`, parse `delta.reasoning_content` into `ContentBlockDelta::ThinkingDelta`" — does not exist. The taxonomy knows which models reason; the deserializer was never wired up to act on that knowledge. Same shape as #211 (gpt-5-prefix gate at request side, no o-series gate at response side).
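+
+A sketch of what absorbing all three spellings could look like at the serde layer: hypothetical names (`ReasoningField`, `SketchChunkDelta`), assuming the proxy object form carries a `summary` key as described in item 1; illustrative, not a design commitment:
+
+```rust
+use serde::Deserialize;
+
+// The three observed spellings: a bare string (DeepSeek/vLLM/SiliconFlow)
+// or a proxy-reshaped object carrying a summary (LiteLLM/Helicone/Portkey).
+#[derive(Debug, Deserialize)]
+#[serde(untagged)]
+enum ReasoningField {
+    Text(String),
+    Summary { summary: String },
+}
+
+#[derive(Debug, Default, Deserialize)]
+struct SketchChunkDelta {
+    #[serde(default)]
+    content: Option<String>,
+    // One field, three wire names: canonical plus two aliases.
+    #[serde(default, alias = "reasoning", alias = "thinking")]
+    reasoning_content: Option<ReasoningField>,
+}
+
+fn main() {
+    for wire in [
+        r#"{ "reasoning_content": "step text" }"#,
+        r#"{ "reasoning": { "summary": "condensed CoT" } }"#,
+        r#"{ "thinking": "raw CoT" }"#,
+    ] {
+        let delta: SketchChunkDelta = serde_json::from_str(wire).unwrap();
+        println!("{:?}", delta.reasoning_content); // Some(...) in all three cases
+    }
+}
+```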
+ +3. **Local taxonomy already declares the destination type.** `rust/crates/api/src/types.rs:156-162` defines `OutputContentBlock::Thinking { thinking: String, signature: Option<String> }` and `OutputContentBlock::RedactedThinking { data: Value }`. Line 245 declares `ContentBlockDelta::ThinkingDelta { thinking: String }`. Both variants are emitted by `rust/crates/api/src/providers/anthropic.rs` via `sse.rs` parsing. The type slot exists, the emitter exists for one provider, the second provider has neither emitter nor reader. Adding the missing emitter is mechanical; the gap is structural absence, not design ambiguity. + +4. **Zero event for the dropped data.** No `reasoning_content_dropped` event. No `unsupported_chunk_field` event. No log line. No telemetry counter. A claw inspecting the SSE stream cannot tell whether the upstream model produced no reasoning, produced reasoning that was dropped, or is a non-reasoning model. Same opacity pattern as #201 (silent tool-arg fallback), #202 (silent tool-message drop), #203 (no AutoCompactionEvent), #207 (silent zero-fill on usage), #208 (silent param strip), #211 (silent prefix-mismatch), #212 (no parallel-tool-emission event), #213 (silent zero-coercion on cache tokens). #214 extends the cluster to thirteen with the same shape: provider-side fact known, claw event taxonomy silent. + +5. **Tests exist for the Anthropic side but are absent for the OpenAI-compat side.** `rust/crates/api/src/sse.rs:243-296` has `parses_thinking_related_deltas` and a `parses_thinking_content_block_start` companion. Both assert that the SSE parser emits the `Thinking` content block and `ThinkingDelta` events. Searching the openai_compat module for analogous tests: + +```bash +$ grep -n "fn .*reasoning\|fn .*thinking\|reasoning_content" rust/crates/api/src/providers/openai_compat.rs +# (empty) +``` + +Zero tests. The asymmetry mirrors the production gap exactly — the test-coverage shape is `anthropic.thinking: present, openai_compat.reasoning_content: absent`. Same shape as #211 (no o-series test) and #212 (no parallel-tool modifier test) and #213 (no cached-token visibility test). + +6. **No CLI surface, no plugin override, no environment knob.** `grep -rn "show.thinking\|show.reasoning\|--reasoning\|--thinking" rust/crates/rusty-claude-cli/` returns hits only for the request-side `--reasoning-effort` flag (`main.rs:823, 925, 935`, etc. — the input parameter) and zero hits for any response-side reasoning-visibility flag. No `~/.claw/config.toml` entry for `[reasoning] show_chain_of_thought = true`. Plugins cannot inject the missing field through `extra_body` or post-process it, because the field is dropped at serde-deserialize time, before any plugin sees it. No surface area for users or operators to access reasoning content on OpenAI-compat paths. This mirrors the surface gap in #213 (no `--show-cache-stats`) and #207 (no `--show-reasoning-tokens`). + +7. **Compounding with the existing reasoning-model cluster.** #207 ("`completion_tokens_details.reasoning_tokens` not deserialized") catches the missing *count* of reasoning tokens at the usage layer. #211 ("`max_tokens` sent to o-series instead of `max_completion_tokens`") catches the missing *parameter routing* for reasoning models on the request side. #214 catches the missing *content stream* for reasoning models on the response side. The trio (#207 + #211 + #214) covers request → response → metering for the full reasoning-model lifecycle on the openai-compat path.
All three are independently broken; fixing only one or two leaves a half-functional reasoning lane. The deeper fix is a cluster-wide `ReasoningContract { request_param_name, response_content_field, usage_counter_field, multi_turn_passthrough_required }` table similar to the `MODEL_PARAM_REQUIREMENTS` shape recorded in #211/#212/#213. + +8. **External validation — every adjacent agent ships a reader for this field.** + - **anomalyco/opencode #24124** (active issue, verified 2026-04-25 via web search) tracks the multi-turn `reasoning_content` 400 error on `deepseek-reasoner`, confirming the field is live, industry-wide wire traffic. The fix shipped in opencode parses `delta.reasoning_content` into a `reasoning` content part with `providerOptions.reasoning_content` round-trip support across turns. + - **charmbracelet/crush** parses `delta.reasoning_content` and routes it into the TUI's reasoning panel via `usage.reasoning_text` — surfaced behind a `--show-reasoning` flag with a config-file equivalent. + - **simonw/llm** exposes `--show-cot` (chain-of-thought) for any provider that returns `reasoning_content` or `reasoning`. + - **Vercel AI SDK** `LanguageModelV1Usage` extends with `reasoningTokens` and the message stream emits `reasoning` parts (https://ai-sdk.dev/docs/ai-sdk-core/generating-text#reasoning). + - **LangChain** `BaseChatOpenAI` returns `additional_kwargs.reasoning_content` for DeepSeek/Qwen reasoning models since v0.3.x. + - **vLLM** built-in `--reasoning-parser deepseek_r1` (https://docs.vllm.ai/en/latest/features/reasoning_outputs.html) standardizes the wire format so any downstream agent has one shape to read. + - **LiteLLM** wraps reasoning_content into a unified `provider_specific_fields.reasoning` slot and surfaces `--show-reasoning` per-call. + - **continue.dev** had the same gap filed at #9245 and shipped the fix. + - **siliconflow.cn**, **agentscope-ai/QwenPaw#3782**, **dataleadsfuture.com integration guide**: same wire shape, same parser, all shipped. + - claw-code is the only mainstream agent CLI in the cluster without any reader for `delta.reasoning_content`. The control surface is universal across the ecosystem and structurally absent here. + +**Repro (verified 2026-04-25 21:55 KST):** + +```bash +# 1. Confirm zero hits for reasoning_content across the entire repo +cd ~/clawd/claw-code +grep -rn "reasoning_content" rust/ src/ tests/ docs/ 2>/dev/null +# Output: (empty — verified) + +# 2. Confirm ChunkDelta lacks reasoning_content +grep -A 6 "^struct ChunkDelta" rust/crates/api/src/providers/openai_compat.rs +# struct ChunkDelta { +#     #[serde(default)] +#     content: Option<String>, +#     #[serde(default, deserialize_with = "deserialize_null_as_empty_vec")] +#     tool_calls: Vec<ToolCallDelta>, +# } + +# 3. Confirm ChatMessage (non-streaming) lacks reasoning_content +grep -A 6 "^struct ChatMessage" rust/crates/api/src/providers/openai_compat.rs +# struct ChatMessage { +#     role: String, +#     #[serde(default)] +#     content: Option<String>, +#     #[serde(default)] +#     tool_calls: Vec<ToolCall>, +# } + +# 4. Confirm Anthropic native path correctly emits ThinkingDelta +grep -n "ContentBlockDelta::ThinkingDelta\|OutputContentBlock::Thinking" rust/crates/api/src/sse.rs +# rust/crates/api/src/sse.rs:260: content_block: OutputContentBlock::Thinking { +# rust/crates/api/src/sse.rs:288: delta: ContentBlockDelta::ThinkingDelta { + +# 5. 
Confirm the destination types are declared and ready +grep -n "ThinkingDelta\|Thinking {" rust/crates/api/src/types.rs +# rust/crates/api/src/types.rs:156: Thinking { +# rust/crates/api/src/types.rs:158: thinking: String, +# rust/crates/api/src/types.rs:245: ThinkingDelta { thinking: String }, + +# 6. Confirm is_reasoning_model already classifies the relevant families +grep -A 14 "pub fn is_reasoning_model" rust/crates/api/src/providers/openai_compat.rs +# pub fn is_reasoning_model(model: &str) -> bool { +# ... +# canonical.starts_with("o1") || canonical.starts_with("o3") || canonical.starts_with("o4") +# || canonical == "grok-3-mini" +# || canonical.starts_with("qwen-qwq") || canonical.starts_with("qwq") +# || canonical.contains("thinking") +# } + +# 7. Confirm zero test coverage for reasoning_content on the openai-compat path +grep -n "fn .*reasoning\|fn .*thinking" rust/crates/api/src/providers/openai_compat.rs +# (empty — only the comment at line 790 mentions "Qwen3-Thinking family") +``` + +```rust +// 8. Demonstrative tests that should exist and currently do not + +#[test] +fn deepseek_reasoner_streaming_chunk_emits_thinking_delta() { + // DeepSeek reasoning-model wire format — sibling field at delta root + let chunk_json = r#"{ + "id": "chatcmpl-deepseek-1", + "model": "deepseek-reasoner", + "choices": [{ + "index": 0, + "delta": { + "role": "assistant", + "content": null, + "reasoning_content": "Let me think about this step by step. First, I need to..." + } + }] + }"#; + let chunk: ChatCompletionChunk = serde_json::from_str(chunk_json).unwrap(); + let mut state = StreamState::new("deepseek-reasoner".to_string()); + let events = state.ingest_chunk(chunk).unwrap(); + // currently: events contains zero ThinkingDelta variants — bug + // expected: events contains ContentBlockStart(Thinking) followed by ThinkingDelta { thinking: "Let me think..." } + let has_thinking = events.iter().any(|e| matches!(e, + StreamEvent::ContentBlockDelta(ContentBlockDeltaEvent { + delta: ContentBlockDelta::ThinkingDelta { .. }, .. + }) + )); + assert!(has_thinking, "reasoning_content should map to ThinkingDelta"); +} + +#[test] +fn qwen3_thinking_non_streaming_response_carries_reasoning_block() { + // Alibaba DashScope wire format for Qwen3-Thinking — non-streaming + let response_json = r#"{ + "id": "qwen-1", + "model": "qwen3-30b-a3b-thinking", + "choices": [{ + "message": { + "role": "assistant", + "content": "The answer is 42.", + "reasoning_content": "I considered options A, B, C and concluded 42." + }, + "finish_reason": "stop" + }], + "usage": { "prompt_tokens": 10, "completion_tokens": 50 } + }"#; + let response: ChatCompletionResponse = serde_json::from_str(response_json).unwrap(); + let normalized = normalize_response(response, "qwen3-30b-a3b-thinking").unwrap(); + // currently: normalized.content has only the Text block — bug + // expected: normalized.content has [Thinking { thinking: "I considered..." }, Text { text: "The answer is 42." }] + let has_thinking_block = normalized.content.iter().any(|b| matches!(b, OutputContentBlock::Thinking { .. 
}));  + assert!(has_thinking_block, "reasoning_content should map to Thinking content block"); +} + +#[test] +fn non_reasoning_model_with_no_reasoning_content_is_unaffected() { + // Backward compat — gpt-4o has no reasoning_content; behavior must be unchanged + let chunk_json = r#"{ + "id": "chatcmpl-1", + "model": "gpt-4o", + "choices": [{ "index": 0, "delta": { "content": "hello" } }] + }"#; + let chunk: ChatCompletionChunk = serde_json::from_str(chunk_json).unwrap(); + let mut state = StreamState::new("gpt-4o".to_string()); + let events = state.ingest_chunk(chunk).unwrap(); + // expected: ContentBlockStart(Text) + ContentBlockDelta(TextDelta) — same as today + let has_thinking = events.iter().any(|e| matches!(e, + StreamEvent::ContentBlockDelta(ContentBlockDeltaEvent { + delta: ContentBlockDelta::ThinkingDelta { .. }, .. + }) + )); + assert!(!has_thinking, "non-reasoning models should not synthesize ThinkingDelta"); +} +``` + +**Fix shape (not implemented in this cycle, recorded for cluster refactor):** + +The minimal fix is a four-touch change: (a) add `#[serde(default)] reasoning_content: Option<String>` to `ChunkDelta` and `ChatMessage`, plus `#[serde(alias = "reasoning")]` for the proxy variant; (b) in `StreamState::ingest_chunk`, when `delta.reasoning_content` is `Some(text)` and non-empty, emit `ContentBlockStart(OutputContentBlock::Thinking { thinking: "" })` on first sight (separate block index from text) followed by `ContentBlockDelta(ContentBlockDelta::ThinkingDelta { thinking: text })`; (c) in `normalize_response`, when `message.reasoning_content` is `Some(text)`, prepend an `OutputContentBlock::Thinking { thinking: text, signature: None }` to the content vec; (d) add three regression tests covering streaming/non-streaming/backward-compat. Estimate: ~50 LOC production + ~80 LOC test. Plus a `StreamEvent::ReasoningContentReceived { count: u32 }` event for cluster-wide event-emission parity (sibling fix to #201/#202/#203/#208/#211/#212/#213). A standalone sketch of touches (a) and (b) closes this entry, below the Status line. + +The deeper fix is a unified-registry refactor (`MODEL_PARAM_REQUIREMENTS` table — sibling fix shape recorded in #211/#212/#213) that adds a sixth column `response_reasoning_field: ReasoningContent | Reasoning | None` describing how each provider exposes CoT, plus a cluster-wide `ChunkDelta::extract_reasoning(provider)` helper that owns the per-provider translation and emits the structured event. This closes #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214 in one structural change. + +**Status:** Open. No code changed. Filed 2026-04-25 22:00 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 347102d. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214 — thirteen pinpoints, one unified-registry refactor (`MODEL_PARAM_REQUIREMENTS` with six columns: `tuning_params_strip`, `max_output_tokens`, `max_tokens_param_name`, `default_parallel_tool_calls`, `cache_token_wire_shape`, `response_reasoning_field`) closes them all. Reasoning-fidelity cluster (the openai-compat reasoning-model lifecycle): #207 (reasoning_tokens counter) + #211 (max_completion_tokens param) + #214 (reasoning_content stream) — three pinpoints, one reasoning lane. 
Wire-format-parity cluster: #211 (max_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) — four pinpoints, all upstream-contract divergence at the provider boundary. External validation: DeepSeek API docs (https://api-docs.deepseek.com/guides/reasoning_model + https://api-docs.deepseek.com/guides/thinking_mode), vLLM reasoning-outputs docs (https://docs.vllm.ai/en/latest/features/reasoning_outputs.html), anomalyco/opencode#24124 (active issue, identical pattern), charmbracelet/crush usage telemetry, simonw/llm `--show-cot`, Vercel AI SDK `LanguageModelV1Usage.reasoningTokens` + message stream `reasoning` parts, LangChain `BaseChatOpenAI` `additional_kwargs.reasoning_content`, LiteLLM `provider_specific_fields.reasoning`, continue.dev#9245, siliconflow.cn reasoning capabilities, agentscope-ai/QwenPaw#3782, dataleadsfuture.com R1 integration guide — same control surface available across the entire ecosystem, absent only in claw-code.
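+
+For the cluster record, a minimal sketch of touches (a) and (b) from the fix shape above, written against standalone stand-in types: `SketchEvent` and `SketchStreamState` are hypothetical simplifications of the `StreamEvent`/`StreamState` machinery, not the real signatures.
+
+```rust
+use serde::Deserialize;
+
+// Stand-ins for ContentBlockStart(Thinking) / ThinkingDelta / TextDelta.
+#[derive(Debug)]
+enum SketchEvent {
+    ThinkingStart,
+    ThinkingDelta { thinking: String },
+    TextDelta { text: String },
+}
+
+// Touch (a): reasoning_content as a sibling of content, proxy alias included.
+#[derive(Debug, Default, Deserialize)]
+struct SketchChunkDelta {
+    #[serde(default)]
+    content: Option<String>,
+    #[serde(default, alias = "reasoning")]
+    reasoning_content: Option<String>,
+}
+
+// Touch (b): first sight of reasoning text opens a Thinking block on its
+// own index; every subsequent chunk appends a ThinkingDelta.
+#[derive(Default)]
+struct SketchStreamState {
+    thinking_open: bool,
+}
+
+impl SketchStreamState {
+    fn ingest(&mut self, delta: SketchChunkDelta) -> Vec<SketchEvent> {
+        let mut events = Vec::new();
+        if let Some(t) = delta.reasoning_content.filter(|t| !t.is_empty()) {
+            if !self.thinking_open {
+                self.thinking_open = true;
+                events.push(SketchEvent::ThinkingStart);
+            }
+            events.push(SketchEvent::ThinkingDelta { thinking: t });
+        }
+        if let Some(text) = delta.content.filter(|t| !t.is_empty()) {
+            events.push(SketchEvent::TextDelta { text });
+        }
+        events
+    }
+}
+
+fn main() {
+    let mut state = SketchStreamState::default();
+    for wire in [
+        r#"{ "reasoning_content": "step 1..." }"#,
+        r#"{ "reasoning_content": "step 2..." }"#,
+        r#"{ "content": "final answer" }"#,
+    ] {
+        let delta: SketchChunkDelta = serde_json::from_str(wire).unwrap();
+        println!("{:?}", state.ingest(delta));
+    }
+}
+```
+
+🪨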