From ceb092abd736dc7c5602ffedbebff1337d395c8e Mon Sep 17 00:00:00 2001
From: YeonGyu-Kim
Date: Sat, 25 Apr 2026 23:12:25 +0900
Subject: [PATCH] roadmap: #216 filed — neither MessageRequest nor
 MessageResponse has any service_tier field; build_chat_completion_request
 (openai_compat.rs:845) writes thirteen optional fields (model,
 max_tokens/max_completion_tokens, messages, stream, stream_options, tools,
 tool_choice, temperature, top_p, frequency_penalty, presence_penalty, stop,
 reasoning_effort) and does not write service_tier;
 AnthropicClient::send_raw_request (anthropic.rs:466) renders the same
 MessageRequest struct via AnthropicRequestProfile::render_json_body
 (telemetry/lib.rs:107) which has no field for it either, only a per-client
 extra_body escape hatch (asymmetric — openai_compat path has zero hits for
 extra_body); ChatCompletionResponse / ChatCompletionChunk / OpenAiUsage all
 deserialize four fields each, dropping the upstream-echoed service_tier
 confirmation and the system_fingerprint reproducibility marker that OpenAI
 documents as the canonical "what backend served you" signal; claw cannot
 opt into OpenAI flex (~50% cheaper async batch —
 developers.openai.com/api/docs/guides/flex-processing), cannot opt into
 OpenAI priority (~1.5-2x premium SLA latency —
 developers.openai.com/api/docs/guides/priority-processing), cannot opt into
 Anthropic priority (auto/standard_only —
 platform.claude.com/docs/en/api/service-tiers), and cannot detect at the
 response layer whether a request was flex-served or silently upgraded to
 priority by a project-level default override (Jobdori cycle #368 / extends
 #168c emission-routing audit / sibling-shape cluster grows to fifteen:
 #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216 /
 wire-format-parity cluster grows to six: #211+#212+#213+#214+#215+#216 /
 cost-parity cluster grows to six: #204+#207+#209+#210+#213+#216 /
 three-dimensional-structural-absence shape: request-side write +
 response-side read + reproducibility marker, distinct from prior
 request-only #211#212 / response-only #207#213#214 / header-only #215
 members / external validation: OpenAI flex/priority/scale-tier guides,
 OpenAI advanced-usage system_fingerprint guide, Anthropic service-tiers
 reference, OpenTelemetry GenAI semconv gen_ai.openai.request.service_tier +
 gen_ai.openai.response.service_tier +
 gen_ai.openai.response.system_fingerprint,
 anomalyco/opencode#12297, Vercel AI SDK serviceTier provider option,
 LangChain ChatOpenAI service_tier ctor param, LiteLLM service_tier
 pass-through, semantic-kernel OpenAIPromptExecutionSettings.ServiceTier,
 openai-python SDK client.chat.completions.create(service_tier=...)
 first-class kwarg, MiniMax/DeepSeek Anthropic-compat layer notes,
 badlogic/pi-mono#1381)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ROADMAP.md | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/ROADMAP.md b/ROADMAP.md
index fe8fb2a..4e80a72 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -14413,3 +14413,153 @@ The deeper fix is a unified `RateLimitContext` struct on `ApiError::Api` that co
 **Status:** Open. No code changed. Filed 2026-04-25 22:40 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 959bdf8. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215 — fourteen pinpoints. Upstream-contract-honoring trio (the wire-protocol-as-input dimension that #214 left half-covered): #211 (max_completion_tokens — claw chooses the wrong request param) + #213 (cached_tokens — claw drops upstream-counted cache hits) + #215 (Retry-After — claw ignores upstream-specified wait) — three pinpoints, all "upstream told us; claw was not listening." Wire-format-parity cluster: #211 + #212 + #213 + #214 + #215. External validation: Anthropic rate-limits docs (https://platform.claude.com/docs/en/api/rate-limits — `Retry-After` is the documented mechanism), OpenAI cookbook on rate limits (https://developers.openai.com/cookbook/examples/how_to_handle_rate_limits — `Retry-After` is the documented mechanism), DeepSeek rate-limit docs (https://api-docs.deepseek.com/quick_start/rate_limit), RFC 7231 §7.1.3 (the canonical wire spec for the header — both delta-seconds and HTTP-date), openai-python SDK (parses `retry-after-ms` + `Retry-After` natively — https://github.com/openai/openai-python/issues/957), Vercel AI SDK `LanguageModelV1RateLimit.retryAfter` (v3.4+), LangChain `BaseChatOpenAI` retry-after handling, anomalyco/opencode#16993/#16994/#9091/#17583 (active issue cluster — same gap reported and partially fixed in the upstream peer; opencode#11705 specifically calls out hanging on 429), charmbracelet/crush retry-after honoring, simonw/llm rate-limit handling, LiteLLM `Router.retry_after_strategy`, semantic-kernel Azure-OpenAI retry policy, alexwlchan/handling-http-429-with-tenacity (canonical Python pattern) — same control surface available across the entire ecosystem, absent only in claw-code at three structural layers (wire-read, error-shape, scheduler-input) simultaneously.
 
 🪨
+
+## Pinpoint #216 — Neither `MessageRequest` nor `MessageResponse` has any `service_tier` field; `build_chat_completion_request` (openai_compat.rs:845) writes thirteen fields to the wire payload (model, max_tokens/max_completion_tokens, messages, stream, stream_options, tools, tool_choice, temperature, top_p, frequency_penalty, presence_penalty, stop, reasoning_effort) and does not write `service_tier`; the Anthropic side serializes the same `MessageRequest` struct via `AnthropicRequestProfile::render_json_body` (telemetry/lib.rs:107), which has no field for it either; `ChatCompletionResponse` (openai_compat.rs:672) and `MessageResponse` (api/types.rs:121) deserialize `id/model/choices/usage` and `id/kind/role/content/model/stop_reason/stop_sequence/usage/request_id` respectively, dropping the upstream-echoed `service_tier` confirmation and the `system_fingerprint` reproducibility marker that OpenAI documents as the canonical "what backend actually served you" signal — claw cannot opt into OpenAI flex tier (~50% cheaper, documented at developers.openai.com/api/docs/guides/flex-processing), cannot opt into OpenAI priority tier (~1.5-2x premium for SLA latency, developers.openai.com/api/docs/guides/priority-processing), cannot opt into Anthropic priority tier (`service_tier: "auto" | "standard_only"`, platform.claude.com/docs/en/api/service-tiers), and cannot detect at the response layer whether a request was downgraded to Standard, served under flex, or hit a different backend snapshot than a previous run with the same `seed` (Jobdori, cycle #368 / extends #168c emission-routing audit / sibling-shape cluster grows to fifteen / wire-format-parity cluster grows to six)
+
+**Observed:** Three structural absences in one shape — request-side write, response-side read, response-side reproducibility marker — across both providers symmetrically.
+
+**(1) `MessageRequest` request-side absence.** In `rust/crates/api/src/types.rs:6-34` the entire wire-input surface is:
+
+```rust
+pub struct MessageRequest {
+    pub model: String,
+    pub max_tokens: u32,
+    pub messages: Vec<InputMessage>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub system: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub tools: Option<Vec<Tool>>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub tool_choice: Option<ToolChoice>,
+    #[serde(default, skip_serializing_if = "std::ops::Not::not")]
+    pub stream: bool,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub temperature: Option<f32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub top_p: Option<f32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub frequency_penalty: Option<f32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub presence_penalty: Option<f32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub stop: Option<Vec<String>>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub reasoning_effort: Option<String>,
+}
+```
+
+There is no `service_tier: Option<String>` (or a strongly-typed `ServiceTier` enum). `cd rust && grep -rn "service_tier\|ServiceTier" --include="*.rs"` returns zero hits across the entire workspace.
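+
+A minimal sketch of the missing surface, assuming the serde conventions `types.rs` already uses; the enum name and variant set anticipate fix shape (a) below and are hypothetical until the cluster refactor lands:
+
+```rust
+use serde::{Deserialize, Serialize};
+
+/// Union of OpenAI's documented tiers and Anthropic's `auto` / `standard_only`.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ServiceTier {
+    Auto,
+    Default,
+    Flex,
+    Priority,
+    Scale,
+    StandardOnly, // serializes as "standard_only", matching Anthropic's wire value
+}
+
+// The one field MessageRequest is missing, in the struct's own idiom:
+// #[serde(skip_serializing_if = "Option::is_none")]
+// pub service_tier: Option<ServiceTier>,
+```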
+
+**(2) `build_chat_completion_request` does not write the field.** In `rust/crates/api/src/providers/openai_compat.rs:845-928` the wire payload is constructed, then incrementally extended for each optional field the request struct exposes:
+
+```rust
+let mut payload = json!({
+    "model": wire_model,
+    max_tokens_key: request.max_tokens,
+    "messages": messages,
+    "stream": request.stream,
+});
+
+if request.stream && should_request_stream_usage(config) {
+    payload["stream_options"] = json!({ "include_usage": true });
+}
+
+if let Some(tools) = &request.tools { /* write tools */ }
+if let Some(tool_choice) = &request.tool_choice { /* write tool_choice */ }
+
+if !is_reasoning_model(&request.model) {
+    if let Some(temperature) = request.temperature { payload["temperature"] = json!(temperature); }
+    if let Some(top_p) = request.top_p { payload["top_p"] = json!(top_p); }
+    if let Some(frequency_penalty) = request.frequency_penalty { payload["frequency_penalty"] = json!(frequency_penalty); }
+    if let Some(presence_penalty) = request.presence_penalty { payload["presence_penalty"] = json!(presence_penalty); }
+}
+if let Some(stop) = &request.stop { /* write stop */ }
+if let Some(effort) = &request.reasoning_effort { payload["reasoning_effort"] = json!(effort); }
+
+payload
+```
+
+Thirteen distinct write sites for thirteen distinct OpenAI tuning parameters; no write site for `service_tier`. The function is `pub fn` — exercised directly by `bench/openai_compat_chat_request.rs` and indirectly by every send path in `OpenAiCompatClient`. If a caller wanted flex-tier processing for an asynchronous batch agent (the textbook flex-tier use case — non-interactive code review, plan generation, summary compaction — exactly what claw runs), there is no field on `MessageRequest` to set, no place in the builder to write it, and no env-var or config knob anywhere downstream. Even hand-mutating the returned `Value` outside the function does not help: every send path in `OpenAiCompatClient::send_with_retry` calls `build_chat_completion_request` itself and uses the returned payload directly, with no caller-injection hook between build and send. (The missing block itself is sketched after (3) below.)
+
+**(3) Anthropic side: same absence, same root cause.** `AnthropicClient::send_raw_request` (rust/crates/api/src/providers/anthropic.rs:466-475) renders the body via `self.request_profile.render_json_body(request)`, which `serde_json::to_value`s the same `MessageRequest` struct from (1) and then merges in `extra_body` from `AnthropicRequestProfile::extra_body` (telemetry/lib.rs:60). The `extra_body` merge (telemetry/lib.rs:114-116) is per-client, not per-request — set once on the profile and applied to every outbound message. Anthropic's documented `service_tier: "auto" | "standard_only"` (platform.claude.com/docs/en/api/service-tiers) cannot be set per-call without constructing a fresh `AnthropicClient` for every request whose tier policy differs, which the runtime's session-pinned client architecture does not contemplate. There is also no OpenAI-side helper symmetric to the Anthropic client's `with_extra_body_param` (anthropic.rs:233-235 defines it; openai_compat.rs has zero hits for `extra_body` — `cd rust && grep -rn "extra_body" --include="*.rs"` returns four hits, all in the Anthropic provider and the telemetry crate). The escape hatch is asymmetric across the two providers, and even on the Anthropic side it is per-client, not per-request.
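+
+The missing block, sketched in the builder's own incremental style; `is_service_tier_supported` is an assumed eligibility gate, not a symbol that exists today, and on the Anthropic path from (3) the same struct field would flow through `render_json_body` with no extra code:
+
+```rust
+// Hypothetical fourteenth write site inside build_chat_completion_request.
+// Assumes the service_tier field and ServiceTier enum sketched under (1).
+if let Some(tier) = &request.service_tier {
+    if is_service_tier_supported(&wire_model) {
+        payload["service_tier"] = json!(tier);
+    }
+    // else: strip the field and emit an event for visibility, per fix shape
+    // (b) below, rather than sending a parameter the backend will reject.
+}
+```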
+
+**(4) Response-side: `service_tier` echo and `system_fingerprint` are both dropped at deserialize time.** OpenAI documents that the response body echoes the tier that actually served the request (so a request submitted with `service_tier: "auto"` comes back with `service_tier: "default" | "flex" | "scale" | "priority"`, indicating which one was used), and OpenAI emits a `system_fingerprint` string identifying the backend snapshot (developers.openai.com/api/docs/guides/advanced-usage — paired with `seed` for determinism debugging). In `rust/crates/api/src/providers/openai_compat.rs:672-679` the entire response shape claw deserializes is:
+
+```rust
+#[derive(Debug, Deserialize)]
+struct ChatCompletionResponse {
+    id: String,
+    model: String,
+    choices: Vec<ChatCompletionChoice>,
+    #[serde(default)]
+    usage: Option<OpenAiUsage>,
+}
+```
+
+Four fields. `service_tier` and `system_fingerprint` are absent from the struct, so serde discards them at parse time, before any caller can observe them. Same drop for streaming: `ChatCompletionChunk` (openai_compat.rs:719-728) reads `id/model/choices/usage` only. The unified `MessageResponse` (rust/crates/api/src/types.rs:121-136) that both providers populate has nine fields (`id/kind/role/content/model/stop_reason/stop_sequence/usage/request_id`), and none of them is a tier-policy or backend-identity field. The Anthropic native parser at `anthropic.rs:986+` has the same gap: the `service_tier` echo on the message-stop event (Anthropic emits it as part of the final usage block) is silently absent from the destination type.
+
+**(5) Cluster-shape kinship.** This is the same kinship signature as #211 (max_completion_tokens: claw chose the wrong wire param) + #212 (parallel_tool_calls / disable_parallel_tool_use: claw could not express either) + #213 (cached_tokens: claw discarded a wire-counted field at deserialize) + #214 (reasoning_content: claw discarded a wire-streamed field at deserialize) + #215 (Retry-After: claw discarded a wire header at the failure path). Each member is one of: claw cannot ask for X (request-side absence), claw cannot hear X (response-side absence at deserialize), claw cannot honor X (header/contract drop at the failure path). #216 is all three at once, on both providers, for the same canonical wire field — request-side absence for the input, response-side absence for the echo, response-side absence for the reproducibility marker. The 1.5-3x tier-pricing delta (flex → standard → priority) directly compounds the cost-parity cluster (#204 + #207 + #209 + #210 + #213): even after fixing every cache-hit and pricing-table gap in that cluster, claw still bills sessions at standard-tier pricing, with no opt-in to flex's ~50% discount and no opt-out from an accidental priority-tier upgrade that could double the bill of a misconfigured prod deployment.
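+
+The response-side half of the fix in struct form: a sketch of the struct from (4) extended per fix shape (d) below. `ChatCompletionChoice` is the assumed element type; `ServiceTier` is the enum sketched under (1):
+
+```rust
+#[derive(Debug, Deserialize)]
+struct ChatCompletionResponse {
+    id: String,
+    model: String,
+    choices: Vec<ChatCompletionChoice>,
+    #[serde(default)]
+    usage: Option<OpenAiUsage>,
+    // The two fields serde currently discards. Option + default keeps bodies
+    // from backends that omit them parseable.
+    #[serde(default)]
+    service_tier: Option<ServiceTier>,
+    #[serde(default)]
+    system_fingerprint: Option<String>,
+}
+```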
+
+**Reproduction sketch:**
+
+```rust
+// Test that the flex-tier flag round-trips into the wire payload — currently
+// impossible to write because MessageRequest has no field for it.
+#[test]
+fn openai_compat_request_can_request_flex_tier() {
+    let request = MessageRequest {
+        model: "gpt-5".to_string(),
+        max_tokens: 256,
+        messages: vec![InputMessage::user_text("hello")],
+        // currently: cannot set service_tier: "flex" — field does not exist
+        // expected: service_tier: Some(ServiceTier::Flex)
+        ..MessageRequest::default()
+    };
+    let payload = build_chat_completion_request(&request, openai_config());
+    // currently: payload.get("service_tier") is None — bug
+    // expected: payload["service_tier"] == "flex"
+    assert_eq!(payload.get("service_tier").and_then(|v| v.as_str()), Some("flex"));
+}
+
+// Test that the response service_tier echo round-trips — currently impossible to read.
+#[tokio::test]
+async fn openai_compat_response_surfaces_service_tier_echo() {
+    let body = json!({
+        "id": "chatcmpl-1",
+        "model": "gpt-5",
+        "service_tier": "default", // "auto" got served by default
+        "system_fingerprint": "fp_abc",
+        "choices": [{ "message": { "role": "assistant", "content": "ok" }, "finish_reason": "stop" }],
+        "usage": { "prompt_tokens": 10, "completion_tokens": 5 }
+    });
+    let _mock = mock_server::respond_with_json(StatusCode::OK, body);
+    let response = client.send_message(&request).await.unwrap();
+    // currently: response.service_tier() does not exist — bug
+    // expected: response.service_tier() == Some(ServiceTier::Default)
+    assert_eq!(response.service_tier(), Some(ServiceTier::Default));
+    assert_eq!(response.system_fingerprint(), Some("fp_abc"));
+}
+
+// Test that an unsolicited tier upgrade is observable — currently impossible.
+#[tokio::test]
+async fn unsolicited_priority_upgrade_emits_event() {
+    // Request submitted with service_tier omitted; OpenAI echoes "priority" anyway
+    // (project-level default override). Claw should emit StreamEvent::ServiceTierServed
+    // so a 2x cost surprise cannot land silently.
+    let body = json!({ /* ... service_tier: "priority" ... */ });
+    let events = collect_stream_events(client, request, body).await;
+    let tier_events: Vec<_> = events
+        .iter()
+        .filter(|e| matches!(e, StreamEvent::ServiceTierServed { .. }))
+        .collect();
+    assert_eq!(tier_events.len(), 1);
+    // currently: 0 events — bug
+}
+```
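+
+The third test asserts on an event variant that does not exist anywhere in the workspace; a sketch of what the tests presuppose, with shapes anticipating fix shape (f) below (all names hypothetical):
+
+```rust
+// Hypothetical additions to the existing StreamEvent enum, using the
+// ServiceTier enum sketched under (1). Neither variant exists today.
+pub enum StreamEvent {
+    // ... existing variants ...
+    /// Emitted once per request: the tier asked for vs. the tier the backend
+    /// reports having served, plus the backend-snapshot marker.
+    ServiceTierServed {
+        requested: Option<ServiceTier>,
+        served: Option<ServiceTier>,
+        system_fingerprint: Option<String>,
+    },
+    /// Emitted when an eligibility gate removes the field before send.
+    ServiceTierStripped {
+        model: String,
+        tier: ServiceTier,
+        reason: String,
+    },
+}
+```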
+
+**Fix shape (not implemented in this cycle, recorded for cluster refactor):**
+
+The minimal fix is a six-touch change: (a) add `pub service_tier: Option<ServiceTier>` to `MessageRequest` in `rust/crates/api/src/types.rs:6-34`, where `ServiceTier` is a typed `enum { Auto, Default, Flex, Priority, Scale, StandardOnly }` (the union of OpenAI's documented values and Anthropic's `auto`/`standard_only`) with `serde(rename_all = "snake_case")` on the enum and `skip_serializing_if = "Option::is_none"` on the field; (b) extend `build_chat_completion_request` (openai_compat.rs:845) with a single `if let Some(tier) = &request.service_tier { payload["service_tier"] = json!(tier); }` block, gated by `is_service_tier_supported(wire_model)` so that unsupported model families strip the field and emit `StreamEvent::ServiceTierStripped { model, tier, reason }` for visibility (otherwise this would add one more instance of the silent-strip pattern called out in #208 / #211 / #214); (c) populate the request body on the Anthropic path by adding the same per-request field — `render_json_body` already serializes whatever `MessageRequest` declares, so this is zero code in the telemetry crate, just a model-eligibility check at `AnthropicClient::send_raw_request` that strips the field before send for non-priority-eligible Anthropic models; (d) add `service_tier: Option<ServiceTier>` and `system_fingerprint: Option<String>` to `MessageResponse` in `types.rs:121` and to the wire deserialize structs `ChatCompletionResponse` (openai_compat.rs:672) and `ChatCompletionChunk` (openai_compat.rs:719) plus the Anthropic native parser at `anthropic.rs:986+`; (e) populate them through both `chat_completion_to_message_response` and the streaming aggregator so all four send paths surface the echo; (f) emit `StreamEvent::ServiceTierServed { requested: Option<ServiceTier>, served: Option<ServiceTier>, system_fingerprint: Option<String> }` after every request so claws can render an honest "you asked for X, you got Y" panel and detect silent project-level upgrades that would otherwise show up only as bill shock at month-end. Estimate: ~120 LOC production + ~180 LOC test (httpmock or wiremock-rs based, both providers, both streaming and non-streaming).
+
+The deeper fix is to lift `service_tier` out of the per-request "tuning parameter" cluster and into a first-class `RequestPolicy` struct on `MessageRequest`, alongside the `RetryPolicy` from #215 and a future `RateLimitPolicy` honoring the wire `Retry-After` and `x-ratelimit-*` headers. Then a cluster-wide `WirePolicyEvent` taxonomy (`ServiceTierServed`, `RetryAfterReceived`, `ParallelToolCallsCapped`, `ReasoningContentStreamed`, `MaxTokensClamped`) gives claws a single subscriber-side surface for every wire-protocol-as-input/output dimension that the silent-fallback / silent-drop / silent-strip cluster has now identified across fifteen pinpoints (sketch below). This closes #216 cleanly and turns the wire-format-parity cluster (#211 + #212 + #213 + #214 + #215 + #216) into one composable policy plane rather than six independent struct-field battles.
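+
+A sketch of that policy plane, assuming the five event names keep their cluster labels; every type and field here is hypothetical until the refactor lands:
+
+```rust
+use std::time::Duration;
+
+/// Per-request wire policy, lifted out of the tuning-parameter cluster.
+#[derive(Debug, Clone, Default)]
+pub struct RequestPolicy {
+    pub service_tier: Option<ServiceTier>,   // #216
+    pub retry: Option<RetryPolicy>,          // #215
+    pub rate_limit: Option<RateLimitPolicy>, // future: Retry-After / x-ratelimit-*
+}
+
+/// Placeholder shapes for the #215-derived policies.
+#[derive(Debug, Clone)]
+pub struct RetryPolicy { pub max_attempts: u32 }
+
+#[derive(Debug, Clone)]
+pub struct RateLimitPolicy { pub honor_retry_after: bool }
+
+/// One subscriber-side surface for every wire-policy signal the cluster named.
+#[derive(Debug, Clone)]
+pub enum WirePolicyEvent {
+    ServiceTierServed {
+        requested: Option<ServiceTier>,
+        served: Option<ServiceTier>,
+        system_fingerprint: Option<String>,
+    },
+    RetryAfterReceived { wait: Duration },
+    ParallelToolCallsCapped { requested: u32, served: u32 },
+    ReasoningContentStreamed { chars: usize },
+    MaxTokensClamped { requested: u32, clamped_to: u32 },
+}
+```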
+
+**Status:** Open. No code changed. Filed 2026-04-25 23:00 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 2da1211. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence at provider/CLI boundary): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216 — fifteen pinpoints. Wire-format-parity cluster: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) — six pinpoints, every member is "claw and the wire format disagree on a documented field." Cost-parity cluster grows by direct adjacency: #204 + #207 + #209 + #210 + #213 + #216 — six pinpoints, all "claw bills the wrong number." Three-dimensional structural absence (request-side write + response-side read + reproducibility marker) is itself a new shape inside the cluster, distinct from the prior "request-side only" (#211, #212), "response-side only" (#207, #213, #214), and "header-only" (#215) members. External validation: OpenAI flex processing guide (https://developers.openai.com/api/docs/guides/flex-processing — `service_tier: "flex"` opts into ~50% cheaper async batch processing with possible Resource Unavailable errors), OpenAI priority processing guide (https://developers.openai.com/api/docs/guides/priority-processing — `service_tier: "priority"` opts into a 1.5-2x premium with SLA-grade latency), OpenAI scale tier (https://openai.com/api-scale-tier/ — committed-capacity model snapshot with 99.9% uptime SLA), OpenAI advanced usage / system_fingerprint guide (https://developers.openai.com/api/docs/guides/advanced-usage — fingerprint paired with `seed` is the canonical determinism-debugging mechanism), Anthropic service tiers reference (https://platform.claude.com/docs/en/api/service-tiers — `auto` / `standard_only` documented, capacity-managed Priority Tier requires sales contact), OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/registry/attributes/openai/ — `gen_ai.openai.request.service_tier`, `gen_ai.openai.response.service_tier`, and `gen_ai.openai.response.system_fingerprint` are first-class observability attributes in the public spec, meaning every other agent/client in the OpenAI ecosystem propagates these for tracing), anomalyco/opencode#12297 (active feature request to add `serviceTier: "flex"` to the OpenAI-compatible chat provider options schema and propagate it as `service_tier` in the chat request body — exact same gap as claw, identical wire-format symptom, identical fix shape, but only in one provider), Vercel AI SDK `serviceTier` provider option (v3.x — supported per-request on the OpenAI provider), LangChain ChatOpenAI `service_tier` constructor parameter, LiteLLM `service_tier` request param pass-through, semantic-kernel `OpenAIPromptExecutionSettings.ServiceTier`, openai-python SDK `client.chat.completions.create(service_tier="flex", ...)` (first-class kwarg), MiniMax / DeepSeek Anthropic-compat layer notes (https://platform.minimax.io/docs/api-reference/text-anthropic-api — explicitly document that `service_tier` is a wire-recognized field on the Anthropic-shape contract that some compat layers ignore, signaling that the field is part of the public contract claws need to observe even when the upstream backend silently drops it), badlogic/pi-mono#1381 (peer-tracker for service-tier propagation in coding agents) — the same control surface is available across every other major LLM client / agent / observability spec in the ecosystem, and absent only in claw-code at all four structural layers (request-struct field, request-builder write site, response-struct field, response-deserialize read site) simultaneously, across two providers, breaking cost-control opt-in (flex), silent-upgrade detection (priority), and run reproducibility (system_fingerprint) in one shape.
+
+🪨