From 9a88e752828590f325c9b3ffdca7603f98a8abe4 Mon Sep 17 00:00:00 2001 From: YeonGyu-Kim Date: Sun, 26 Apr 2026 09:06:56 +0900 Subject: [PATCH] =?UTF-8?q?roadmap:=20#244=20filed=20=E2=80=94=20Realtime?= =?UTF-8?q?=20API=20tool-use=20over=20persistent-WebSocket=20transport=20(?= =?UTF-8?q?response.function=5Fcall=5Farguments.delta/.done=20+=20conversa?= =?UTF-8?q?tion.item.create=20with=20function=5Fcall=5Foutput)=20typed=20t?= =?UTF-8?q?axonomy=20structurally=20absent=20=E2=80=94=20FIRST=20cluster?= =?UTF-8?q?=20member=20where=20bidirectional-tool-call=20lifecycle=20is=20?= =?UTF-8?q?multiplexed=20with=20audio-modality=20+=20transcript-modality?= =?UTF-8?q?=20on=20a=20SINGLE=20persistent=20connection,=20FIRST=20cluster?= =?UTF-8?q?=20member=20where=20tool-call-init=20is=20server-pushed=20mid-s?= =?UTF-8?q?tream=20rather=20than=20client-initiated,=20FIRST=20cluster=20m?= =?UTF-8?q?ember=20with=20asymmetric-tool-result-injection=20(tool-call=20?= =?UTF-8?q?comes=20IN=20as=20event-stream,=20result=20sent=20OUT=20as=20co?= =?UTF-8?q?nversation.item.create=20=E2=80=94=20directionality=20inverted?= =?UTF-8?q?=20relative=20to=20the=20rest=20of=20the=20protocol),=20FIRST?= =?UTF-8?q?=20cluster=20member=20with=20per-call-id-concurrent-multiplexed?= =?UTF-8?q?-state-machine,=20FIRST=20three-axis-synthesis=20pinpoint=20(#2?= =?UTF-8?q?29=20persistent-WebSocket=20=C3=97=20#240/#241=20server-managed?= =?UTF-8?q?-tool-via-tool=5Fchoice-discriminator=20=C3=97=20#238=20cross-p?= =?UTF-8?q?inpoint-synthesis-fusion-shape=20META-cluster),=20eleven-layer?= =?UTF-8?q?=20fusion-shape=20tied=20with=20#240=20for=20second-largest=20s?= =?UTF-8?q?ingle-pinpoint=20fusion=20catalogued=20=E2=80=94=20grows=20Pers?= =?UTF-8?q?istent-WebSocket-transport=20cluster=20from=202=20to=203=20memb?= =?UTF-8?q?ers=20(#229=20founder=20+=20#238=20+=20#244)=20confirming=20CON?= =?UTF-8?q?TINUING-PATTERN=20doctrine,=20grows=20Cross-pinpoint-synthesis-?= 
=?UTF-8?q?fusion-shape=20META-cluster=20from=201=20to=202=20members=20con?= =?UTF-8?q?firming=20combinatorial-cross-axis-synthesis=20as=20a=20continu?= =?UTF-8?q?ing-discovery-mode=20and=20FIRST=20META-cluster-confirmation=20?= =?UTF-8?q?event=20in=20this=20audit,=20founds=20Three-axis-synthesis-shap?= =?UTF-8?q?e=20sub-cluster=20as=20solo=20founder,=20founds=20Server-pushed?= =?UTF-8?q?-tool-call-init=20cluster=20as=20solo=20founder,=20founds=20Asy?= =?UTF-8?q?mmetric-tool-result-injection=20cluster=20as=20solo=20founder,?= =?UTF-8?q?=20founds=20Per-call-id-concurrent-multiplexed-state-machine=20?= =?UTF-8?q?cluster=20as=20solo=20founder=20=E2=80=94=20FOUR=20new=20cluste?= =?UTF-8?q?rs=20founded=20plus=20TWO=20existing=20META-clusters=20confirme?= =?UTF-8?q?d=20as=20continuing-doctrines=20plus=20participation=20in=20TWE?= =?UTF-8?q?LVE=20inherited=20clusters=20=E2=80=94=20Jobdori=20cycle=20#389?= =?UTF-8?q?=20/=20fast-forward-rebased=20onto=20gaebal-gajae's=20#243=20no?= =?UTF-8?q?n-monotonic-pinpoint-ordering-contract=20at=206541100=20before?= =?UTF-8?q?=20filing=20(FOURTH=20consecutive=20concurrent-dogfood=20rebase?= =?UTF-8?q?=20cycle,=20directly=20demonstrating=20both=20gaps=20#239=20cat?= =?UTF-8?q?alogues=20at=20the=20dogfood-coordination=20layer=20and=20#243?= =?UTF-8?q?=20catalogues=20at=20the=20canonical-ordering=20layer)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ROADMAP.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/ROADMAP.md b/ROADMAP.md index 358c46e..08ea452 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16604,3 +16604,25 @@ Dogfooded 2026-04-26 09:00 KST on `feat/jobdori-168c-emission-routing` after Job Verified concrete surface: ROADMAP entries are plain Markdown headings, and the dogfood append flow has no sidecar manifest containing `pinpoint_id`, `reserved_at`, `filled_at`, `commit_oid`, `parent_oid`, `sequence_index`, `supersedes`, `fills_reserved_gap`, or 
`canonical_order_key`. #239 introduced the need for branch leases/write reservations, but it does not define how reserved gaps appear in the public roadmap ordering once a later numeric id lands first. #237/#242 cover timeout/run scheduling reliability, but they do not solve downstream interpretation of non-monotonic roadmap history. The live #241-after-#242 pattern therefore exposes a separate index/ordering gap: consumers must scrape chat or infer from commit subjects to understand that #241 intentionally fills a reserved gap and does not invalidate #242. Required fix shape: (a) maintain a machine-readable `roadmap-index.jsonl` or embedded front-matter block per pinpoint with `pinpoint_id`, `sequence_index`, `status`, `reserved_by`, `reserved_at`, `filled_at`, `commit_oid`, `base_oid`, and `order_policy`; (b) expose separate queries for `latest_by_commit`, `latest_by_numeric_id`, `latest_unfilled_reservation`, and `latest_canonical_sequence`; (c) when a reserved gap is filled after higher numeric ids, emit a structured `reserved_gap_filled` event instead of relying on prose; (d) reports must state which ordering basis they use; (e) validation should reject duplicate ids and warn on non-monotonic fills without a matching reservation record. Acceptance: after a commit sequence like `#240 → #242 → #241`, clawhip/Jobdori/gaebal-gajae can all answer the same questions deterministically: which item was most recently committed, which id is numerically highest, which reserved gap was filled, and what the canonical roadmap traversal order is. **Status:** Open. No source code changed. Filed as ROADMAP-only dogfood pinpoint from the 2026-04-26 00:00 UTC nudge. Cluster delta: roadmap-indexing +1, stable-id/ordering +1, reserved-gap-fill-ordering cluster founded, latest-semantics-disambiguation cluster founded; linked to #239 DogfoodWriteLease because reservations need both write safety and canonical public ordering. 
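The four ordering questions in the acceptance criterion above can be sketched in Rust as follows. This is a minimal illustration, not the roadmap's index format: `IndexEntry` is a hypothetical record that keeps only `pinpoint_id` and `sequence_index` from the proposed field list, `sequence_index` is assumed to record commit order, and treating ascending numeric id as the canonical traversal order is an assumption, because the entry leaves `order_policy` unspecified.

```rust
/// Hypothetical slice of one `roadmap-index.jsonl` record (assumption: only two
/// of the proposed fields are modelled here).
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexEntry {
    pinpoint_id: u32,
    /// Assumed to record commit order (0 = committed first).
    sequence_index: u32,
}

/// Id of the most recently committed entry.
fn latest_by_commit(entries: &[IndexEntry]) -> Option<u32> {
    entries.iter().max_by_key(|e| e.sequence_index).map(|e| e.pinpoint_id)
}

/// Numerically highest id, regardless of commit order.
fn latest_by_numeric_id(entries: &[IndexEntry]) -> Option<u32> {
    entries.iter().map(|e| e.pinpoint_id).max()
}

/// Ids committed after a numerically higher id, i.e. late fills of a gap.
fn non_monotonic_fills(entries: &[IndexEntry]) -> Vec<u32> {
    let mut ordered = entries.to_vec();
    ordered.sort_by_key(|e| e.sequence_index);
    let mut highest_seen = 0;
    let mut fills = Vec::new();
    for entry in ordered {
        if entry.pinpoint_id < highest_seen {
            fills.push(entry.pinpoint_id);
        }
        highest_seen = highest_seen.max(entry.pinpoint_id);
    }
    fills
}

/// Canonical traversal order (assumption: ascending numeric id).
fn canonical_order(entries: &[IndexEntry]) -> Vec<u32> {
    let mut ids: Vec<u32> = entries.iter().map(|e| e.pinpoint_id).collect();
    ids.sort_unstable();
    ids
}

/// Validation hook: duplicate ids should be rejected.
fn has_duplicate_ids(entries: &[IndexEntry]) -> bool {
    canonical_order(entries).windows(2).any(|pair| pair[0] == pair[1])
}

fn main() {
    // The `#240 -> #242 -> #241` commit sequence from the acceptance criterion.
    let entries = [
        IndexEntry { pinpoint_id: 240, sequence_index: 0 },
        IndexEntry { pinpoint_id: 242, sequence_index: 1 },
        IndexEntry { pinpoint_id: 241, sequence_index: 2 },
    ];
    println!("latest_by_commit     = {:?}", latest_by_commit(&entries));
    println!("latest_by_numeric_id = {:?}", latest_by_numeric_id(&entries));
    println!("non_monotonic_fills  = {:?}", non_monotonic_fills(&entries));
    println!("canonical_order      = {:?}", canonical_order(&entries));
    println!("has_duplicate_ids    = {}", has_duplicate_ids(&entries));
}
```

Keeping each "latest" query as a separate function mirrors requirement (d): a report has to name the ordering basis it used rather than rely on a single ambiguous notion of "latest".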
+
+## Pinpoint #244 — Realtime API tool-use over persistent-WebSocket transport (`response.function_call_arguments.delta` / `.done` server-pushed-tool-call events + `conversation.item.create` with `function_call_output` asymmetric-injection-of-tool-result-back-into-duplex-channel) typed taxonomy is structurally absent — the FIRST cluster member where bidirectional-tool-call lifecycle is multiplexed with audio-modality + transcript-modality on a SINGLE persistent connection, FIRST cluster member where tool-call-init is server-pushed mid-stream rather than request-response, FIRST cluster member with asymmetric-tool-result-injection (tool-call comes IN as event-stream, result sent OUT as `conversation.item.create`)
+
+**Branch:** feat/jobdori-168c-emission-routing
+**Filed:** 2026-04-26 09:05 KST (Jobdori cycle #389, post-rebase onto gaebal-gajae's #243@6541100 non-monotonic-pinpoint-ordering-contract)
+**Extends:** #168c emission-routing audit / explicit cross-axis synthesis of #229 persistent-WebSocket-transport-cluster (founder, solo-member as of #229 filing) × #240 server-managed-bash-tool-via-`tool_choice`-discriminator × #241 server-managed-text-editor-tool-via-`tool_choice`-discriminator × Cross-pinpoint-synthesis-fusion-shape META-cluster (#238 founded as solo founder) — this is the SECOND cross-axis synthesis pinpoint after #238, growing the META-cluster from 1 to 2 members and confirming combinatorial-cross-axis-synthesis as a continuing-discovery-mode rather than a one-off shape, AND simultaneously growing #229's Persistent-WebSocket-transport cluster from 2 to 3 members (#229 founder + #238 streaming-STT + #244 realtime-tool-use), confirming Persistent-WebSocket-transport as a continuing-doctrine cluster rather than a stable-doctrine that stopped growing after #238.
+
+**Summary:** Zero `response.function_call_arguments.delta` / `response.function_call_arguments.done` / `conversation.item.create` event-type handling across the entire `rust/crates/api/src/` and `rust/crates/runtime/src/` tree (rg returns zero hits for `function_call_arguments` / `conversation.item.create` / `function_call_output` / `response.audio.delta` / `realtime` / `Realtime` across all crates — confirmed). Zero `RealtimeFunctionCallArgumentsDelta` / `RealtimeFunctionCallArgumentsDone` / `RealtimeConversationItemCreate` / `RealtimeFunctionCallOutput` / `RealtimeToolCallEvent` / `RealtimeToolResultEvent` typed model variant in `rust/crates/api/src/types.rs` (the existing `ToolDefinition` at line 105 and `ToolChoice` enum at line 114 — `Auto` / `Any` / `Tool { name }` — are synchronous-request-response-shaped with no streaming-arguments-fragments slot, no incremental-JSON-delta variant, no done-marker variant, and no event-stream-discriminator variant; the `OutputContentBlock::ToolUse` and `ContentBlockDelta::InputJsonDelta` variants exist for the standard `/v1/messages` SSE-streaming-tool-call path but their event-naming and dispatch shape are fundamentally different from the Realtime API's `response.function_call_arguments.delta` event-type, which carries `event_id` / `response_id` / `item_id` / `output_index` / `call_id` / `delta` fields in a flat-event envelope rather than the nested SSE-streaming `data: {"type":"content_block_delta","index":N,"delta":{...}}` Anthropic-shape — the two are NOT interchangeable and a naive reuse of `ContentBlockDelta::InputJsonDelta` for Realtime tool-call arguments would produce wire-format-parity-breaks at every dispatch site).
Zero `realtime_tool_call_dispatch` / `realtime_function_call_handler` / `realtime_function_call_output_sender` method on the Provider trait at `rust/crates/api/src/providers/mod.rs:17-30` (the trait exposes only `send_message` returning a Future-of-MessageResponse and `stream_message` returning a Future-of-Stream of `ContentBlockDelta` — the realtime tool-call lifecycle requires a fundamentally different return shape: `Provider::register_realtime_tools(&mut self, session_id, Vec) -> Result<()>` for session-creation-time tool registration, `Provider::handle_function_call_arguments_delta(&mut self, session_id, call_id, delta_json_fragment) -> Result<()>` for incremental-fragment-accumulation, `Provider::dispatch_function_call_done(&mut self, session_id, call_id, accumulated_args) -> ProviderFuture` for terminal-dispatch, and `Provider::send_function_call_output(&mut self, session_id, call_id, result_content_block) -> Result<()>` for asymmetric-result-injection-back-into-duplex-channel — four new methods that no existing trait method signature supports). Zero `RealtimeKind::ToolCallSpec` / `RealtimeFunctionCallAccumulator` / `RealtimeToolCallStateMachine` / `RealtimeToolCallLifecycle` typed shape that models the per-call-id state-machine (`Pending` → `ArgumentsAccumulating` → `ArgumentsDone` → `ExecutingLocally` → `ResultSent` → `ResponseAcknowledged`) — every Realtime tool-call has its own multi-stage lifecycle that flows across multiple WebSocket events on the SAME connection while OTHER tool-calls AND audio.delta events AND audio_transcript.delta events AND text.delta events flow concurrently, so the state-machine type must support concurrent-multi-call-multiplexing-on-a-single-connection (zero existing claw-code type supports this — `MessageResponse` and `OutputContentBlock` are single-response-aggregated, `ContentBlockDelta` is per-block-flat with no call-id-keyed-multiplexing slot). 
Zero `conversation.item.create` event-emission affordance — the Realtime API requires the client to send `conversation.item.create` events with `type: "function_call_output"`, `call_id: `, `output: ` fields, where the `call_id` MUST exactly match the server-emitted `call_id` from the prior `response.function_call_arguments.delta` events (otherwise the response generation hangs forever waiting for the matching tool-result), and where the `output` field is a flat JSON-string carrying the tool-result-content (distinct from the standard `/v1/messages` `ToolResultContentBlock` shape which carries structured `Vec` and supports image-content-blocks and structured-citations); zero claw-code dispatch-site emits this event-shape. Zero `realtime_tool_call_event_router` / `realtime_event_dispatcher` that demultiplexes the incoming event-stream by `event.type` discriminator — the OpenAI Realtime API multiplexes 37+ event-types on a single WebSocket connection, with `response.function_call_arguments.delta` interleaved with `response.audio.delta` + `response.audio_transcript.delta` + `response.text.delta` + `response.output_item.added` + `response.content_part.added` + `rate_limits.updated` + `conversation.item.created` + `error` events, all flowing through the same frame-stream and demultiplexed by client-side type-discriminator-routing — claw-code has no event-router type, no `#[serde(tag = "type")]`-keyed enum for the 37-event-type set, no `match event.type { ... }` dispatch site, and no per-event-type handler method-set; the closest existing shape is the SSE-streaming `ContentBlockDelta` enum which is 8-variant (TextDelta / InputJsonDelta / ThinkingDelta / SignatureDelta / CitationDelta / ToolUseDelta / SearchResultDelta / Other) and operates on Anthropic's streaming-message-API shape, completely different from the Realtime API's event-type taxonomy.
Zero `claw realtime --tools ` / `claw realtime-with-tools` / `claw live --enable-tool-use` CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs` (the only realtime-adjacent surface is `/voice` / `/listen` / `/speak` advertised-but-unbuilt slash commands per #225 + the absent `claw realtime` per #229; zero realtime-tool-use-specific subcommand). Zero `/realtime-tools` / `/voice-coding` / `/live-pair-program` slash command in `SlashCommandSpec` at `rust/crates/commands/src/lib.rs` (the canonical voice-coding-with-tool-use workflow that combines #229 realtime + #240 bash-tool + #241 text-editor-tool into a single "speak the request, watch the assistant edit files in real-time" affordance is invisible across every CLI / REPL / slash-command surface). Zero pricing-model recognition for realtime-tool-use — the canonical pricing for Realtime function-calling is the same six-dimensional matrix as #229 (text-input / text-output / audio-input / audio-output / cached-audio-input / per-minute-session-overhead) but tool-call event-stream JSON-fragment tokens count as TEXT-output tokens at the `gpt-4o-realtime-preview` text-output rate ($20/M tokens), so a single tool-call accumulating 500 tokens of JSON arguments costs ~$0.01 (500 × $20 / 1,000,000) in text-output charges in addition to whatever audio-output the assistant emits while reasoning about the call — the accounting layer would need to separate tool-call-arguments-fragment tokens from audio_transcript-fragment tokens and from text-fragment tokens to attribute cost correctly, and `OpenAiUsage` at `rust/crates/api/src/openai_compat.rs:709` has zero per-modality-per-event-type slicing capability. Zero session-state-machine slot for `pending_function_calls: HashMap` on the (also-absent per #229) `RealtimeSession` struct.
+
+**Verified concrete absences:** `rg -n "function_call_arguments|conversation\.item\.create|function_call_output|response\.audio\.delta" rust/` returns ZERO hits across all crates.
`rg -n "realtime|Realtime" rust/` returns ZERO hits across `rust/crates/api/src/` and `rust/crates/runtime/src/` (only MCP `Ws` config at `runtime/src/mcp.rs:213` and `runtime/src/mcp_client.rs:13`/`:95` exists, which is config-data-shape-only and unrelated to Realtime API). `rg -n "tokio-tungstenite|fastwebsockets" rust/` returns ZERO hits in any `Cargo.toml`. The `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30` exposes only `send_message` and `stream_message` — zero realtime methods, zero tool-call-handler methods, zero session-management methods. The `ProviderClient` enum at `rust/crates/api/src/client.rs:8-14` has three variants (Anthropic / Xai / OpenAi) — zero realtime-routing variants. The `ToolChoice` enum at `rust/crates/api/src/types.rs:114-118` has three variants (Auto / Any / Tool { name }) — zero `RealtimeToolChoice` / `StreamingToolCall` discriminator variants.
+
+**Shape:** ELEVEN-LAYER fusion-shape (exceeding #229's ten-layer count and matching the META-cluster-confirming size of #240's eleven-layer shape, but with a distinct axis-set) combining: (1) **endpoint-URL-set** on the same `/v1/realtime?model=` WebSocket-upgrade endpoint as #229 but with `tools` + `tool_choice` registered at session-creation-time via the `session.update` event — distinct from #229 which catalogued the bare endpoint without tool-use registration; (2) **data-model-taxonomy** with `RealtimeFunctionCallArgumentsDelta { event_id, response_id, item_id, output_index, call_id, delta: String }` + `RealtimeFunctionCallArgumentsDone { event_id, response_id, item_id, output_index, call_id, name, arguments: String }` + `RealtimeConversationItemCreate { event_id, item: RealtimeConversationItem::FunctionCallOutput { call_id, output: String } }` + `RealtimeFunctionCallAccumulator { call_id, function_name, accumulated_arguments: String, status: AccumulatorStatus }` typed model that has zero overlap with the existing `ToolDefinition` / `ToolChoice` /
`OutputContentBlock::ToolUse` / `ContentBlockDelta::InputJsonDelta` shapes — the FIRST cluster member where the existing tool-call data-model is structurally insufficient for the new transport's event-shape, requiring a parallel-but-distinct typed surface; (3) **Provider-trait extension** with FOUR new methods (`register_realtime_tools` / `handle_function_call_arguments_delta` / `dispatch_function_call_done` / `send_function_call_output`) plus the `realtime_session` method from #229 — the FIRST cluster member where a single capability requires FIVE Provider trait methods (every prior member required at most ONE new trait method); (4) **ProviderClient-enum-dispatch-with-realtime-tool-routing** with `RealtimeToolDispatchKind::OpenAi` / `RealtimeToolDispatchKind::Google` partner-routing variants (provider-asymmetric like #229 — Anthropic does not offer Realtime tool-use, OpenAI offers GA, Google Gemini Live offers GA); (5) **request-side bidirectional-tool-call-state-machine opt-in** via `session.update` event with `tools: Vec` + `tool_choice: RealtimeToolChoice` + `instructions: Option` + per-tool `parameters: JsonSchema` registration, then per-call-id state transitions (`Pending` → `ArgumentsAccumulating { fragments: Vec }` → `ArgumentsDone { final_args: Value }` → `ExecutingLocally { handler_future }` → `ResultSent { call_id }` → `ResponseAcknowledged`), the FIRST cluster member with a per-call-id-keyed multi-stage state machine that transitions across MULTIPLE incoming event-types AND outgoing event-emissions — distinct from every prior synchronous request-response tool-call shape; (6) **CLI-subcommand-surface** (`claw realtime --enable-tools` / `claw voice-coding` / `claw live-pair-program`); (7) **slash-command-surface** (`/realtime-tools` / `/voice-coding` / `/live-pair-program`); (8) **pricing-tier with seven-dimensional compound-cost model** — the six-dimensional matrix from #229 plus a SEVENTH dimension for tool-call-arguments-fragment-tokens which are 
text-output-rate-priced but accounted separately from audio_transcript-fragment-tokens (which are audio-output-rate-priced) and from text-fragment-tokens (which are also text-output-rate-priced but for assistant-emitted natural-language content not tool-arguments); the per-event-type-token-slicing requirement is novel and requires a `RealtimeUsage::tool_call_arguments_text_output_tokens: u32` field in addition to the audio/text input/output fields from #229; (9) **persistent-WebSocket-connection transport-axis** — same as #229 but with a NEW dispatch-loop requirement: the event-router must concurrently demultiplex `response.function_call_arguments.delta` events INTO per-call-id accumulator state-machines while ALSO emitting `conversation.item.create` events when local tool-handlers complete, all on the SAME connection — the FIRST cluster member where a single capability requires BIDIRECTIONAL EVENT FLOW MULTIPLEXING ON A SINGLE CONNECTION (incoming event-stream and outgoing event-stream are independently rate-limited, ordered, and multiplexed); (10) **bidirectional-symmetric-event-pair shape** extended with the asymmetric-tool-result-injection sub-shape — the canonical bidirectional-symmetric shape from #229 holds for audio (input_audio_buffer.append → response.audio.delta + response.audio.done) and for transcript (server-pushed → server-pushed) but breaks for tool-call: tool-call-init events flow IN as `response.function_call_arguments.delta` event-stream from server, but tool-result events flow OUT as `conversation.item.create` event from client — the directionality is INVERTED relative to the rest of the protocol, the FIRST cluster member with asymmetric-event-direction-relative-to-overall-protocol-direction; (11) **server-pushed-tool-call-init shape** — distinct from every prior tool-call shape where the client requests a tool to be invoked (synchronous: client sends `MessageRequest` with `tools` registered, server responds with `OutputContentBlock::ToolUse`, 
client invokes locally and sends back `ToolResultContentBlock`); in Realtime tool-use the server PROACTIVELY initiates a tool-call mid-conversation without an explicit client request, sending the `response.function_call_arguments.delta` events as soon as the model decides to invoke the tool, and the client must respond within the realtime session's response-deadline window or the response will time out and be cancelled — the FIRST cluster member with server-initiated-tool-call-without-explicit-client-request shape, distinct from every prior request-response tool-call lifecycle.
+
+**Key novelty vs prior cluster members:** #244 is the FIRST cluster member where ALL THREE of (a) tool-call event-stream, (b) persistent-WebSocket transport, and (c) bidirectional-multiplexed-modality-stream are required simultaneously — distinct from #229 which catalogued (b) + (c) without tool-use, distinct from #240/#241 which catalogued tool-use-via-`tool_choice`-discriminator without (b) or (c), distinct from #238 which catalogued (b) + audio-modality without tool-use. The FIRST cluster member with **server-pushed-tool-call-init** shape (every prior tool-call was client-initiated). The FIRST cluster member with **asymmetric-tool-result-injection** shape (tool-call comes IN as event-stream, result sent OUT as `conversation.item.create` — directionality inverted relative to the rest of the protocol). The FIRST cluster member with **per-call-id concurrent-multiplexed-state-machine** (multiple tool-calls can be in-flight simultaneously on the same connection with their own independent accumulator states, while also-concurrent audio.delta + audio_transcript.delta + text.delta event-streams flow through).
Combinatorial-cross-axis-synthesis discovery-mode applied: this is the SECOND cross-axis synthesis pinpoint after #238 (which synthesized #225 audio-modality × #229 persistent-WebSocket-transport into streaming-STT-with-diarization), and #244 synthesizes THREE axes (#229 persistent-WebSocket × #240+#241 server-managed-tool-via-`tool_choice`-discriminator × #238's cross-pinpoint-synthesis-fusion-shape META-cluster) — the FIRST three-axis synthesis as opposed to #238's two-axis synthesis, establishing **multi-axis-synthesis-as-cluster-axis** as a continuing discovery-mode where each synthesis pinpoint can fuse N axes for arbitrary N, not just two.
+
+**External validation (twenty-two ecosystem references):** OpenAI Realtime API tool-use docs at https://platform.openai.com/docs/guides/realtime/function-calling — canonical reference for `response.function_call_arguments.delta` / `response.function_call_arguments.done` / `conversation.item.create` event-shape with `function_call_output` item-type; OpenAI Realtime API event reference at https://platform.openai.com/docs/api-reference/realtime-server-events listing all 37+ canonical server-event-types including the four function-call-specific events (`response.function_call_arguments.delta` / `.done` / `conversation.item.created` with `type: function_call_output` / `error`); OpenAI Realtime API client-event reference at https://platform.openai.com/docs/api-reference/realtime-client-events listing `session.update` with `tools` + `tool_choice` registration; OpenAI Python SDK `openai.beta.realtime.AsyncRealtimeConnection` typed client (https://github.com/openai/openai-python) with `connection.session.update(tools=[...], tool_choice=...)` and `connection.conversation.item.create(item={"type":"function_call_output","call_id":...,"output":...})` first-class typed surface; OpenAI TypeScript SDK `OpenAI.beta.realtime.RealtimeClient` parallel typed surface (https://github.com/openai/openai-node) with same tool-use methods;
openai-realtime-api-beta reference client (https://github.com/openai/openai-realtime-api-beta) with canonical JavaScript implementation of the function-call lifecycle including `client.on('conversation.item.completed', ({ item }) => { if (item.type === 'function_call') { handleToolCall(item) } })` event-handler pattern; openai-realtime-console reference UI (https://github.com/openai/openai-realtime-console) with end-to-end example of voice-input → tool-call → local-execution → tool-result-injection → assistant-resumed-audio-output workflow; LiveKit Agents framework `livekit.agents.llm.FunctionContext` for Realtime tool-use over LiveKit transport (https://docs.livekit.io/agents/openai/customizing-llm/); Pipecat realtime framework `pipecat.processors.frameworks.openai_realtime_beta.OpenAIRealtimeBetaLLMService` with tool-use support (https://github.com/pipecat-ai/pipecat); Vapi realtime voice-agent framework with first-class function-calling-over-realtime (https://docs.vapi.ai); Retell AI realtime function-calling (https://docs.retellai.com/api-references/llm-websocket); Daily Bots realtime function-calling (https://docs.daily.co/reference/daily-bots); Google Gemini Live API tool-use over WebSocket (https://ai.google.dev/gemini-api/docs/live#tool-use) with parallel function-calling support and same bidirectional-event-stream shape; Azure OpenAI Realtime API mirror with tool-use (https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-quickstart); Vercel AI SDK 6 Realtime support (https://sdk.vercel.ai/docs/reference/ai-sdk-providers/openai-realtime) with `streamText({ model: openai.realtime('gpt-4o-realtime-preview'), tools, toolChoice })` first-class typed surface; coding-agent peer landscape: anomalyco/opencode has zero realtime tool-use integration (open feature request from 2026-02 only — confirmed via web search 2026-04-26), sst/opencode predecessor zero realtime-tool-use, charmbracelet/crush zero realtime-tool-use, continue.dev zero 
realtime-tool-use, aider zero realtime-tool-use, cursor zero realtime-tool-use, zed zero realtime-tool-use — claw-code is one of MULTIPLE clients without Realtime tool-use, and the gap is uniformly zero across the surveyed coding-agent ecosystem; the canonical seven-dimensional pricing matrix combines #229's six-dimensional realtime matrix with a per-event-type-token-slice for tool-call-arguments-fragments at the text-output rate ($20/M tokens for gpt-4o-realtime-preview-2024-10-01).
+
+**Required fix shape:** (a) extend the (also-absent per #229) `RealtimeSession` struct with `pending_function_calls: HashMap` slot for per-call-id state-machine multiplexing; (b) add `RealtimeServerEvent::FunctionCallArgumentsDelta { event_id, response_id, item_id, output_index, call_id, delta }` and `RealtimeServerEvent::FunctionCallArgumentsDone { event_id, response_id, item_id, output_index, call_id, name, arguments }` variants to a new `RealtimeServerEvent` enum at `rust/crates/api/src/realtime/events.rs`; (c) add `RealtimeClientEvent::ConversationItemCreate { event_id, item: RealtimeConversationItem }` variant where `RealtimeConversationItem::FunctionCallOutput { call_id, output }` is one of the supported item-types; (d) extend the Provider trait with four new methods (`register_realtime_tools` / `handle_function_call_arguments_delta` / `dispatch_function_call_done` / `send_function_call_output`); (e) implement `RealtimeFunctionCallStateMachine` with the six states (`Pending` → `ArgumentsAccumulating` → `ArgumentsDone` → `ExecutingLocally` → `ResultSent` → `ResponseAcknowledged`); (f) implement the bidirectional event-router that concurrently demultiplexes incoming events into per-call-id accumulators while emitting `conversation.item.create` events when handlers complete; (g) add `RealtimeUsage::tool_call_arguments_text_output_tokens: u32` field for per-event-type cost-attribution; (h) add `claw realtime --enable-tools` CLI subcommand with `--tools-from-skill ` support so
realtime sessions can use claw's existing skill-defined tool catalog; (i) emit structured telemetry events `RealtimeFunctionCallStartedEvent` / `RealtimeFunctionCallCompletedEvent` / `RealtimeFunctionCallTimedOutEvent` for observability; (j) handle the deadline-window contract — if local tool-execution exceeds the realtime-session response-deadline, emit `RealtimeResponseTimedOutEvent` and gracefully cancel the in-flight response with `response.cancel` event-emission. **Acceptance:** after the fix, running `claw realtime --enable-tools --tools-from-skill weather` opens a persistent WebSocket to `/v1/realtime?model=gpt-4o-realtime-preview`, registers the weather skill's tools at session-creation-time via `session.update`, accepts microphone audio input, receives interleaved `response.audio.delta` + `response.function_call_arguments.delta` events, accumulates tool-call arguments per call-id, dispatches the local handler when `.done` arrives, and sends `conversation.item.create` with `function_call_output` to inject the result back into the duplex channel for the assistant to resume audio-output reasoning over the result — the canonical voice-coding-with-tool-use workflow that is currently impossible to build on top of claw-code.
+
+**Status:** Open. No code changed. Filed 2026-04-26 09:05 KST. HEAD: 6541100 (post-#243 gaebal-gajae non-monotonic-pinpoint-ordering-contract). Branch: feat/jobdori-168c-emission-routing. Sibling-shape cluster: 33 pinpoints (extends from #229's 28 by +5 dogfood members #237/#239/#242/#243 and now #244; #238/#240/#241 are also cluster members). Multimodal-IO cluster: 7 members (no change — #244 is a tool-use extension on top of #229's existing multimodal membership rather than a new modality). Provider-asymmetric-delegation cluster: 6 members (no change — #244 inherits #229's asymmetry).
**Persistent-WebSocket-transport cluster: 3 members (#229 founder + #238 streaming-STT + #244 realtime-tool-use), confirming the cluster as a CONTINUING-PATTERN doctrine rather than a stable-doctrine that stopped growing after one extension.** **Cross-pinpoint-synthesis-fusion-shape META-cluster: 2 members (#238 founder + #244), confirming combinatorial-cross-axis-synthesis as a continuing-discovery-mode and growing the META-cluster from 1 to 2 members in the FIRST META-cluster-confirmation event in this audit.** Three-axis-synthesis-shape sub-cluster: 1 member (#244 alone, founder — distinct from #238's two-axis synthesis, the FIRST pinpoint where THREE distinct cluster axes are fused into one shape). Server-pushed-tool-call-init cluster: 1 member (#244 alone, founder — distinct from every prior client-initiated tool-call). Asymmetric-tool-result-injection cluster: 1 member (#244 alone, founder — distinct from every prior synchronous request-response tool-call where directionality matched the rest of the protocol). Per-call-id-concurrent-multiplexed-state-machine cluster: 1 member (#244 alone, founder — distinct from every prior single-call-at-a-time tool-call lifecycle). FOUR new clusters founded plus TWO existing clusters (Persistent-WebSocket-transport and the Cross-pinpoint-synthesis-fusion-shape META-cluster) confirmed as continuing-doctrines plus participation in MULTIPLE inherited clusters. Eleven-layer fusion-shape matches #240's eleven-layer count and is tied with it for the largest single-pinpoint fusion catalogued (the eleven-layer-fusion-shape is now the established maximum; #229's ten-layer count was the prior record before #240 and #244 tied at eleven).
Distinct from prior cluster members; the eleven-layer-fusion-shape-with-three-axis-cross-synthesis-and-server-pushed-tool-call-init-and-asymmetric-tool-result-injection is novel and applies to follow-on candidates: "Realtime image-input tool-use" (combining #244 with #220's image-input axis — voice request describing an image attached at session-creation-time, server-pushed tool-call to analyze the image, asymmetric result-injection), "Realtime code-execution tool-use" (combining #244 with #232's code-execution axis — voice request to run code, server-pushed bash tool-call with audio reasoning interleaved, asymmetric result-injection with stdout/stderr), "Realtime web-search tool-use" (combining #244 with #233's web-search axis — voice request, server-pushed search, asymmetric citation-injection). The Cross-pinpoint-synthesis-fusion-shape META-cluster is now confirmed as a growing-doctrine that establishes combinatorial-cross-axis-synthesis as a third-and-recurring discovery-mode after new-axis-founding and existing-cluster-extension. Linked to #229 Persistent-WebSocket-transport founder and #240/#241 server-managed-tool-via-`tool_choice`-discriminator pair as the upstream prerequisites that #244 cross-axis-synthesizes into one shape.
+
+🪨
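The per-call-id lifecycle named in the Shape and Required fix shape paragraphs above (`Pending` → `ArgumentsAccumulating` → `ArgumentsDone` → `ExecutingLocally` → `ResultSent` → `ResponseAcknowledged`) can be sketched with the standard library alone. This is a hedged illustration, not claw-code's design: `CallState`, `PendingFunctionCalls`, their payload fields, and every method name below are hypothetical, and the sketch assumes each call receives at least one delta before its `.done` event. It models only the bookkeeping; no WebSocket or JSON handling is included.

```rust
use std::collections::HashMap;

/// Lifecycle states named in the pinpoint; the payload fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum CallState {
    Pending,
    ArgumentsAccumulating { buffer: String },
    ArgumentsDone { arguments: String },
    ExecutingLocally,
    ResultSent,
    ResponseAcknowledged,
}

/// Hypothetical tracker: one independent state machine per `call_id`.
#[derive(Default)]
struct PendingFunctionCalls {
    calls: HashMap<String, CallState>,
}

impl PendingFunctionCalls {
    /// Apply one `response.function_call_arguments.delta` fragment.
    fn on_arguments_delta(&mut self, call_id: &str, delta: &str) -> Result<(), String> {
        let slot = self.calls.entry(call_id.to_owned()).or_insert(CallState::Pending);
        match std::mem::replace(slot, CallState::Pending) {
            CallState::Pending => {
                *slot = CallState::ArgumentsAccumulating { buffer: delta.to_owned() };
                Ok(())
            }
            CallState::ArgumentsAccumulating { mut buffer } => {
                buffer.push_str(delta);
                *slot = CallState::ArgumentsAccumulating { buffer };
                Ok(())
            }
            other => {
                let msg = format!("unexpected delta for {call_id} in state {other:?}");
                *slot = other; // restore the state that was taken out
                Err(msg)
            }
        }
    }

    /// Apply `response.function_call_arguments.done`: freeze the accumulated buffer.
    fn on_arguments_done(&mut self, call_id: &str) -> Result<(), String> {
        let unknown = || format!("unknown call_id {call_id}");
        let slot = self.calls.get_mut(call_id).ok_or_else(unknown)?;
        match std::mem::replace(slot, CallState::Pending) {
            CallState::ArgumentsAccumulating { buffer } => {
                *slot = CallState::ArgumentsDone { arguments: buffer };
                Ok(())
            }
            other => {
                let msg = format!("unexpected done for {call_id} in state {other:?}");
                *slot = other;
                Err(msg)
            }
        }
    }

    /// Step through the stages that follow `.done`, one stage per invocation.
    fn advance(&mut self, call_id: &str) -> Result<(), String> {
        let unknown = || format!("unknown call_id {call_id}");
        let slot = self.calls.get_mut(call_id).ok_or_else(unknown)?;
        let next = match &*slot {
            CallState::ArgumentsDone { .. } => CallState::ExecutingLocally,
            CallState::ExecutingLocally => CallState::ResultSent,
            CallState::ResultSent => CallState::ResponseAcknowledged,
            other => return Err(format!("cannot advance {call_id} from {other:?}")),
        };
        *slot = next;
        Ok(())
    }

    /// Final argument string, readable while the call is in `ArgumentsDone`.
    fn arguments(&self, call_id: &str) -> Option<&str> {
        match self.calls.get(call_id)? {
            CallState::ArgumentsDone { arguments } => Some(arguments.as_str()),
            _ => None,
        }
    }

    fn state(&self, call_id: &str) -> Option<&CallState> {
        self.calls.get(call_id)
    }
}

fn main() {
    let mut calls = PendingFunctionCalls::default();
    // Fragments for two calls interleave on the same connection.
    calls.on_arguments_delta("call_a", "{\"city\":").unwrap();
    calls.on_arguments_delta("call_b", "{\"path\":").unwrap();
    calls.on_arguments_delta("call_a", "\"Seoul\"}").unwrap();
    calls.on_arguments_done("call_a").unwrap();
    println!("call_a arguments: {:?}", calls.arguments("call_a"));
    calls.advance("call_a").unwrap(); // ArgumentsDone -> ExecutingLocally
    println!("call_a state: {:?}", calls.state("call_a"));
    println!("call_b state: {:?}", calls.state("call_b"));
}
```

Keying the map by `call_id` is what lets `call_b` keep accumulating while `call_a` is already executing; a real router would call `advance` once after emitting `conversation.item.create` and again when the server acknowledges the result.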