From 257aeb82ddbc336acc30ef649d0f15922c25c2c9 Mon Sep 17 00:00:00 2001
From: Yeachan-Heo
Date: Sat, 11 Apr 2026 18:47:25 +0000
Subject: [PATCH] Retire the stale dead-session opacity backlog item with regression proof

ROADMAP #38 no longer reflects current main. The runtime already runs a
post-compaction session-health probe, but the backlog lacked explicit
regression proof. This change adds focused tests for the two important
behaviors: a broken tool surface aborts a compacted session with a
targeted error, while a freshly compacted empty session is not falsely
flagged as dead. With that proof in place, the roadmap item can be
marked done.

Constraint: User required fresh cargo fmt/clippy/test evidence before closing any backlog item
Rejected: Leave #38 open because the implementation already existed | backlog stays stale and invites duplicate work
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reopen #38 only with a fresh same-turn repro that bypasses the current health-probe gate
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No live long-running dogfood session replay beyond existing automated coverage
---
 ROADMAP.md                              |  2 +-
 rust/crates/runtime/src/conversation.rs | 82 +++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 0656346..82677ec 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -440,7 +440,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
 37. **Claude subscription login path should be removed, not deprecated** -- dogfooded 2026-04-09. Official auth should be API key only (`ANTHROPIC_API_KEY`) or OAuth bearer token via `ANTHROPIC_AUTH_TOKEN`; the local `claw login` / `claw logout` subscription-style flow created legal/billing ambiguity and a misleading saved-OAuth fallback.
 **Done (verified 2026-04-11):** removed the direct `claw login` / `claw logout` CLI surface, removed `/login` and `/logout` from shared slash-command discovery, changed both CLI and provider startup auth resolution to ignore saved OAuth credentials, and updated auth diagnostics to point only at `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. Verification: targeted `commands`, `api`, and `rusty-claude-cli` tests for removed login/logout guidance and ignored saved OAuth all pass, and `cargo check -p api -p commands -p rusty-claude-cli` passes. Source: gaebal-gajae policy decision 2026-04-09.
-38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". Downstream: repetitive false-dead signals in the channel, work not getting done despite the execution surface being live. Fix shape: (a) probe with a short known-output command at turn start if context has been compacted; (b) gate "I am dead" declarations behind at least one within-turn tool call with a verified non-empty result; (c) consider adding a session-health canary cron that fires a wake with a minimal probe and checks the result. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
+38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". **Done (verified 2026-04-11):** `ConversationRuntime::run_turn()` now runs a post-compaction session-health probe through `glob_search`, fails fast with a targeted recovery error if the tool surface is broken, and skips the probe for a freshly compacted empty session. Fresh regression coverage proves both the failure gate and the empty-session bypass. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
 39. **Several slash commands are registered but not implemented: /branch, /rewind, /ide, /tag, /output-style, /add-dir** -- dogfooded 2026-04-09. These commands appear in the REPL completions surface but silently print 'Command registered but not yet implemented.' and return false. Users (mezz2301 in #claw-code) hit this as 'many features are not supported in this version now'. Fix shape: either (a) implement the missing commands, or (b) remove them from completions/help output until they are ready, so the discovery surface matches what actually works. Source: mezz2301 in #claw-code 2026-04-09; pinpointed in main.rs:3728.
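The gate described in the updated #38 entry can be sketched as follows. This is a simplified illustration under stated assumptions, not the runtime's code: `SessionState`, `ToolExecutor`, `ProbeOutcome`, `BrokenTools`, and `health_probe` are hypothetical names, the JSON input handed to `glob_search` is an assumption, and "empty" is assumed to mean that no messages survive compaction. Only the `glob_search` probe tool and the `Session health probe failed after compaction` error prefix come from the patch.

```rust
// Hypothetical, simplified stand-ins for the runtime's session and tool types.
#[derive(Debug)]
pub struct ToolError(pub String);

pub trait ToolExecutor {
    fn execute(&mut self, tool_name: &str, input: &str) -> Result<String, ToolError>;
}

pub struct SessionState {
    pub compacted: bool,      // has the history been compacted?
    pub message_count: usize, // messages remaining after compaction
}

#[derive(Debug, PartialEq)]
pub enum ProbeOutcome {
    Skipped,
    Passed,
}

/// Sketch of a post-compaction health gate: probe the tool surface once and
/// return a targeted error (wrapping the underlying tool error) if it fails.
pub fn health_probe(
    session: &SessionState,
    tools: &mut dyn ToolExecutor,
) -> Result<ProbeOutcome, String> {
    // The probe only targets the post-compaction case.
    if !session.compacted {
        return Ok(ProbeOutcome::Skipped);
    }
    // A freshly compacted session with no messages is skipped so it is not
    // falsely flagged as dead.
    if session.message_count == 0 {
        return Ok(ProbeOutcome::Skipped);
    }
    // Assumed probe input; any cheap tool call with a known result would do.
    tools
        .execute("glob_search", r#"{"pattern":"*"}"#)
        .map(|_| ProbeOutcome::Passed)
        .map_err(|err| format!("Session health probe failed after compaction: {}", err.0))
}

/// A tool surface that always fails, mirroring the broken executor in the first test.
pub struct BrokenTools;

impl ToolExecutor for BrokenTools {
    fn execute(&mut self, _tool_name: &str, _input: &str) -> Result<String, ToolError> {
        Err(ToolError("transport unavailable".to_string()))
    }
}

fn main() {
    let compacted = SessionState { compacted: true, message_count: 1 };
    // prints "Session health probe failed after compaction: transport unavailable"
    println!("{}", health_probe(&compacted, &mut BrokenTools).unwrap_err());

    let empty = SessionState { compacted: true, message_count: 0 };
    // prints "Ok(Skipped)": the broken tool is never called
    println!("{:?}", health_probe(&empty, &mut BrokenTools));
}
```

Keeping the underlying tool error inside the targeted message matches what the first test asserts: the caller sees both that the post-compaction probe failed and why.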
diff --git a/rust/crates/runtime/src/conversation.rs b/rust/crates/runtime/src/conversation.rs
index c29f834..610ba1a 100644
--- a/rust/crates/runtime/src/conversation.rs
+++ b/rust/crates/runtime/src/conversation.rs
@@ -1611,6 +1611,88 @@ mod tests {
         );
     }
 
+    #[test]
+    fn compaction_health_probe_blocks_turn_when_tool_executor_is_broken() {
+        struct SimpleApi;
+        impl ApiClient for SimpleApi {
+            fn stream(
+                &mut self,
+                _request: ApiRequest,
+            ) -> Result<Vec<AssistantEvent>, RuntimeError> {
+                panic!("API should not run when health probe fails");
+            }
+        }
+
+        let mut session = Session::new();
+        session.record_compaction("summarized earlier work", 4);
+        session
+            .push_user_text("previous message")
+            .expect("message should append");
+
+        let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
+            Err(ToolError::new("transport unavailable"))
+        });
+        let mut runtime = ConversationRuntime::new(
+            session,
+            SimpleApi,
+            tool_executor,
+            PermissionPolicy::new(PermissionMode::DangerFullAccess),
+            vec!["system".to_string()],
+        );
+
+        let error = runtime
+            .run_turn("trigger", None)
+            .expect_err("health probe failure should abort the turn");
+        assert!(
+            error
+                .to_string()
+                .contains("Session health probe failed after compaction"),
+            "unexpected error: {error}"
+        );
+        assert!(
+            error.to_string().contains("transport unavailable"),
+            "expected underlying probe error: {error}"
+        );
+    }
+
+    #[test]
+    fn compaction_health_probe_skips_empty_compacted_session() {
+        struct SimpleApi;
+        impl ApiClient for SimpleApi {
+            fn stream(
+                &mut self,
+                _request: ApiRequest,
+            ) -> Result<Vec<AssistantEvent>, RuntimeError> {
+                Ok(vec![
+                    AssistantEvent::TextDelta("done".to_string()),
+                    AssistantEvent::MessageStop,
+                ])
+            }
+        }
+
+        let mut session = Session::new();
+        session.record_compaction("fresh summary", 2);
+
+        let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
+            Err(ToolError::new(
+                "glob_search should not run for an empty compacted session",
+            ))
+        });
+        let mut runtime = ConversationRuntime::new(
+            session,
+            SimpleApi,
+            tool_executor,
+            PermissionPolicy::new(PermissionMode::DangerFullAccess),
+            vec!["system".to_string()],
+        );
+
+        let summary = runtime
+            .run_turn("trigger", None)
+            .expect("empty compacted session should not fail health probe");
+        assert_eq!(summary.auto_compaction, None);
+        assert_eq!(runtime.session().messages.len(), 2);
+    }
+
     #[test]
     fn build_assistant_message_requires_message_stop_event() {
         // given
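Both tests inject tool behavior by registering a closure per tool name on `StaticToolExecutor`. The registry below is a hypothetical, self-contained sketch of that pattern, assuming a builder-style `register` and name-based dispatch; `StaticTools`, `Handler`, and this `ToolError` are illustrative stand-ins, not the crate's actual types or signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical tool error (not the crate's `ToolError`).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolError(pub String);

/// A boxed handler that receives the raw tool input and returns output or an error.
type Handler = Box<dyn FnMut(&str) -> Result<String, ToolError>>;

/// Hypothetical name-to-handler registry in the spirit of `StaticToolExecutor`.
#[derive(Default)]
pub struct StaticTools {
    handlers: BTreeMap<String, Handler>,
}

impl StaticTools {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style registration so a test can chain `.register(...)` calls.
    pub fn register<F>(mut self, name: &str, handler: F) -> Self
    where
        F: FnMut(&str) -> Result<String, ToolError> + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
        self
    }

    /// Dispatch to the registered handler, or report an unknown tool.
    pub fn execute(&mut self, name: &str, input: &str) -> Result<String, ToolError> {
        match self.handlers.get_mut(name) {
            Some(handler) => handler(input),
            None => Err(ToolError(format!("unknown tool: {name}"))),
        }
    }
}

fn main() {
    // An always-failing handler acts as a tripwire: any call to `glob_search`
    // surfaces as an error instead of passing silently.
    let mut tools = StaticTools::new().register("glob_search", |_input| {
        Err(ToolError("glob_search should not run".to_string()))
    });
    // prints Err(ToolError("glob_search should not run"))
    println!("{:?}", tools.execute("glob_search", "{}"));
    // prints Err(ToolError("unknown tool: read_file"))
    println!("{:?}", tools.execute("read_file", "{}"));
}
```

The second test relies on this tripwire role: if the probe ran for the empty compacted session, the failing `glob_search` handler would abort `run_turn` and the `expect` would panic. The first test applies the same idea to the API side, with a `SimpleApi` whose `stream` panics if it is ever reached.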