From 7d90283cf9e5f471336b718c1c32cc1272ced873 Mon Sep 17 00:00:00 2001 From: YeonGyu-Kim Date: Wed, 8 Apr 2026 15:03:30 +0900 Subject: [PATCH] docs(roadmap): record cascade-masking pinpoint under green-ness contract (#9) Concrete follow-up captured from today's dogfood session: A single hung test (oversized-request preflight, 6 minutes per attempt after `be561bf` silently swallowed count_tokens errors) crashed the `cargo test --workspace` job before downstream crates could run, hiding 6 separate pre-existing CLI regressions until `8c6dfe5` + `5851f2d` restored the fast-fail path. Two new acceptance criteria for #9: - per-test timeouts in CI so one hang cannot mask other failures - distinguish `test.hung` from generic test failures in worker reports --- ROADMAP.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/ROADMAP.md b/ROADMAP.md index d27e3fa..fd7aa88 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -194,6 +194,20 @@ Workers should distinguish: Acceptance: - no more ambiguous "tests passed" messaging - merge policy can require the correct green level for the lane type +- a single hung test must not mask other failures: enforce per-test + timeouts in CI (`cargo test --workspace`) so a 6-minute hang in one + crate cannot prevent downstream crates from running their suites +- when a CI job fails because of a hang, the worker must report it as + `test.hung` rather than a generic failure, so triage doesn't conflate + it with a normal `assertion failed` +- recorded pinpoint (2026-04-08): `be561bf` swapped the local + byte-estimate preflight for a `count_tokens` round-trip and silently + returned `Ok(())` on any error, so `send_message_blocks_oversized_*` + hung for ~6 minutes per attempt; the resulting workspace job crash + hid 6 *separate* pre-existing CLI regressions (compact flag + discarded, piped stdin vs permission prompter, legacy session layout, + help/prompt assertions, mock harness count) that only became + diagnosable after `8c6dfe5` + `5851f2d` restored the fast-fail path ## Phase 4 — Claws-First Task Execution