The CLI entry point (build_runtime_with_plugin_state in main.rs)
was hardcoded to always instantiate AnthropicRuntimeClient with an
AnthropicClient, regardless of what detect_provider_kind(model)
returned. This meant `--model openai/gpt-4` with OPENAI_API_KEY
set and no ANTHROPIC_* vars still failed with "missing Anthropic
credentials" because the CLI never dispatched to the OpenAI-compat
backend that already exists in the api crate.
Root cause: AnthropicRuntimeClient.client was typed as
AnthropicClient (concrete) rather than ApiProviderClient (enum).
The api crate already had a ProviderClient enum with Anthropic /
Xai / OpenAi variants that dispatches correctly via
detect_provider_kind, plus a unified MessageStream enum that wraps
both anthropic::MessageStream and openai_compat::MessageStream
with the same next_event() -> StreamEvent interface. The CLI just
wasn't using it.
Changes (1 file, +59 -7):
- Import api::ProviderClient as ApiProviderClient
- Change AnthropicRuntimeClient.client from AnthropicClient to
ApiProviderClient
- In AnthropicRuntimeClient::new(), dispatch based on
detect_provider_kind(&resolved_model):
* Anthropic: build AnthropicClient directly with
resolve_cli_auth_source() + api::read_base_url() +
PromptCache (preserves ANTHROPIC_BASE_URL override for mock
test harness and the session-scoped prompt cache)
* xAI / OpenAi: delegate to
ApiProviderClient::from_model_with_anthropic_auth which routes
to OpenAiCompatClient::from_env with the matching config
(reads OPENAI_API_KEY/XAI_API_KEY/DASHSCOPE_API_KEY and their
BASE_URL overrides internally)
- Change push_prompt_cache_record to take &ApiProviderClient
(ProviderClient::take_last_prompt_cache_record returns None for
non-Anthropic variants, so the helper is a no-op on
OpenAI-compat providers without extra branching)
What this unlocks for users:
claw --model openai/gpt-4.1-mini prompt 'hello' # OpenAI
claw --model grok-3 prompt 'hello' # xAI
claw --model qwen-plus prompt 'hello' # DashScope
OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
claw --model openai/anthropic/claude-sonnet-4 prompt 'hello' # OpenRouter
All previously broken, now routed correctly by prefix.
Verification:
- cargo build --release -p rusty-claude-cli: clean
- cargo test --release -p rusty-claude-cli: 182 tests, 0 failures
(including compact_output tests that exercise the Anthropic mock)
- cargo fmt --all: clean
- cargo clippy --workspace: warnings-only (pre-existing)
- cargo test --release --workspace: all crates green except one
pre-existing race in runtime::config::tests (passes in isolation)
Source: live users nicma (1491342350960562277) and Jengro
(1491345009021030533) in #claw-code on 2026-04-08.
#28 error-copy improvements landed on ff1df4c but real users (nicma,
Jengro) hit `error: missing Anthropic credentials` within hours when
using `--model openai/gpt-4` with OPENAI_API_KEY set and all
ANTHROPIC_* env vars unset on main.
Traced root cause in build_runtime_with_plugin_state at line ~6244:
AnthropicRuntimeClient::new() is hardcoded. BuiltRuntime is
statically typed as ConversationRuntime<AnthropicRuntimeClient, ...>.
providers::detect_provider_kind() computes the right routing at the
metadata layer but the runtime client is never dispatched.
Files #29 with the detailed trace + a focused action plan:
DynamicApiClient enum wrapping Anthropic + OpenAiCompat variants,
retype BuiltRuntime, dispatch in build_runtime based on
detect_provider_kind, integration test with mock OpenAI-compat
server.
#28 is marked partial — the error-copy improvements are real and
stayed in, but the routing gap they were meant to cover is the
actual bug and needs #29 to land.
Two live users in #claw-code on 2026-04-08 hit adjacent auth confusion:
varleg set OPENAI_API_KEY for OpenRouter but prefix routing didn't
activate without openai/ model prefix, and stanley078852 put sk-ant-*
in ANTHROPIC_AUTH_TOKEN (Bearer path) instead of ANTHROPIC_API_KEY
(x-api-key path) and got 401 Invalid bearer token.
Changes:
1. ApiError::MissingCredentials gained optional hint field (error.rs)
2. anthropic_missing_credentials_hint() sniffs OPENAI/XAI/DASHSCOPE
env vars and suggests prefix routing when present (providers/mod.rs)
3. All 4 Anthropic auth paths wire the hint helper (anthropic.rs)
4. 401 + sk-ant-* in bearer token detected and hint appended
5. 'Which env var goes where' section added to USAGE.md
Tests: unit tests for all three improvements (no HTTP calls needed).
Workspace: all tests green, fmt clean, clippy warnings-only.
Source: live users varleg + stanley078852 in #claw-code 2026-04-08.
Co-authored-by: gaebal-gajae <gaebal-gajae@layofflabs.com>
Filed from live #claw-code dogfood on 2026-04-08 where two real users
hit adjacent auth confusion within minutes:
- varleg set OPENAI_API_KEY for OpenRouter but prefix routing didn't
win because the model name wasn't prefixed with openai/; unsetting
ANTHROPIC_API_KEY then hit MissingApiKey with no hint that the
OpenAI path was already configured
- stanley078852 put an sk-ant-* key in ANTHROPIC_AUTH_TOKEN instead
of ANTHROPIC_API_KEY, causing claw to send it as
Authorization: Bearer sk-ant-..., which Anthropic rejects at the
edge with 401 Invalid bearer token
Both fixes delivered live in #claw-code as direct replies, but the
pattern is structural: the error surface doesn't bridge HTTP-layer
symptoms back to env-var choice.
Action block spells out a single main-side PR with three
improvements: (a) MissingCredentials hint when an adjacent
provider's env var is already set, (b) 401-on-Anthropic hint when
bearer token starts with sk-ant-, (c) 'which env var goes where'
paragraph in both README matrices mapping sk-ant-* -> x-api-key and
OAuth access token -> Authorization: Bearer.
All three improvements are unit-testable against ApiError::fmt
output with no HTTP calls required.
The original ROADMAP #25 entry claimed the root cause was missing
exec bits on generated hook scripts. That was wrong — a chmod-only
fix (4f7b674) still failed CI. The actual bug was output_with_stdin
unconditionally propagating BrokenPipe from write_all when the child
exits before the parent finishes writing stdin.
Updated per gaebal-gajae's direction: actual fix, hygiene hardening,
and regression guard are now clearly separated. Added a meta-lesson
about Broken pipe ambiguity in fork/exec paths so future investigators
don't cargo-cult the same wrong first theory.
Two bugs found in the plugin hook test harness that together caused
Linux CI to fail on 'hooks::tests::collects_and_runs_hooks_from_enabled_plugins'
with 'Broken pipe (os error 32)'. Three reproductions plus one rerun
failure on main today: 24120271422, 24120538408, 24121392171.
Root cause 1 (chmod, defense-in-depth): write_hook_plugin writes
pre.sh/post.sh/failure.sh via fs::write without setting the execute
bit. While the runtime hook runner invokes hooks via 'sh <path>' (so
the script file does not strictly need +x), missing exec perms can
cause subtle fork/exec races on Linux in edge cases.
Root cause 2 (the actual CI failure): output_with_stdin unconditionally
propagated write_all errors on the child's stdin pipe, including
BrokenPipe. A hook script that runs to completion in microseconds
(e.g. a one-line printf) can exit and close its stdin before the parent
finishes writing the JSON payload. Linux pipes surface this as EPIPE
immediately; macOS pipes happen to buffer the small payload, so the
race only shows on ubuntu CI runners. The parent's write_all raised
BrokenPipe, which output_with_stdin returned as Err, which run_command
classified as 'failed to start', making the test assertion fail.
Fix: (a) make_executable helper sets mode 0o755 via PermissionsExt on
each generated hook script, with a #[cfg(unix)] gate and a no-op
#[cfg(not(unix))] branch. (b) output_with_stdin now matches the
write_all result and swallows BrokenPipe specifically (the child still
ran; wait_with_output still captures stdout/stderr/exit code), while
propagating all other write errors. (c) New regression guard
generated_hook_scripts_are_executable under #[cfg(unix)] asserts each
generated .sh file has at least one execute bit set.
Surgical scope per gaebal-gajae's direction: chmod + pipe tolerance +
regression guard only. The deeper plugin-test sealing pass for ROADMAP
#25 + #27 stays in gaebal-gajae's OMX lane.
Verification:
- cargo test --release -p plugins → 35 passing, 0 failing
- cargo fmt -p plugins → clean
- cargo clippy -p plugins -- -D warnings → clean
Co-authored-by: gaebal-gajae <gaebal-gajae@layofflabs.com>
Filing per gaebal-gajae's status summary at message 1491322807026454579
in #clawcode-building-in-public, with corrected scope after re-running
`cargo test -p rusty-claude-cli` against main HEAD (79da4b8): the 11
deterministic failures only reproduce on dev/rust, not main, so this is
a dev/rust catchup item rather than a main regression.
Two-layered root cause documented:
1. dev/rust `parse_args` eagerly validates user plugin hook scripts
exist on disk before returning a CliAction
2. dev/rust test harness does not redirect $HOME/XDG_CONFIG_HOME to a
fixture (no `env_lock` equivalent — main has 30+ env_lock hits, dev
has zero)
Together they make dev/rust `cargo test -p rusty-claude-cli` fail on
any clean clone whose owner has a half-installed user plugin in
~/.claude/plugins/installed/. main has both the env_lock test isolation
AND the parse_args/hook-validation decoupling already; dev/rust is just
behind on the merge train.
Action block in #27 spells out backporting env_lock + the parse_args
decoupling so the next dev/rust release picks this up.
Linux CI keeps tripping over
`plugins::hooks::tests::collects_and_runs_hooks_from_enabled_plugins`
with `Broken pipe (os error 32)` when the hook runner tries to spawn a
child shell script that was written by `write_hook_plugin` without the
execute bit set. Fails on first attempt, passes on rerun (observed in CI
runs 24120271422 and 24120538408). Passes consistently on macOS.
Since issues are disabled on the repo, recording as ROADMAP backlog
item #25 in the Immediate Backlog P2 cluster next to the related plugin
lifecycle flake at #24. Action block spells out the chmod +755 fix in
`write_hook_plugin` plus the regression guard.
Concrete follow-up captured from today's dogfood session:
A single hung test (oversized-request preflight, 6 minutes per attempt
after `be561bf` silently swallowed count_tokens errors) crashed the
`cargo test --workspace` job before downstream crates could run, hiding
6 separate pre-existing CLI regressions until `8c6dfe5` + `5851f2d`
restored the fast-fail path.
Two new acceptance criteria for #9:
- per-test timeouts in CI so one hang cannot mask other failures
- distinguish `test.hung` from generic test failures in worker reports
- compact flag: was parsed then discarded (`compact: _`) instead of
passed to `run_turn_with_output` — hardcoded `false` meant --compact
never took effect
- piped stdin vs permission prompter: `read_piped_stdin()` consumed all
stdin before `CliPermissionPrompter::decide()` could read interactive
approval answers; now only consumes stdin as prompt context when
permission mode is `DangerFullAccess` (fully unattended)
- session resolver: `resolve_managed_session_path` and
`list_managed_sessions` now fall back to the pre-isolation flat
`.claw/sessions/` layout so legacy sessions remain accessible
- help assertion: match on stable prefix after `/session delete` was
added in batch 5
- prompt shorthand: fix copy-paste that changed expected prompt from
"help me debug" to "$help overview"
- mock parity harness: filter captured requests to `/v1/messages` path
only, excluding count_tokens preflight calls added by `be561bf`
All 6 failures were pre-existing but masked because `client_integration`
always failed first (fixed in 8c6dfe5).
Workspace: 810+ tests passing, 0 failing.
CI has been red since be561bf ('Use Anthropic count tokens for preflight')
because that commit replaced the free-function preflight_message_request
(byte-estimate guard) with an instance method that silently returns Ok on
any count_tokens failure:
let counted_input_tokens = match self.count_tokens(request).await {
Ok(count) => count,
Err(_) => return Ok(()), // <-- silent bypass
};
Two consequences:
1. client_integration::send_message_blocks_oversized_requests_before_the_http_call
has been FAILING on every CI run since be561bf. The mock server in that
test only has one HTTP response queued (a bare '{}' to satisfy the main
request), so the count_tokens POST parses into an empty body that fails
to deserialize into CountTokensResponse -> Err -> silent bypass -> the
oversized 600k-char request proceeds to the mock instead of being
rejected with ContextWindowExceeded as the test expects.
2. In production, any third-party Anthropic-compatible gateway that doesn't
implement /v1/messages/count_tokens (OpenRouter, Cloudflare AI Gateway,
etc.) would silently disable the preflight guard entirely, letting
oversized requests hit the upstream only to fail there with a provider-
side context-window error. This is exactly the 'opaque failure surface'
ROADMAP #22 asked us to avoid.
Fix: call the free-function super::preflight_message_request(request)? as
the first step in the instance method, before any network round-trip. This
guarantees the byte-estimate guard always fires, whether or not the remote
count_tokens endpoint is reachable. The count_tokens refinement still runs
afterward when available for more precise token counting, but it is now
strictly additive — it can only catch more cases, never silently skip the
guard.
Test results:
- cargo test -p api --lib: 89 passed, 0 failed
- cargo test --release -p api (all test binaries): 118 passed, 0 failed
- cargo test --release -p api --test client_integration \
send_message_blocks_oversized_requests_before_the_http_call: passes
- cargo fmt --check: clean
This unblocks the Rust CI workflow which has been red on every push since
be561bf landed.
Document the qwen/ and qwen- prefix routing added in 3ac97e6. Users
in Discord #clawcode-get-help (web3g, Renan Klehm, matthewblott) kept
hitting ambient-credential misrouting because the docs only showed
the OPENAI_BASE_URL pattern without explaining that model-name prefix
wins over env-var presence.
Added:
- DashScope usage section with qwen/qwen-max and bare qwen-plus examples
- DashScope row in provider matrix table
- Reasoning model sanitization note (qwen-qwq, qwq-*, *-thinking)
- Explicit statement that model-name prefix wins over ambient creds
Users in Discord #clawcode-get-help (web3g) asked for Qwen 3.6 Plus via
native Alibaba DashScope API instead of OpenRouter, which has stricter
rate limits. This commit adds first-class routing for qwen/ and bare
qwen- prefixed model names.
Changes:
- DEFAULT_DASHSCOPE_BASE_URL constant: /compatible-mode/v1 endpoint
- OpenAiCompatConfig::dashscope() factory mirroring openai()/xai()
- DASHSCOPE_ENV_VARS + credential_env_vars() wiring
- metadata_for_model: qwen/ and qwen- prefix routes to DashScope with
auth_env=DASHSCOPE_API_KEY, reuses ProviderKind::OpenAi because
DashScope speaks the OpenAI REST shape
- is_reasoning_model: detect qwen-qwq, qwq-*, and *-thinking variants
so tuning params (temperature, top_p, etc.) get stripped before
payload assembly (same pattern as o1/o3/grok-3-mini)
Tests added:
- providers::tests::qwen_prefix_routes_to_dashscope_not_anthropic
- openai_compat::tests::qwen_reasoning_variants_are_detected
89 api lib tests passing, 0 failing. cargo fmt --check: clean.
Closes the user-reported gap: 'use Qwen 3.6 Plus via Alibaba API
directly, not OpenRouter' without needing OPENAI_BASE_URL override
or unsetting ANTHROPIC_API_KEY.
build_runtime_runs_plugin_lifecycle_init_and_shutdown was the only test
that set/removed ANTHROPIC_API_KEY without holding the env_lock mutex.
Under parallel workspace execution, other tests racing on the same env
var could wipe the key mid-construction, causing a flaky credential error.
Root cause: process-wide env vars are shared mutable state. All other
tests that touch ANTHROPIC_API_KEY already use env_lock(). This test
was the only holdout.
Fix: add let _guard = env_lock(); at the top of the test.
openai_compat_integration.rs and client_integration.rs had MessageRequest
constructions without the new tuning param fields (temperature, top_p,
frequency_penalty, presence_penalty, stop) added in c667d47.
Added ..Default::default() to all 4 sites. cargo fmt applied.
This was the root cause of CI red on main (E0063 compile error in
integration tests, not caught by --lib tests).
renders_help_from_shared_specs hardcoded the exact /session usage string,
which broke when /session delete was added in batch 5. Relaxed to check
for /session presence instead of exact subcommand list.
Pre-existing test brittleness (not caused by recent commits).
687 workspace lib tests passing, 0 failing.
MessageRequest now carries OpenAI-compatible tuning params (c667d47), but
the Anthropic API does not support frequency_penalty or presence_penalty,
and uses 'stop_sequences' instead of 'stop'. Without this fix, setting
these params with a Claude model would produce 400 errors.
Changes to strip_unsupported_beta_body_fields:
- Remove frequency_penalty and presence_penalty from Anthropic request body
- Convert stop → stop_sequences (only when non-empty)
- temperature and top_p are preserved (Anthropic supports both)
Tests added:
- strip_removes_openai_only_fields_and_converts_stop
- strip_does_not_add_empty_stop_sequences
87 api lib tests passing, 0 failing.
cargo check --workspace: clean.
Reasoning models reject temperature, top_p, frequency_penalty, and
presence_penalty with 400 errors. Instead of letting these flow through
and returning cryptic provider errors, strip them silently at the
request-builder boundary.
is_reasoning_model() classifies: o1*, o3*, o4*, grok-3-mini.
stop sequences are preserved (safe for all providers).
Tests added:
- reasoning_model_strips_tuning_params: o1-mini strips all 4 params, keeps stop
- grok_3_mini_is_reasoning_model: classification coverage for grok-3-mini, o1,
o3-mini, and negative cases (gpt-4o, grok-3, claude)
85 api lib tests passing, 0 failing.
MessageRequest was missing standard OpenAI-compatible generation tuning
parameters. Callers had no way to control temperature, top_p,
frequency_penalty, presence_penalty, or stop sequences.
Changes:
- Added 5 optional fields to MessageRequest (all Option, None by default)
- Wired into build_chat_completion_request: only included in payload when set
- All existing construction sites updated with ..Default::default()
- MessageRequest now derives Default for ergonomic partial construction
Tests added:
- tuning_params_included_in_payload_when_set: all 5 params flow into JSON
- tuning_params_omitted_from_payload_when_none: absent params stay absent
83 api lib tests passing, 0 failing.
cargo check --workspace: 0 warnings.
Filed: openai/ prefix model misrouting (fixed in 0530c50).
Documents root cause, fix, and the architectural lesson:
- metadata_for_model is the canonical extension point for new providers
- auth-sniffer fallback order must never override explicit model-name prefix
- regression test locked in to guard this invariant
metadata_for_model returned None for unknown models like openai/gpt-4.1-mini,
causing detect_provider_kind to fall through to auth-sniffer order. If
ANTHROPIC_API_KEY was set, the model was silently misrouted to Anthropic
and the user got a confusing 'missing Anthropic credentials' error.
Fix: add explicit prefix checks for 'openai/' and 'gpt-' in
metadata_for_model so the model name wins over env-var presence.
Regression test added: openai_namespaced_model_routes_to_openai_not_anthropic
- 'openai/gpt-4.1-mini' routes to OpenAi
- 'gpt-4o' routes to OpenAi
Reported and reproduced by gaebal-gajae against current main.
81 api lib tests passing, 0 failing.
WorkerGet had zero test coverage. WorkerAwaitReady and WorkerSendPrompt
had only one happy-path test each with no error paths.
Added 4 tests:
- worker_get_returns_worker_state: WorkerGet fetches correct worker_id/status/cwd
- worker_get_on_unknown_id_returns_error: unknown id -> 'worker not found'
- worker_await_ready_on_spawning_worker_returns_not_ready: ready=false on spawning worker
- worker_send_prompt_on_non_ready_worker_returns_error: sending prompt before ready fails
94 tool tests passing, 0 failing.
recovery_loop_state_file_reflects_transitions reads the actual state
file after each transition to verify the canonical observability surface
reflects the full stall->resolve->ready progression:
spawning (state file exists, seconds_since_update present)
-> trust_required (is_ready=false, trust_gate_cleared=false in file)
-> spawning (trust_gate_cleared=true after WorkerResolveTrust)
-> ready_for_prompt (is_ready=true after ready screen observe)
This is the end-to-end proof gaebal-gajae called for: clawhip polling
.claw/worker-state.json will see truthful state at every step of the
recovery loop, including the seconds_since_update staleness signal.
90 tool tests passing, 0 failing.
Proves the clawhip restart/recover flow that gaebal-gajae flagged:
1. stall_detect_and_resolve_trust_end_to_end
- Worker created without trusted_roots -> trust_auto_resolve=false
- WorkerObserve with trust-prompt text -> status=trust_required, gate cleared=false
- WorkerResolveTrust -> status=spawning, trust_gate_cleared=true
- WorkerObserve with ready text -> status=ready_for_prompt
Full resolve path verified end-to-end.
2. stall_detect_and_restart_recovery_end_to_end
- Worker stalls at trust_required
- WorkerRestart resets to spawning, trust_gate_cleared=false
Documents the restart-then-re-acquire-trust flow.
Note: seconds_since_update is in .claw/worker-state.json (state file),
not in the Worker tool output struct. Staleness detection via state file
is covered by emit_state_file_writes_worker_status_on_transition in
worker_boot.rs tests.
87 tool tests passing, 0 failing.
Three dead-code warnings eliminated from cargo check:
1. KNOWN_TOP_LEVEL_KEYS / DEPRECATED_TOP_LEVEL_KEYS in config.rs
- Superseded by config_validate::TOP_LEVEL_FIELDS and DEPRECATED_FIELDS
- Were out of date (missing aliases, providerFallbacks, trustedRoots)
- Removed
2. read_git_recent_commits in prompt.rs
- Private function, never called anywhere in the codebase
- Removed
3. workspace_sessions_dir in session.rs
- Public API scaffolded for session isolation (#41)
- Genuinely useful for external consumers (clawhip enumerating sessions)
- Added 2 tests: deterministic path for same CWD, different path for different CWDs
- Annotated with #[allow(dead_code)] since it is external-facing API
cargo check --workspace: 0 warnings remaining
430 runtime tests passing, 0 failing
Closes the ambiguity gaebal-gajae flagged: downstream tooling was left
guessing which integration surface to build against.
Decision: claw state + .claw/worker-state.json is the blessed contract.
HTTP endpoint not scheduled. Rationale documented:
- plugin scope constraint (can't add routes to opencode serve)
- file polling has lower latency and fewer failure modes than HTTP
- HTTP would require upstreaming to sst/opencode or a fragile sidecar
Clawhip integration contract documented:
- poll .claw/worker-state.json after WorkerCreate
- seconds_since_update > 60 in trust_required = stall signal
- WorkerResolveTrust to unblock, WorkerRestart to reset
WorkerRestart and WorkerTerminate had zero test coverage despite being
public tools in the tool spec. Also confirms one design decision worth
noting: restart resets trust_gate_cleared=false, so an allowlisted
worker that gets restarted must re-acquire trust via the normal observe
flow (by design — trust is per-session, not per-CWD).
Tests added:
- worker_terminate_sets_finished_status
- worker_restart_resets_to_spawning (verifies status=spawning,
prompt_in_flight=false, trust_gate_cleared=false)
- worker_terminate_on_unknown_id_returns_error
- worker_restart_on_unknown_id_returns_error
85 tool tests passing, 0 failing.
Previously WorkerCreate passed trusted_roots directly to spawn_worker
with no config-level default. Any batch script omitting the field
stalled all workers at TrustRequired with no recovery path.
Now run_worker_create loads RuntimeConfig from the worker CWD before
spawning and merges config.trusted_roots() with per-call overrides.
Per-call overrides still take effect; config provides the default.
Add test: worker_create_merges_config_trusted_roots_without_per_call_override
- writes .claw/settings.json with trustedRoots=[<os-temp-dir>] in a temp worktree
- calls WorkerCreate with no trusted_roots field
- asserts trust_auto_resolve=true (config roots matched the CWD)
81 tool tests passing, 0 failing.
Closes the startup-friction gap filed in ROADMAP (dd97c49).
WorkerCreate required trusted_roots on every call with no config-level
default. Any batch script that omitted the field stalled all workers at
TrustRequired with no auto-recovery path.
Changes:
- RuntimeFeatureConfig: add trusted_roots: Vec<String> field
- ConfigLoader: wire parse_optional_trusted_roots() for 'trustedRoots' key
- RuntimeConfig / RuntimeFeatureConfig: expose trusted_roots() accessor
- config_validate: add trustedRoots to TOP_LEVEL_FIELDS schema (StringArray)
- Tests: parses_trusted_roots_from_settings + trusted_roots_default_is_empty_when_unset
Callers can now set trusted_roots in .claw/settings.json:
{ "trustedRoots": ["/tmp/worktrees"] }
WorkerRegistry::spawn_worker() callers should merge config.trusted_roots()
with any per-call overrides (wiring left for follow-up).
WorkerCreate requires trusted_roots per-call; no config-level default.
Any batch that forgets the field stalls all workers at trust_required.
Root cause of several 'batch lanes not advancing' incidents.
Recommended fix: wire RuntimeConfig::trusted_roots() as default into
WorkerRegistry::spawn_worker(), with per-call overrides. Update
config_validate schema to include the new field.
Two real schema gaps found via dogfood (cargo test -p runtime):
1. aliases and providerFallbacks not in TOP_LEVEL_FIELDS
- Both are valid config keys parsed by config.rs
- Validator was rejecting them as unknown keys
- 2 tests failing: parses_user_defined_model_aliases,
parses_provider_fallbacks_chain
2. Deprecated keys were being flagged as unknown before the deprecated
check ran (unknown-key check runs first in validate_object_keys)
- Added early-exit for deprecated keys in unknown-key loop
- Keeps deprecated→warning behavior for permissionMode/enabledPlugins
which still appear in valid legacy configs
3. Config integration tests had assertions on format strings that never
matched the actual validator output (path:3: vs path: ... (line N))
- Updated assertions to check for path + line + field name as
independent substrings instead of a format that was never produced
426 tests passing, 0 failing.
Clawhip needs to distinguish a stalled trust_required worker from one
that just transitioned. Without a pre-computed staleness field it has
to compute epoch delta itself from updated_at.
seconds_since_update = now - updated_at at snapshot write time.
Clawhip threshold: > 60s in trust_required = stalled; act.
WorkerStatus is fully tracked in worker_boot.rs but was invisible to
external observers (clawhip, orchestrators) because opencode serve's
HTTP server is upstream and not ours to extend.
Solution: atomic file-based observability.
- emit_state_file() writes .claw/worker-state.json on every push_event()
call (tmp write + rename for atomicity)
- Snapshot includes: worker_id, status, is_ready, trust_gate_cleared,
prompt_in_flight, last_event, updated_at
- Add 'claw state' CLI subcommand to read and print the file
- Add regression test: emit_state_file_writes_worker_status_on_transition
verifies spawning→ready_for_prompt transition is reflected on disk
This closes the /state dogfood gap without requiring any upstream
opencode changes. Clawhip can now distinguish a truly stalled worker
(status: trust_required or running with no recent updated_at) from a
quiet-but-progressing one.
WorkerStatus state machine exists in worker_boot.rs and is exported
from runtime/src/lib.rs. But claw-code is a plugin — it cannot add
HTTP routes to opencode serve (upstream binary, not ours).
/state HTTP endpoint via axum was never implemented. Prior session
summary claiming commit 0984cca was incorrect.
Recommended path: write WorkerStatus transitions to
.claw/worker-state.json on each transition (file-based observability,
no upstream changes required). Wire WorkerRegistry::transition() to
atomic file writes + add CLI subcommand.
Add missing imports to test module:
- PromptHistoryEntry, render_prompt_history_report, parse_history_count
- parse_export_args, render_session_markdown
- summarize_tool_payload_for_markdown, short_tool_id
Fixes test compilation errors introduced by new session and export
features from batch 5/6 work.
The CLI was using a flat cwd/.claw/sessions/ path without workspace
fingerprinting, while SessionStore::from_cwd() adds a hash subdirectory.
This mismatch meant the isolation machinery existed but wasn't actually
used by the main session management codepath.
Now sessions_dir() delegates to SessionStore::from_cwd(), ensuring all
session operations use workspace-fingerprinted directories.
Remove #[cfg(test)] gate from session_control module — SessionStore
is now available at runtime, not just in tests. Export SessionStore and
add workspace_sessions_dir() helper that creates fingerprinted session
directories per workspace root.
This is the #41 kill shot: parallel opencode serve instances will use
separate session namespaces based on workspace fingerprint instead of
sharing a global ~/.local/share/opencode/ store.
The CLI already uses cwd/.claw/sessions/ (sessions_dir()), and now
SessionStore::from_cwd() adds workspace hash isolation on top.