Commit Graph

536 Commits

Author SHA1 Message Date
YeonGyu-Kim
8c6dfe57e6 fix(api): restore local preflight guard ahead of count_tokens round-trip
CI has been red since be561bf ('Use Anthropic count tokens for preflight')
because that commit replaced the free-function preflight_message_request
(byte-estimate guard) with an instance method that silently returns Ok on
any count_tokens failure:

    let counted_input_tokens = match self.count_tokens(request).await {
        Ok(count) => count,
        Err(_) => return Ok(()),  // <-- silent bypass
    };

Two consequences:

1. client_integration::send_message_blocks_oversized_requests_before_the_http_call
   has been FAILING on every CI run since be561bf. The mock server in that
   test only has one HTTP response queued (a bare '{}' to satisfy the main
   request), so the count_tokens POST parses into an empty body that fails
   to deserialize into CountTokensResponse -> Err -> silent bypass -> the
   oversized 600k-char request proceeds to the mock instead of being
   rejected with ContextWindowExceeded as the test expects.

2. In production, any third-party Anthropic-compatible gateway that doesn't
   implement /v1/messages/count_tokens (OpenRouter, Cloudflare AI Gateway,
   etc.) would silently disable the preflight guard entirely, letting
   oversized requests hit the upstream only to fail there with a provider-
   side context-window error. This is exactly the 'opaque failure surface'
   ROADMAP #22 asked us to avoid.

Fix: call the free-function super::preflight_message_request(request)? as
the first step in the instance method, before any network round-trip. This
guarantees the byte-estimate guard always fires, whether or not the remote
count_tokens endpoint is reachable. The count_tokens refinement still runs
afterward when available for more precise token counting, but it is now
strictly additive — it can only catch more cases, never silently skip the
guard.

Test results:
- cargo test -p api --lib: 89 passed, 0 failed
- cargo test --release -p api (all test binaries): 118 passed, 0 failed
- cargo test --release -p api --test client_integration \
    send_message_blocks_oversized_requests_before_the_http_call: passes
- cargo fmt --check: clean

This unblocks the Rust CI workflow which has been red on every push since
be561bf landed.
2026-04-08 14:34:38 +09:00
YeonGyu-Kim
3ac97e635e feat(api): add qwen/ prefix routing for Alibaba DashScope provider
Users in Discord #clawcode-get-help (web3g) asked for Qwen 3.6 Plus via
native Alibaba DashScope API instead of OpenRouter, which has stricter
rate limits. This commit adds first-class routing for qwen/ and bare
qwen- prefixed model names.

Changes:
- DEFAULT_DASHSCOPE_BASE_URL constant: /compatible-mode/v1 endpoint
- OpenAiCompatConfig::dashscope() factory mirroring openai()/xai()
- DASHSCOPE_ENV_VARS + credential_env_vars() wiring
- metadata_for_model: qwen/ and qwen- prefix routes to DashScope with
  auth_env=DASHSCOPE_API_KEY, reuses ProviderKind::OpenAi because
  DashScope speaks the OpenAI REST shape
- is_reasoning_model: detect qwen-qwq, qwq-*, and *-thinking variants
  so tuning params (temperature, top_p, etc.) get stripped before
  payload assembly (same pattern as o1/o3/grok-3-mini)

Tests added:
- providers::tests::qwen_prefix_routes_to_dashscope_not_anthropic
- openai_compat::tests::qwen_reasoning_variants_are_detected

89 api lib tests passing, 0 failing. cargo fmt --check: clean.

Closes the user-reported gap: 'use Qwen 3.6 Plus via Alibaba API
directly, not OpenRouter' without needing OPENAI_BASE_URL override
or unsetting ANTHROPIC_API_KEY.
2026-04-08 14:06:26 +09:00
YeonGyu-Kim
006f7d7ee6 fix(test): add env_lock to plugin lifecycle test — closes ROADMAP #24
build_runtime_runs_plugin_lifecycle_init_and_shutdown was the only test
that set/removed ANTHROPIC_API_KEY without holding the env_lock mutex.
Under parallel workspace execution, other tests racing on the same env
var could wipe the key mid-construction, causing a flaky credential error.

Root cause: process-wide env vars are shared mutable state. All other
tests that touch ANTHROPIC_API_KEY already use env_lock(). This test
was the only holdout.

Fix: add let _guard = env_lock(); at the top of the test.
2026-04-08 12:46:04 +09:00
YeonGyu-Kim
82baaf3f22 fix(ci): update integration test MessageRequest initializers for new tuning fields
openai_compat_integration.rs and client_integration.rs had MessageRequest
constructions without the new tuning param fields (temperature, top_p,
frequency_penalty, presence_penalty, stop) added in c667d47.

Added ..Default::default() to all 4 sites. cargo fmt applied.

This was the root cause of CI red on main (E0063 compile error in
integration tests, not caught by --lib tests).
2026-04-08 11:43:51 +09:00
YeonGyu-Kim
c7b3296ef6 style: cargo fmt — fix CI formatting failures
Pre-existing formatting issues in anthropic.rs surfaced by CI cargo fmt check.
No functional changes.
2026-04-08 11:21:13 +09:00
YeonGyu-Kim
000aed4188 fix(commands): fix brittle /session help assertion after delete subcommand addition
renders_help_from_shared_specs hardcoded the exact /session usage string,
which broke when /session delete was added in batch 5. Relaxed to check
for /session presence instead of exact subcommand list.

Pre-existing test brittleness (not caused by recent commits).

687 workspace lib tests passing, 0 failing.
2026-04-08 09:33:51 +09:00
YeonGyu-Kim
523ce7474a fix(api): sanitize Anthropic body — strip frequency/presence_penalty, convert stop→stop_sequences
MessageRequest now carries OpenAI-compatible tuning params (c667d47), but
the Anthropic API does not support frequency_penalty or presence_penalty,
and uses 'stop_sequences' instead of 'stop'. Without this fix, setting
these params with a Claude model would produce 400 errors.

Changes to strip_unsupported_beta_body_fields:
- Remove frequency_penalty and presence_penalty from Anthropic request body
- Convert stop → stop_sequences (only when non-empty)
- temperature and top_p are preserved (Anthropic supports both)

Tests added:
- strip_removes_openai_only_fields_and_converts_stop
- strip_does_not_add_empty_stop_sequences

87 api lib tests passing, 0 failing.
cargo check --workspace: clean.
2026-04-08 09:05:10 +09:00
YeonGyu-Kim
b513d6e462 fix(api): sanitize tuning params for reasoning models (o1/o3/grok-3-mini)
Reasoning models reject temperature, top_p, frequency_penalty, and
presence_penalty with 400 errors. Instead of letting these flow through
and returning cryptic provider errors, strip them silently at the
request-builder boundary.

is_reasoning_model() classifies: o1*, o3*, o4*, grok-3-mini.
stop sequences are preserved (safe for all providers).

Tests added:
- reasoning_model_strips_tuning_params: o1-mini strips all 4 params, keeps stop
- grok_3_mini_is_reasoning_model: classification coverage for grok-3-mini, o1,
  o3-mini, and negative cases (gpt-4o, grok-3, claude)

85 api lib tests passing, 0 failing.
2026-04-08 07:32:47 +09:00
YeonGyu-Kim
c667d47c70 feat(api): add tuning params (temperature, top_p, penalties, stop) to MessageRequest
MessageRequest was missing standard OpenAI-compatible generation tuning
parameters. Callers had no way to control temperature, top_p,
frequency_penalty, presence_penalty, or stop sequences.

Changes:
- Added 5 optional fields to MessageRequest (all Option, None by default)
- Wired into build_chat_completion_request: only included in payload when set
- All existing construction sites updated with ..Default::default()
- MessageRequest now derives Default for ergonomic partial construction

Tests added:
- tuning_params_included_in_payload_when_set: all 5 params flow into JSON
- tuning_params_omitted_from_payload_when_none: absent params stay absent

83 api lib tests passing, 0 failing.
cargo check --workspace: 0 warnings.
2026-04-08 07:07:33 +09:00
YeonGyu-Kim
0530c509a3 fix(api): route openai/ and gpt- model prefixes to OpenAi provider
metadata_for_model returned None for unknown models like openai/gpt-4.1-mini,
causing detect_provider_kind to fall through to auth-sniffer order. If
ANTHROPIC_API_KEY was set, the model was silently misrouted to Anthropic
and the user got a confusing 'missing Anthropic credentials' error.

Fix: add explicit prefix checks for 'openai/' and 'gpt-' in
metadata_for_model so the model name wins over env-var presence.

Regression test added: openai_namespaced_model_routes_to_openai_not_anthropic
- 'openai/gpt-4.1-mini' routes to OpenAi
- 'gpt-4o' routes to OpenAi

Reported and reproduced by gaebal-gajae against current main.
81 api lib tests passing, 0 failing.
2026-04-08 05:33:47 +09:00
YeonGyu-Kim
eff0765167 test(tools): fill WorkerGet and error-path coverage gaps
WorkerGet had zero test coverage. WorkerAwaitReady and WorkerSendPrompt
had only one happy-path test each with no error paths.

Added 4 tests:
- worker_get_returns_worker_state: WorkerGet fetches correct worker_id/status/cwd
- worker_get_on_unknown_id_returns_error: unknown id -> 'worker not found'
- worker_await_ready_on_spawning_worker_returns_not_ready: ready=false on spawning worker
- worker_send_prompt_on_non_ready_worker_returns_error: sending prompt before ready fails

94 tool tests passing, 0 failing.
2026-04-08 05:03:34 +09:00
YeonGyu-Kim
aee5263aef test(tools): prove recovery loop against .claw/worker-state.json directly
recovery_loop_state_file_reflects_transitions reads the actual state
file after each transition to verify the canonical observability surface
reflects the full stall->resolve->ready progression:

  spawning (state file exists, seconds_since_update present)
  -> trust_required (is_ready=false, trust_gate_cleared=false in file)
  -> spawning (trust_gate_cleared=true after WorkerResolveTrust)
  -> ready_for_prompt (is_ready=true after ready screen observe)

This is the end-to-end proof gaebal-gajae called for: clawhip polling
.claw/worker-state.json will see truthful state at every step of the
recovery loop, including the seconds_since_update staleness signal.

90 tool tests passing, 0 failing.
2026-04-08 04:38:38 +09:00
YeonGyu-Kim
9461522af5 feat(tools): expose WorkerObserveCompletion tool; add provider-degraded classification tests
observe_completion() on WorkerRegistry classifies finish_reason into
Finished vs Failed (finish='unknown' + 0 tokens = provider degraded).
This logic existed in the runtime but had no tool wrapper — clawhip
could not call it. Added WorkerObserveCompletion as a first-class tool.

Tool schema:
  { worker_id, finish_reason: string, tokens_output: integer }

Handler: run_worker_observe_completion -> global_worker_registry().observe_completion()

Tests added:
- worker_observe_completion_success_finish_sets_finished_status
  finish=end_turn + tokens=512 -> status=finished
- worker_observe_completion_degraded_provider_sets_failed_status
  finish=unknown + tokens=0 -> status=failed, last_error populated

89 tool tests passing, 0 failing.
2026-04-08 04:35:05 +09:00
YeonGyu-Kim
c08f060ca1 test(tools): end-to-end stall-detect and recovery loop coverage
Proves the clawhip restart/recover flow that gaebal-gajae flagged:

1. stall_detect_and_resolve_trust_end_to_end
   - Worker created without trusted_roots -> trust_auto_resolve=false
   - WorkerObserve with trust-prompt text -> status=trust_required, gate cleared=false
   - WorkerResolveTrust -> status=spawning, trust_gate_cleared=true
   - WorkerObserve with ready text -> status=ready_for_prompt
   Full resolve path verified end-to-end.

2. stall_detect_and_restart_recovery_end_to_end
   - Worker stalls at trust_required
   - WorkerRestart resets to spawning, trust_gate_cleared=false
   Documents the restart-then-re-acquire-trust flow.

Note: seconds_since_update is in .claw/worker-state.json (state file),
not in the Worker tool output struct. Staleness detection via state file
is covered by emit_state_file_writes_worker_status_on_transition in
worker_boot.rs tests.

87 tool tests passing, 0 failing.
2026-04-08 04:09:55 +09:00
YeonGyu-Kim
cae11413dd fix(dead-code): remove stale constants + dead function; add workspace_sessions_dir tests
Three dead-code warnings eliminated from cargo check:

1. KNOWN_TOP_LEVEL_KEYS / DEPRECATED_TOP_LEVEL_KEYS in config.rs
   - Superseded by config_validate::TOP_LEVEL_FIELDS and DEPRECATED_FIELDS
   - Were out of date (missing aliases, providerFallbacks, trustedRoots)
   - Removed

2. read_git_recent_commits in prompt.rs
   - Private function, never called anywhere in the codebase
   - Removed

3. workspace_sessions_dir in session.rs
   - Public API scaffolded for session isolation (#41)
   - Genuinely useful for external consumers (clawhip enumerating sessions)
   - Added 2 tests: deterministic path for same CWD, different path for different CWDs
   - Annotated with #[allow(dead_code)] since it is external-facing API

cargo check --workspace: 0 warnings remaining
430 runtime tests passing, 0 failing
2026-04-08 04:04:54 +09:00
YeonGyu-Kim
aa37dc6936 test(tools): add coverage for WorkerRestart and WorkerTerminate tools
WorkerRestart and WorkerTerminate had zero test coverage despite being
public tools in the tool spec. Also confirms one design decision worth
noting: restart resets trust_gate_cleared=false, so an allowlisted
worker that gets restarted must re-acquire trust via the normal observe
flow (by design — trust is per-session, not per-CWD).

Tests added:
- worker_terminate_sets_finished_status
- worker_restart_resets_to_spawning (verifies status=spawning,
  prompt_in_flight=false, trust_gate_cleared=false)
- worker_terminate_on_unknown_id_returns_error
- worker_restart_on_unknown_id_returns_error

85 tool tests passing, 0 failing.
2026-04-08 03:33:05 +09:00
YeonGyu-Kim
6ddfa78b7c feat(tools): wire config.trusted_roots into WorkerCreate tool
Previously WorkerCreate passed trusted_roots directly to spawn_worker
with no config-level default. Any batch script omitting the field
stalled all workers at TrustRequired with no recovery path.

Now run_worker_create loads RuntimeConfig from the worker CWD before
spawning and merges config.trusted_roots() with per-call overrides.
Per-call overrides still take effect; config provides the default.

Add test: worker_create_merges_config_trusted_roots_without_per_call_override
- writes .claw/settings.json with trustedRoots=[<os-temp-dir>] in a temp worktree
- calls WorkerCreate with no trusted_roots field
- asserts trust_auto_resolve=true (config roots matched the CWD)

81 tool tests passing, 0 failing.
2026-04-08 03:08:13 +09:00
YeonGyu-Kim
bcdc52d72c feat(config): add trustedRoots to RuntimeConfig
Closes the startup-friction gap filed in ROADMAP (dd97c49).

WorkerCreate required trusted_roots on every call with no config-level
default. Any batch script that omitted the field stalled all workers at
TrustRequired with no auto-recovery path.

Changes:
- RuntimeFeatureConfig: add trusted_roots: Vec<String> field
- ConfigLoader: wire parse_optional_trusted_roots() for 'trustedRoots' key
- RuntimeConfig / RuntimeFeatureConfig: expose trusted_roots() accessor
- config_validate: add trustedRoots to TOP_LEVEL_FIELDS schema (StringArray)
- Tests: parses_trusted_roots_from_settings + trusted_roots_default_is_empty_when_unset

Callers can now set trusted_roots in .claw/settings.json:
  { "trustedRoots": ["/tmp/worktrees"] }

WorkerRegistry::spawn_worker() callers should merge config.trusted_roots()
with any per-call overrides (wiring left for follow-up).
2026-04-08 02:35:19 +09:00
YeonGyu-Kim
5dfb1d7c2b fix(config_validate): add missing aliases/providerFallbacks to schema; fix deprecated-key bypass
Two real schema gaps found via dogfood (cargo test -p runtime):

1. aliases and providerFallbacks not in TOP_LEVEL_FIELDS
   - Both are valid config keys parsed by config.rs
   - Validator was rejecting them as unknown keys
   - 2 tests failing: parses_user_defined_model_aliases,
     parses_provider_fallbacks_chain

2. Deprecated keys were being flagged as unknown before the deprecated
   check ran (unknown-key check runs first in validate_object_keys)
   - Added early-exit for deprecated keys in unknown-key loop
   - Keeps deprecated→warning behavior for permissionMode/enabledPlugins
     which still appear in valid legacy configs

3. Config integration tests had assertions on format strings that never
   matched the actual validator output (path:3: vs path: ... (line N))
   - Updated assertions to check for path + line + field name as
     independent substrings instead of a format that was never produced

426 tests passing, 0 failing.
2026-04-08 01:45:08 +09:00
YeonGyu-Kim
fcb5d0c16a fix(worker_boot): add seconds_since_update to state snapshot
Clawhip needs to distinguish a stalled trust_required worker from one
that just transitioned. Without a pre-computed staleness field it has
to compute epoch delta itself from updated_at.

seconds_since_update = now - updated_at at snapshot write time.
Clawhip threshold: > 60s in trust_required = stalled; act.
2026-04-08 01:03:00 +09:00
YeonGyu-Kim
314f0c99fd feat(worker_boot): emit .claw/worker-state.json on every status transition
WorkerStatus is fully tracked in worker_boot.rs but was invisible to
external observers (clawhip, orchestrators) because opencode serve's
HTTP server is upstream and not ours to extend.

Solution: atomic file-based observability.

- emit_state_file() writes .claw/worker-state.json on every push_event()
  call (tmp write + rename for atomicity)
- Snapshot includes: worker_id, status, is_ready, trust_gate_cleared,
  prompt_in_flight, last_event, updated_at
- Add 'claw state' CLI subcommand to read and print the file
- Add regression test: emit_state_file_writes_worker_status_on_transition
  verifies spawning→ready_for_prompt transition is reflected on disk

This closes the /state dogfood gap without requiring any upstream
opencode changes. Clawhip can now distinguish a truly stalled worker
(status: trust_required or running with no recent updated_at) from a
quiet-but-progressing one.
2026-04-08 00:37:44 +09:00
YeonGyu-Kim
092d8b6e21 fix(tests): add missing test imports for session/prompt history features
Add missing imports to test module:
- PromptHistoryEntry, render_prompt_history_report, parse_history_count
- parse_export_args, render_session_markdown
- summarize_tool_payload_for_markdown, short_tool_id

Fixes test compilation errors introduced by new session and export
features from batch 5/6 work.
2026-04-07 16:20:33 +09:00
YeonGyu-Kim
b3ccd92d24 feat: b6-pdf-extract-v2 follow-up work — batch 6 2026-04-07 16:11:51 +09:00
YeonGyu-Kim
0f2f02af2d feat: b6-http-proxy-v2 follow-up work — batch 6 2026-04-07 16:11:51 +09:00
YeonGyu-Kim
e51566c745 feat: b6-bridge-directory follow-up work — batch 6 2026-04-07 16:11:50 +09:00
YeonGyu-Kim
20f3a5932a fix(cli): wire sessions_dir() through SessionStore::from_cwd() (#41)
The CLI was using a flat cwd/.claw/sessions/ path without workspace
fingerprinting, while SessionStore::from_cwd() adds a hash subdirectory.
This mismatch meant the isolation machinery existed but wasn't actually
used by the main session management codepath.

Now sessions_dir() delegates to SessionStore::from_cwd(), ensuring all
session operations use workspace-fingerprinted directories.
2026-04-07 16:03:44 +09:00
YeonGyu-Kim
28e6cc0965 feat(runtime): activate per-worktree session isolation (#41)
Remove #[cfg(test)] gate from session_control module — SessionStore
is now available at runtime, not just in tests. Export SessionStore and
add workspace_sessions_dir() helper that creates fingerprinted session
directories per workspace root.

This is the #41 kill shot: parallel opencode serve instances will use
separate session namespaces based on workspace fingerprint instead of
sharing a global ~/.local/share/opencode/ store.

The CLI already uses cwd/.claw/sessions/ (sessions_dir()), and now
SessionStore::from_cwd() adds workspace hash isolation on top.
2026-04-07 16:00:57 +09:00
YeonGyu-Kim
f03b8dce17 feat: bridge directory metadata + stale-base preflight check
- Add CWD to SSE session events (kills Directory: unknown)
- Add stale-base preflight: verify HEAD matches expected base commit
- Warn on divergence before session starts
2026-04-07 15:55:38 +09:00
YeonGyu-Kim
ecdca49552 feat: plugin-level max_output_tokens override via session_control 2026-04-07 15:55:38 +09:00
YeonGyu-Kim
5c276c8e14 feat: b6-pdf-extract-v2 — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim
1f968b359f feat: b6-openai-models — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim
18d3c1918b feat: b6-http-proxy-v2 — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim
82f2e8e92b feat: doctor-cmd implementation 2026-04-07 15:28:43 +09:00
YeonGyu-Kim
8f4651a096 fix: resolve git_context field references after cherry-pick merge 2026-04-07 15:20:20 +09:00
YeonGyu-Kim
dab16c230a feat: b5-session-export — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim
a46711779c feat: b5-markdown-fence — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim
ef0b870890 feat: b5-git-aware — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim
4557a81d2f feat: b5-doctor-cmd — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim
86c3667836 feat: b5-context-compress — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim
260bac321f feat: b5-config-validate — batch 5 wave 2 2026-04-07 15:19:44 +09:00
YeonGyu-Kim
133ed4581e feat(config): add config file validation with clear error messages
Parse TOML/JSON config on startup, emit errors for unknown keys, wrong
types, deprecated fields with exact line and field name.
2026-04-07 15:10:08 +09:00
YeonGyu-Kim
8663751650 fix: resolve merge conflicts from batch 5 cherry-picks (compact field, run_turn_with_output arity) 2026-04-07 14:53:46 +09:00
YeonGyu-Kim
90f2461f75 feat: b5-tool-timeout — batch 5 upstream parity 2026-04-07 14:51:32 +09:00
YeonGyu-Kim
0d8fd51a6c feat: b5-stdin-pipe — batch 5 upstream parity 2026-04-07 14:51:28 +09:00
YeonGyu-Kim
5bcbc86a2b feat: b5-slash-help — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
d509f16b5a feat: b5-skip-perms-flag — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
d089d1a9cc feat: b5-retry-backoff — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
6a6c5acb02 feat: b5-reasoning-guard — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
9105e0c656 feat: b5-openrouter-fix — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim
b8f76442e2 feat: b5-multi-provider — batch 5 upstream parity 2026-04-07 14:51:26 +09:00