Commit Graph

480 Commits

Author SHA1 Message Date
Jobdori
e79d8dafb5 docs(roadmap): add backlog item #12 — wire SummaryCompressor into lane event pipeline 2026-04-04 03:01:59 +09:00
Jobdori
804f3b6fac docs(roadmap): add backlog item #11 — wire lane-completion emitter 2026-04-04 02:32:00 +09:00
Jobdori
0f88a48c03 docs(roadmap): add backlog item #10 — swarm branch-lock dedup 2026-04-04 01:30:44 +09:00
Jobdori
e580311625 docs(roadmap): add backlog item #9 — render_diff_report test isolation 2026-04-04 01:04:52 +09:00
Jobdori
6d35399a12 fix: resolve merge conflicts in lib.rs re-exports 2026-04-04 00:48:26 +09:00
Jobdori
a1aba3c64a merge: ultraclaw/recovery-recipes into main 2026-04-04 00:45:14 +09:00
Jobdori
4ee76ee7f4 merge: ultraclaw/summary-compression into main 2026-04-04 00:45:13 +09:00
Jobdori
6d7c617679 merge: ultraclaw/session-control-api into main 2026-04-04 00:45:12 +09:00
Jobdori
5ad05c68a3 merge: ultraclaw/mcp-lifecycle-harden into main 2026-04-04 00:45:12 +09:00
Jobdori
eff9404d30 merge: ultraclaw/green-contract into main 2026-04-04 00:45:11 +09:00
Jobdori
d126a3dca4 merge: ultraclaw/trust-resolver into main 2026-04-04 00:45:10 +09:00
Jobdori
a91e855d22 merge: ultraclaw/plugin-lifecycle into main 2026-04-04 00:45:10 +09:00
Jobdori
db97aa3da3 merge: ultraclaw/policy-engine into main 2026-04-04 00:45:09 +09:00
Jobdori
ba08b0eb93 merge: ultraclaw/task-packet into main 2026-04-04 00:45:08 +09:00
Jobdori
d9644cd13a feat(runtime): trust prompt resolver 2026-04-04 00:44:08 +09:00
Jobdori
8321fd0c6b feat(runtime): actionable summary compression for lane event streams 2026-04-04 00:43:30 +09:00
Jobdori
c18f8a0da1 feat(runtime): structured session control API for claw-native worker management 2026-04-04 00:43:30 +09:00
Jobdori
c5aedc6e4e feat(runtime): stale branch detection 2026-04-04 00:42:55 +09:00
Jobdori
13015f6428 feat(runtime): hardened MCP lifecycle with phase tracking and degraded-mode reporting 2026-04-04 00:42:43 +09:00
Jobdori
f12cb76d6f feat(runtime): green-ness contract
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-04 00:42:41 +09:00
Jobdori
2787981632 feat(runtime): recovery recipes 2026-04-04 00:42:39 +09:00
Jobdori
b543760d03 feat(runtime): trust prompt resolver with allowlist and events 2026-04-04 00:42:28 +09:00
Jobdori
18340b561e feat(runtime): first-class plugin lifecycle contract with degraded-mode support 2026-04-04 00:41:51 +09:00
Jobdori
d74ecf7441 feat(runtime): policy engine for autonomous lane management 2026-04-04 00:40:50 +09:00
Jobdori
e1db949353 feat(runtime): typed task packet format for structured claw dispatch 2026-04-04 00:40:20 +09:00
Jobdori
02634d950e feat(runtime): stale-branch detection with freshness check and policy 2026-04-04 00:40:01 +09:00
Jobdori
f5e94f3c92 feat(runtime): plugin lifecycle 2026-04-04 00:38:35 +09:00
Yeachan-Heo
f76311f9d6 Prevent worker prompts from outrunning boot readiness
Add a foundational worker_boot control plane and tool surface for
reliable startup. The new registry tracks trust gates, ready-for-prompt
handshakes, prompt delivery attempts, and shell misdelivery recovery so
callers can coordinate worker boot above raw terminal transport.

Constraint: Current main has no tmux-backed worker control API to extend directly
Constraint: First slice must stay deterministic and fully testable in-process
Rejected: Wire the first implementation straight to tmux panes | would couple transport details to unfinished state semantics
Rejected: Ship parser helpers without control tools | would not enforce the ready-before-prompt contract end to end
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Treat WorkerObserve heuristics as a temporary transport adapter and replace them with typed runtime events before widening automation policy
Tested: cargo test -p runtime worker_boot
Tested: cargo test -p tools worker_tools
Tested: cargo check -p runtime -p tools
Not-tested: Real tmux/TTY trust prompts and live worker boot on an actual coding session
Not-tested: Full cargo clippy -p runtime -p tools --all-targets -- -D warnings (fails on pre-existing warnings outside this slice)
2026-04-03 15:20:22 +00:00
Yeachan-Heo
56ee33e057 Make agent lane state machine-readable
The background Agent tool already persisted lane-adjacent state via a JSON manifest and a markdown transcript, making it the smallest viable vertical slice for the ROADMAP lane-event work. This change adds canonical typed lane events to the manifest and normalizes terminal blockers into the shared failure taxonomy so downstream clawhip-style consumers can branch on structured state instead of scraping prose alone.

The slice is intentionally narrow: it covers agent start, finish, blocked, and failed transitions plus blocker classification, while leaving broader lane orchestration and external consumers for later phases. Tests lock the manifest schema and taxonomy mapping so future extensions can add events without regressing the typed baseline.

Constraint: Land a fresh-main vertical slice without inventing a larger lane framework first
Rejected: Add a brand-new lane subsystem across crates | too broad for one verified slice
Rejected: Only add markdown log annotations | still log-shaped and not machine-first
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Extend the same event names and failure classes before adding any alternate manifest schema for lane reporting
Tested: cargo test -p tools agent_persists_handoff_metadata -- --nocapture
Tested: cargo test -p tools agent_fake_runner_can_persist_completion_and_failure -- --nocapture
Tested: cargo test -p tools lane_failure_taxonomy_normalizes_common_blockers -- --nocapture
Not-tested: Full clawhip consumer integration or multi-crate event plumbing
2026-04-03 15:20:22 +00:00
Yeachan-Heo
07ae6e415f merge: mcp e2e lifecycle lane into main 2026-04-03 14:54:40 +00:00
Yeachan-Heo
bf5eb8785e Recover the MCP lane on top of current main
This resolves the stale-branch merge against origin/main, keeps the MCP runtime wiring, and preserves prompt-approved CLI tool execution after the mock parity harness additions landed upstream.

Constraint: Branch had to absorb origin/main changes through a contentful merge before more MCP work
Constraint: Prompt-approved runtime tool execution must continue working with new CLI/mock parity coverage
Rejected: Keep permission enforcer attached inside CliToolExecutor for conversation turns | caused prompt-approved bash parity flow to fail as a tool error
Rejected: Defer the merge and continue on stale history | would leave the lane red against current main
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Runtime permission policy and executor-side permission enforcement are separate layers; do not reapply executor enforcement to conversation turns without revalidating mock parity harness approval flows
Tested: cargo test -p rusty-claude-cli --test mock_parity_harness -- --nocapture; cargo test -p rusty-claude-cli -- --nocapture; cargo test --workspace -- --nocapture
Not-tested: Additional live remote/provider scenarios beyond the existing workspace suite
2026-04-03 14:51:18 +00:00
Yeachan-Heo
95aa5ef15c docs: add clawable harness roadmap 2026-04-03 14:48:08 +00:00
Yeachan-Heo
b3fe057559 Close the MCP lifecycle gap from config to runtime tool execution
This wires configured MCP servers into the CLI/runtime path so discovered
MCP tools, resource wrappers, search visibility, shutdown handling, and
best-effort discovery all work together instead of living as isolated
runtime primitives.

Constraint: Keep non-MCP startup flows working without new required config
Constraint: Preserve partial availability when one configured MCP server fails discovery
Rejected: Fail runtime startup on any MCP discovery error | too brittle for mixed healthy/broken server configs
Rejected: Keep MCP support runtime-only without registry wiring | left discovery and invocation unreachable from the CLI tool lane
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Runtime MCP tools are registry-backed but executed through CliToolExecutor state; keep future tool-registry changes aligned with that split
Tested: cargo test -p runtime mcp -- --nocapture; cargo test -p tools -- --nocapture; cargo test -p rusty-claude-cli -- --nocapture; cargo test --workspace -- --nocapture
Not-tested: Live remote MCP transports (http/sse/ws/sdk) remain unsupported in the CLI execution path
2026-04-03 14:31:25 +00:00
Jobdori
a2351fe867 feat(harness+usage): add auto_compact and token_cost parity scenarios
Two new mock parity harness scenarios:

1. auto_compact_triggered (session-compaction category)
   - Mock returns 50k input tokens, validates auto_compaction key
     is present in JSON output
   - Validates format parity; trigger behavior covered by
     conversation::tests::auto_compacts_when_cumulative_input_threshold_is_crossed

2. token_cost_reporting (token-usage category)
   - Mock returns known token counts (1k input, 500 output)
   - Validates input/output token fields present in JSON output

Additional changes:
- Add estimated_cost to JSON prompt output (format_usd + pricing_for_model)
- Add final_text_sse_with_usage and text_message_response_with_usage helpers
  to mock-anthropic-service for parameterized token counts
- Add ScenarioCase.extra_env and ScenarioCase.resume_session fields
- Update mock_parity_scenarios.json: 10 -> 12 scenarios
- Update harness request count assertion: 19 -> 21

cargo test --workspace: 558 passed, 0 failed
2026-04-03 22:41:42 +09:00
Jobdori
6325add99e fix(tests): add env_lock to permission-sensitive CLI arg tests
Tests relying on PermissionMode::DangerFullAccess as default were
flaky under --workspace runs because other tests set
RUSTY_CLAUDE_PERMISSION_MODE without cleanup. Added env_lock() and
explicit env var removal to 7 affected tests.

Fixes: workspace-level cargo test flake (1 random test fails per run)
2026-04-03 22:07:12 +09:00
Jobdori
a00e9d6ded Merge jobdori/final-slash-commands: reach 141 slash spec parity 2026-04-03 19:55:12 +09:00
Jobdori
bd9c145ea1 feat(commands): reach upstream slash command parity — 135 → 141 specs
Add 6 final slash commands:
- agent: manage sub-agents and spawned sessions
- subagent: control active subagent execution
- reasoning: toggle extended reasoning mode
- budget: show/set token budget limits
- rate-limit: configure API rate limiting
- metrics: show performance and usage metrics

Reach upstream parity target of 141 slash command specs.
2026-04-03 19:55:12 +09:00
Jobdori
742f2a12f9 Merge jobdori/slash-expansion: slash commands 67 → 135 specs 2026-04-03 19:52:44 +09:00
Jobdori
0490636031 feat(commands): expand slash command surface 67 → 135 specs
Add 68 new slash command specs covering:
- Approval flow: approve/deny
- Editing: undo, retry, paste, image, screenshot
- Code ops: test, lint, build, run, fix, refactor, explain, docs, perf
- Git: git, stash, blame, log
- LSP: symbols, references, definition, hover, diagnostics, autofix
- Navigation: focus/unfocus, web, map, search, workspace
- Model: max-tokens, temperature, system-prompt, tool-details
- Session: history, tokens, cache, pin/unpin, bookmarks, format
- Infra: cron, team, parallel, multi, macro, alias
- Config: api-key, language, profile, telemetry, env, project
- Other: providers, notifications, changelog, templates, benchmark, migrate, reset

Update tests: flexible assertions for expanded command surface
2026-04-03 19:52:40 +09:00
Jobdori
b5f4e4a446 Merge jobdori/parity-status-update: comprehensive PARITY.md update 2026-04-03 19:39:28 +09:00
Jobdori
d919616e99 docs(PARITY.md): comprehensive status update — all 9 lanes merged, stubs replaced
- All 9 lanes now merged on main
- Output truncation complete (16KB)
- AskUserQuestion + RemoteTrigger real implementations
- Updated still-open items with accurate remaining gaps
- Updated migration readiness checklist
2026-04-03 19:39:28 +09:00
Jobdori
ee31e00493 Merge jobdori/stub-implementations: AskUserQuestion + RemoteTrigger real implementations 2026-04-03 19:37:39 +09:00
Jobdori
80ad9f4195 feat(tools): replace AskUserQuestion + RemoteTrigger stubs with real implementations
- AskUserQuestion: interactive stdin/stdout prompt with numbered options
- RemoteTrigger: real HTTP client (GET/POST/PUT/DELETE/PATCH/HEAD)
  with custom headers, body, 30s timeout, response truncation
- All 480+ tests green
2026-04-03 19:37:34 +09:00
Jobdori
20d663cc31 Merge jobdori/parity-followup: bash validation module + output truncation 2026-04-03 19:35:11 +09:00
Jobdori
ba196a2300 docs(PARITY.md): update 9-lane status and follow-up items
- Mark bash validation as merged (1cfd78a)
- Mark output truncation as complete
- Update lane table with correct merge commits
- Add follow-up parity items section
2026-04-03 19:32:11 +09:00
Jobdori
1cfd78ac61 feat: bash validation module + output truncation parity
- Add bash_validation.rs with 9 submodules (1004 lines):
  readOnlyValidation, destructiveCommandWarning, modeValidation,
  sedValidation, pathValidation, commandSemantics, bashPermissions,
  bashSecurity, shouldUseSandbox
- Wire into runtime lib.rs
- Add MAX_OUTPUT_BYTES (16KB) truncation to bash.rs
- Add 4 truncation tests, all passing
- Full test suite: 270+ green
2026-04-03 19:31:49 +09:00
Jobdori
ddae15dede fix(enforcer): defer to caller prompt flow when active mode is Prompt
The PermissionEnforcer was hard-denying tool calls that needed user
approval because it passes no prompter to authorize(). When the
active permission mode is Prompt, the enforcer now returns Allowed
and defers to the CLI's interactive approval flow.

Fixes: mock_parity_harness bash_permission_prompt_approved scenario
2026-04-03 18:39:14 +09:00
Jobdori
8cc7d4c641 chore: additional AI slop cleanup and enforcer wiring from sessions 1/5
Session 1 (ses_2ad65873): with_enforcer builders + 2 regression tests
Session 5 (ses_2ad67e8e): continued AI slop cleanup pass — redundant
  comments, unused_self suppressions, unreachable! tightening
Session cleanup (ses_2ad6b26c): Python placeholder centralization

Workspace tests: 363+ passed, 0 failed.
2026-04-03 18:35:27 +09:00
Jobdori
618a79a9f4 feat: ultraclaw session outputs — registry tests, MCP bridge, PARITY.md, cleanup
Ultraclaw mode results from 10 parallel opencode sessions:

- PARITY.md: Updated both copies with all 9 landed lanes, commit hashes,
  line counts, and test counts. All checklist items marked complete.
- MCP bridge: McpToolRegistry.call_tool now wired to real McpServerManager
  via async JSON-RPC (discover_tools -> tools/call -> shutdown)
- Registry tests: Added coverage for TaskRegistry, TeamRegistry,
  CronRegistry, PermissionEnforcer, LspRegistry (branch-focused tests)
- Permissions refactor: Simplified authorize_with_context, extracted helpers,
  added characterization tests (185 runtime tests pass)
- AI slop cleanup: Removed redundant comments, unused_self suppressions,
  tightened unreachable branches
- CLI fixes: Minor adjustments in main.rs and hooks.rs

All 363+ tests pass. Workspace compiles clean.
2026-04-03 18:23:03 +09:00
Jobdori
f25363e45d fix(tools): wire PermissionEnforcer into execute_tool dispatch path
The review correctly identified that enforce_permission_check() was defined
but never called. This commit:

- Adds enforcer: Option<PermissionEnforcer> field to GlobalToolRegistry
  and SubagentToolExecutor
- Adds set_enforcer() method for runtime configuration
- Gates both execute() paths through enforce_permission_check() when
  an enforcer is configured
- Default: None (Allow-all, matching existing behavior)

Resolves the dead-code finding from ultraclaw review sessions 3 and 8.
2026-04-03 18:18:19 +09:00