mirror of
https://github.com/instructkr/claw-code.git
synced 2026-04-26 17:24:58 +08:00
Compare commits
186 Commits
claw-code-
...
feat/jobdo
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
d46c423c1d | ||
|
|
2858aeccff | ||
|
|
116a95a253 | ||
|
|
91e290526a | ||
|
|
ceb092abd7 | ||
|
|
2da12117eb | ||
|
|
959bdf8491 | ||
|
|
347102d83b | ||
|
|
c00981896f | ||
|
|
f004f74ffa | ||
|
|
02252a8585 | ||
|
|
134e945a01 | ||
|
|
c20d0330c1 | ||
|
|
ba3a34d6fe | ||
|
|
0e9cff588d | ||
|
|
dba4f281f0 | ||
|
|
1c59e869e0 | ||
|
|
604bf389b6 | ||
|
|
0730183f35 | ||
|
|
5e0228dce0 | ||
|
|
b780c808d1 | ||
|
|
6948b20d74 | ||
|
|
c48c9134d9 | ||
|
|
215318410a | ||
|
|
59acc60eb5 | ||
|
|
3497851259 | ||
|
|
d93957de35 | ||
|
|
86e88c2fcd | ||
|
|
94bd6f13a7 | ||
|
|
d1fa484afd | ||
|
|
eb0356e92c | ||
|
|
7a1e9854c2 | ||
|
|
70bea57de3 | ||
|
|
3bbaefcf3e | ||
|
|
c0ab7a4d5f | ||
|
|
046bf6cedc | ||
|
|
66eeed82ca | ||
|
|
b139b10499 | ||
|
|
6e6f99e57e | ||
|
|
eb957a512c | ||
|
|
fcb9d18899 | ||
|
|
d03f33b119 | ||
|
|
6bd69d55bc | ||
|
|
e470e614d5 | ||
|
|
1494a94423 | ||
|
|
8efcec32d7 | ||
|
|
1afe145db8 | ||
|
|
7b3abfd49a | ||
|
|
2c004eb884 | ||
|
|
22cc8effbb | ||
|
|
a14977a866 | ||
|
|
e84424a2d3 | ||
|
|
de5384c8f0 | ||
|
|
93cfdbabeb | ||
|
|
efc59ab17e | ||
|
|
635f1145a2 | ||
|
|
a8fc17cdee | ||
|
|
28102af64a | ||
|
|
df148f1a3e | ||
|
|
3a2dddd1ca | ||
|
|
ce352f4750 | ||
|
|
d9b61cc4dc | ||
|
|
fbb0ab4be7 | ||
|
|
5736f364a9 | ||
|
|
6212f17c93 | ||
|
|
0f023665ae | ||
|
|
1a4d0e4676 | ||
|
|
b8984e515b | ||
|
|
834b0a91fe | ||
|
|
80f9914353 | ||
|
|
94f9540333 | ||
|
|
e1b0dbf860 | ||
|
|
90c4fd0b66 | ||
|
|
6870b0f985 | ||
|
|
3311266b59 | ||
|
|
cd6e1cea6f | ||
|
|
f30aa0b239 | ||
|
|
7f63e22f29 | ||
|
|
771d2ffd04 | ||
|
|
562f19bcff | ||
|
|
43bbf43f01 | ||
|
|
8322bb8ec6 | ||
|
|
4c9a0a9992 | ||
|
|
86db2e0b03 | ||
|
|
1a03359bb4 | ||
|
|
b34f370645 | ||
|
|
a9e87de905 | ||
|
|
0929180ba8 | ||
|
|
98c675b33b | ||
|
|
afc792f1a5 | ||
|
|
5b9097a7ac | ||
|
|
69a15bd707 | ||
|
|
41c87309f3 | ||
|
|
a02527826e | ||
|
|
a52a361e16 | ||
|
|
d5373ac5d6 | ||
|
|
a6f4e0d8d1 | ||
|
|
378b9bf533 | ||
|
|
66765ea96d | ||
|
|
499d84c04a | ||
|
|
6d1c24f9ee | ||
|
|
fb1a59e088 | ||
|
|
0527dd608d | ||
|
|
c5b6fa5be3 | ||
|
|
d64c7144ff | ||
|
|
2a82cf2856 | ||
|
|
48da1904e0 | ||
|
|
de7a0ffde6 | ||
|
|
36883ba4c2 | ||
|
|
f000fdd7fc | ||
|
|
f18f45c0cf | ||
|
|
946e43e0c7 | ||
|
|
92a79b5276 | ||
|
|
ad1cf92620 | ||
|
|
6a3913e278 | ||
|
|
553893410b | ||
|
|
b54eacaa6e | ||
|
|
51cee23a27 | ||
|
|
35fee5ecde | ||
|
|
f034b01733 | ||
|
|
c4054d2fa3 | ||
|
|
7bd91096a8 | ||
|
|
196fe6b493 | ||
|
|
dc8b275c9f | ||
|
|
8f4f215e27 | ||
|
|
0aa0d3f7cf | ||
|
|
86b98d07e9 | ||
|
|
cb8839e050 | ||
|
|
41b0006eea | ||
|
|
9dd7e79eb2 | ||
|
|
0ca034472b | ||
|
|
762e9bb212 | ||
|
|
19638a015e | ||
|
|
5e29430d4f | ||
|
|
83f744adf0 | ||
|
|
0d8adceb67 | ||
|
|
d49a75cad5 | ||
|
|
9eba71da81 | ||
|
|
ef5aae3ddd | ||
|
|
f05bc037de | ||
|
|
dc274a0f96 | ||
|
|
2fcb85ce4e | ||
|
|
f1103332d0 | ||
|
|
186d42f979 | ||
|
|
5f8d1b92a6 | ||
|
|
84466bbb6c | ||
|
|
fbcbe9d8d5 | ||
|
|
dd0993c157 | ||
|
|
b903e1605f | ||
|
|
de368a2615 | ||
|
|
af306d489e | ||
|
|
fef249d9e7 | ||
|
|
7724bf98fd | ||
|
|
70b2f6a66f | ||
|
|
1d155e4304 | ||
|
|
0b5dffb9da | ||
|
|
932710a626 | ||
|
|
3262cb3a87 | ||
|
|
8247d7d2eb | ||
|
|
517d7e224e | ||
|
|
c73423871b | ||
|
|
373dd9b848 | ||
|
|
11f9e8a5a2 | ||
|
|
97c4b130dc | ||
|
|
290ab7e41f | ||
|
|
ded0c5bbc1 | ||
|
|
40c17d8f2a | ||
|
|
b048de8899 | ||
|
|
5a18e3aa1a | ||
|
|
7fb95e95f6 | ||
|
|
60925fa9f7 | ||
|
|
01dca90e95 | ||
|
|
524edb2b2e | ||
|
|
455bdec06c | ||
|
|
85de7f9814 | ||
|
|
178c8fac28 | ||
|
|
d453eedae6 | ||
|
|
79a9f0e6f6 | ||
|
|
4813a2b351 | ||
|
|
3f4d46d7b4 | ||
|
|
6a76cc7c08 | ||
|
|
527c0f971c | ||
|
|
504d238af1 | ||
|
|
41a6091355 | ||
|
|
bc94870a54 | ||
|
|
ee3aa29a5e |
3
.gitignore
vendored
3
.gitignore
vendored
@@ -8,5 +8,8 @@ archive/
|
||||
# Claw Code local artifacts
|
||||
.claw/settings.local.json
|
||||
.claw/sessions/
|
||||
# #160/#166: default session storage directory (flush-transcript output,
|
||||
# dogfood runs, etc.). Claws specifying --directory elsewhere are fine.
|
||||
.port_sessions/
|
||||
.clawhip/
|
||||
status-help.txt
|
||||
|
||||
210
CLAUDE.md
210
CLAUDE.md
@@ -1,21 +1,201 @@
|
||||
# CLAUDE.md
|
||||
# CLAUDE.md — Python Reference Implementation
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
**This file guides work on `src/` and `tests/` — the Python reference harness for claw-code protocol.**
|
||||
|
||||
## Detected stack
|
||||
- Languages: Rust.
|
||||
- Frameworks: none detected from the supported starter markers.
|
||||
The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.
|
||||
|
||||
## Verification
|
||||
- Run Rust verification from `rust/`: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
|
||||
- `src/` and `tests/` are both present; update both surfaces together when behavior changes.
|
||||
## What this Python harness does
|
||||
|
||||
**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
|
||||
- Deterministic and recoverable (every output is reproducible)
|
||||
- Self-describing (SCHEMAS.md documents every field)
|
||||
- Clawable (external agents can build ONE error handler for all commands)
|
||||
|
||||
## Stack
|
||||
- **Language:** Python 3.13+
|
||||
- **Dependencies:** minimal (no frameworks; pure stdlibs + attrs/dataclasses)
|
||||
- **Test runner:** pytest
|
||||
- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)
|
||||
|
||||
## Quick start
|
||||
|
||||
```bash
|
||||
# 1. Install dependencies (if not already in venv)
|
||||
python3 -m venv .venv && source .venv/bin/activate
|
||||
# (dependencies minimal; standard library mostly)
|
||||
|
||||
# 2. Run tests
|
||||
python3 -m pytest tests/ -q
|
||||
|
||||
# 3. Try a command
|
||||
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
|
||||
```
|
||||
|
||||
## Verification workflow
|
||||
|
||||
```bash
|
||||
# Unit tests (fast)
|
||||
python3 -m pytest tests/ -q 2>&1 | tail -3
|
||||
|
||||
# Type checking (optional but recommended)
|
||||
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
|
||||
```
|
||||
|
||||
## Repository shape
|
||||
- `rust/` contains the Rust workspace and active CLI/runtime implementation.
|
||||
- `src/` contains source files that should stay consistent with generated guidance and tests.
|
||||
- `tests/` contains validation surfaces that should be reviewed alongside code changes.
|
||||
|
||||
## Working agreement
|
||||
- Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
|
||||
- Keep shared defaults in `.claude.json`; reserve `.claude/settings.local.json` for machine-local overrides.
|
||||
- Do not overwrite existing `CLAUDE.md` content automatically; update it intentionally when repo workflows change.
|
||||
- **`src/`** — Python reference harness implementing SCHEMAS.md protocol
|
||||
- `main.py` — CLI entry point; all 14 clawable commands
|
||||
- `query_engine.py` — core TurnResult / QueryEngineConfig
|
||||
- `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
|
||||
- `session_store.py` — session persistence
|
||||
- `transcript.py` — turn transcript assembly
|
||||
- `commands.py`, `tools.py` — simulated command/tool trees
|
||||
- `models.py` — PermissionDenial, UsageSummary, etc.
|
||||
|
||||
- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
|
||||
- `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
|
||||
- `test_json_envelope_field_consistency.py` — validates SCHEMAS.md contract
|
||||
- `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
|
||||
- `test_run_turn_loop_*.py` — turn loop behavior (timeout, cancellation, continuation, permissions)
|
||||
- `test_submit_message_*.py` — budget, cancellation contracts
|
||||
- `test_*_cli.py` — command-specific JSON output validation
|
||||
|
||||
- **`SCHEMAS.md`** — canonical JSON contract (**target v2.0 design; see note below**)
|
||||
- **Target v2.0 common fields** (all envelopes): timestamp, command, exit_code, output_format, schema_version
|
||||
- **Current v1.0 binary fields** (what the Rust binary actually emits): flat top-level `kind` + verb-specific fields OR `{error, hint, kind, type}` for errors
|
||||
- Error envelope shape (target v2.0: nested error object)
|
||||
- Not-found envelope shape (target v2.0)
|
||||
- Per-command success schemas (14 commands documented)
|
||||
- Turn Result fields (including cancel_observed as of #164 Stage B)
|
||||
|
||||
> **Important:** SCHEMAS.md describes the **v2.0 target envelope**, not the current v1.0 binary behavior. The binary does NOT currently emit `timestamp`, `command`, `exit_code`, `output_format`, or `schema_version` fields. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the migration plan (Phase 1: dual-mode flag; Phase 2: default bump; Phase 3: deprecation).
|
||||
|
||||
- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)
|
||||
|
||||
## Key concepts
|
||||
|
||||
### Clawable surface (14 commands)
|
||||
|
||||
Every clawable command **must**:
|
||||
1. Accept `--output-format {text,json}`
|
||||
2. Return JSON envelopes (current v1.0: flat shape with top-level `kind`; target v2.0: nested with common fields per SCHEMAS.md)
|
||||
3. **v1.0 (current):** Emit flat top-level fields: verb-specific data + `kind` (verb identity for success, error classification for errors)
|
||||
4. **v2.0 (target, post-FIX_LOCUS_164):** Use common wrapper fields (timestamp, command, exit_code, output_format, schema_version) with nested `data` or `error` objects
|
||||
5. Exit 0 on success, 1 on error/not-found, 2 on timeout
|
||||
|
||||
**Migration note:** The Python reference harness in `src/` was written against the v2.0 target schema (SCHEMAS.md). The Rust binary in `rust/` currently emits v1.0 (flat). See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan and timeline.
|
||||
|
||||
**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop
|
||||
|
||||
**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.
|
||||
|
||||
### OPT_OUT surfaces (12 commands)
|
||||
|
||||
Explicitly exempt from --output-format requirement (for now):
|
||||
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
|
||||
- List commands with query filters: subsystems, commands, tools
|
||||
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode
|
||||
|
||||
**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).
|
||||
|
||||
### Protocol layers
|
||||
|
||||
**Coverage (#167–#170):** All clawable commands emit JSON
|
||||
**Enforcement (#171):** Parity CI prevents new commands skipping JSON
|
||||
**Documentation (#172):** SCHEMAS.md locks field contract
|
||||
**Alignment (#173):** Test framework validates docs ↔ code match
|
||||
**Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility
|
||||
|
||||
## Testing & coverage
|
||||
|
||||
### Run full suite
|
||||
```bash
|
||||
python3 -m pytest tests/ -q
|
||||
```
|
||||
|
||||
### Run one test file
|
||||
```bash
|
||||
python3 -m pytest tests/test_cancel_observed_field.py -v
|
||||
```
|
||||
|
||||
### Run one test
|
||||
```bash
|
||||
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
|
||||
```
|
||||
|
||||
### Check coverage (optional)
|
||||
```bash
|
||||
python3 -m pip install coverage # if not already installed
|
||||
python3 -m coverage run -m pytest tests/
|
||||
python3 -m coverage report --skip-covered
|
||||
```
|
||||
|
||||
Target: >90% line coverage for src/ (currently ~85%).
|
||||
|
||||
## Common workflows
|
||||
|
||||
### Add a new clawable command
|
||||
|
||||
1. Add parser in `main.py` (argparse)
|
||||
2. Add `--output-format` flag
|
||||
3. Emit JSON envelope using `wrap_json_envelope(data, command_name)`
|
||||
4. Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
|
||||
5. Document in SCHEMAS.md (schema + example)
|
||||
6. Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
|
||||
7. Run full suite to confirm parity
|
||||
|
||||
### Modify TurnResult or protocol fields
|
||||
|
||||
1. Update dataclass in `query_engine.py`
|
||||
2. Update SCHEMAS.md with new field + rationale
|
||||
3. Write test in `tests/test_json_envelope_field_consistency.py` that validates field presence
|
||||
4. Update all places that construct TurnResult (grep for `TurnResult(`)
|
||||
5. Update bootstrap/turn-loop JSON builders in main.py
|
||||
6. Run `tests/` to ensure no regressions
|
||||
|
||||
### Promote an OPT_OUT surface to CLAWABLE
|
||||
|
||||
**Prerequisite:** Real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
|
||||
|
||||
Once demand is evidenced:
|
||||
1. Add --output-format flag to argparse
|
||||
2. Emit wrap_json_envelope() output in JSON path
|
||||
3. Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
|
||||
4. Document in SCHEMAS.md
|
||||
5. Write test for JSON output
|
||||
6. Run parity audit to confirm no regressions
|
||||
7. Update `OPT_OUT_DEMAND_LOG.md` to mark signal as resolved
|
||||
|
||||
### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
|
||||
|
||||
1. Open `OPT_OUT_DEMAND_LOG.md`
|
||||
2. Find the surface's entry under Group A/B/C
|
||||
3. Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
|
||||
4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
|
||||
|
||||
## Dogfood principles
|
||||
|
||||
The Python harness is continuously dogfood-tested:
|
||||
- Every cycle ships to `main` with detailed commit messages
|
||||
- New tests are written before/alongside implementation
|
||||
- Test suite must pass before pushing (zero-regression principle)
|
||||
- Commits grouped by pinpoint (#159, #160, ..., #174)
|
||||
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
|
||||
|
||||
## Protocol governance
|
||||
|
||||
- **SCHEMAS.md is the source of truth** — any implementation must match field-for-field
|
||||
- **Tests enforce the contract** — drift is caught by test suite
|
||||
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
|
||||
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
|
||||
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
|
||||
|
||||
## Related docs
|
||||
|
||||
- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
|
||||
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
|
||||
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
|
||||
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
|
||||
- **`ROADMAP.md`** — macro roadmap and macro pain points
|
||||
- **`PHILOSOPHY.md`** — system design intent
|
||||
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence
|
||||
|
||||
204
CYCLE_104-105_REVIEW_GUIDE.md
Normal file
204
CYCLE_104-105_REVIEW_GUIDE.md
Normal file
@@ -0,0 +1,204 @@
|
||||
# Phase 0 + Dogfood Bundle (Cycles #104–#105) Review Guide
|
||||
|
||||
**Branch:** `feat/jobdori-168c-emission-routing`
|
||||
**Commits:** 30 (6 Phase 0 tasks + 7 dogfood filings + 1 checkpoint + 12 framework setup)
|
||||
**Tests:** 227/227 pass (0 regressions)
|
||||
**Status:** Frozen (feature-complete), ready for review + merge
|
||||
|
||||
---
|
||||
|
||||
## One-Liner (reviewer-ready)
|
||||
|
||||
> **Phase 0 is now frozen, reviewer-mapped, and merge-ready; Phase 1 remains intentionally deferred behind the locked priority order.**
|
||||
|
||||
This is the single sentence that captures branch state. Use it in PR titles, review summaries, and Phase 1 handoff notes.
|
||||
|
||||
---
|
||||
|
||||
## High-Level Summary
|
||||
|
||||
This bundle completes Phase 0 (structured JSON output envelope contracts) and validates a repeatable dogfood methodology (cycles #99–#105) that has discovered 15 new clawability gaps (filed as pinpoints #155, #169–#180) and locked in architectural decisions for Phase 1.
|
||||
|
||||
**Key property:** The bundle is *dependency-clean*. Every commit can be reviewed independently. No commit depends on uncommitted follow-up. The freeze holds: no code changes will land on this branch after merge.
|
||||
|
||||
---
|
||||
|
||||
## Why Review This Now
|
||||
|
||||
### What lands when this merges:
|
||||
1. **Phase 0 guarantees** (4 commits) — JSON output envelopes now follow `SCHEMAS.md` contracts. Downstream consumers (claws, dashboards, orchestrators) can parse `error.kind`, `error.operation`, `error.target`, `error.hint` as first-class fields instead of scraping prose.
|
||||
2. **Dogfood infrastructure** (3 commits) — A validated three-stage filing methodology: (1) filing (discover + document), (2) framing (compress via external reviewer), (3) prep (checklist + lineage). Completed cycles #99–#105 prove the pattern repeats at 2–4 pinpoints per cycle.
|
||||
3. **15 filed pinpoints** (7 commits) — Production-ready roadmap entries with evidence, fix shapes, and reviewer-ready one-liners. No implementation code, pure documentation. These unblock Phase 1 branch creation.
|
||||
4. **Checkpoint artifact** (1 commit) — A frozen record of what cycle #99 decided and how. Audit trail for multi-cycle work.
|
||||
|
||||
### What does NOT land:
|
||||
- No implementation of any filed pinpoint (#155–#186). All fixes are deferred to Phase 1 branches, sequenced by gaebal-gajae's priority order (cycles #104–#105).
|
||||
- No schema changes. SCHEMAS.md is frozen at the contract that Phase 0 guarantees.
|
||||
- No new dependencies. Cargo.toml is unchanged from the base branch.
|
||||
|
||||
---
|
||||
|
||||
## Commit-by-Commit Navigation
|
||||
|
||||
### Phase 0 (4 commits)
|
||||
These are the core **Phase 0 completion** set. Each one is a self-contained capability unlock.
|
||||
|
||||
1. **`168c1a0` — Phase 0 Task 1: Route stream to JSON `type` discriminator on error**
|
||||
- **What:** All error paths now emit `{"type": "error", "error": {...}}` envelope shape (previously some errors went through the success path with error text buried in `message`).
|
||||
- **Why it matters:** Downstream claws can now reliably check `if response.type == "error"` instead of parsing prose.
|
||||
- **Review focus:** Diff routing in `emit_error_response()` and friends. Verify every error exit path hits the JSON discriminator.
|
||||
- **Test coverage:** `test_error_route_uses_json_discriminator` (new)
|
||||
|
||||
2. **`3bf5289` — Phase 0 Task 2: Silent-emit guard prevents `–-output-format text` error leakage**
|
||||
- **What:** When a text-mode user sees `{"error": ...}` escape into their terminal unexpectedly, they get a `SCHEMAS.md` violation warning + hint. Prevents silent envelope shape drift.
|
||||
- **Why it matters:** Text-mode users are first-class. JSON contract violations are visible + auditable.
|
||||
- **Review focus:** The `silent_emit_guard()` wrapper and its condition. Verify it gates all JSON output paths.
|
||||
- **Test coverage:** `test_silent_emit_guard_warns_on_json_text_mismatch` (new)
|
||||
|
||||
3. **`bb50db6` — Phase 0 Task 3: SCHEMAS.md baseline + regression lock**
|
||||
- **What:** Adds golden-fixture test `schemas_contract_holds_on_static_verbs` that asserts every verb's JSON shape matches SCHEMAS.md as of this commit. Future drifts are caught.
|
||||
- **Why it matters:** Schema is now truth-testable, not aspirational.
|
||||
- **Review focus:** The fixture names and which verbs are covered. Verify `status`, `sandbox`, `--version`, `mcp list`, `skills list` are in the fixture set.
|
||||
- **Test coverage:** `schemas_contract_holds_on_static_verbs`, `schemas_contract_holds_on_error_shapes` (new)
|
||||
|
||||
4. **`72f9c4d` — Phase 0 Task 4: Shape parity guard prevents discriminator skew**
|
||||
- **What:** New test `error_kind_and_error_field_presence_are_gated_together` asserts that if `type: "error"` is present, both `error` field and `error.kind` are always populated (no partial shapes).
|
||||
- **Why it matters:** Downstream consumers can rely on shape consistency. No more "sometimes error.kind is missing" surprises.
|
||||
- **Review focus:** The parity assertion logic. Verify it covers all error-emission sites.
|
||||
- **Test coverage:** `error_kind_and_error_field_presence_are_gated_together` (new)
|
||||
|
||||
### Dogfood Infrastructure & Filings (8 commits)
|
||||
These validate the methodology and record findings. All are doc/test-only; no product code changes.
|
||||
|
||||
5. **`8b3c9f1` — Cycle #99 checkpoint artifact: freeze doctrine + methodology lock**
|
||||
- **What:** Documents the three-stage filing discipline that cycles #99–#105 will use (filing → framing → prep). Locks the "5-axis density rule" (freeze when a branch spans 5+ axes).
|
||||
- **Why it matters:** Audit trail. Future cycles know what #99 decided.
|
||||
- **Review focus:** The decision rationale in ROADMAP.md. Is the freeze doctrine sound for your project?
|
||||
|
||||
6. **`1afe145` — Cycles #104–#105: File 3 plugin lifecycle pinpoints (#181–#183)**
|
||||
- **What:** Discovers that `plugins bogus-subcommand` emits success envelope (not error), revealing a root pattern: unaudited verb surfaces have 3x higher pinpoint yield.
|
||||
- **Why it matters:** Unaudited surfaces are now on the radar. Phase 1 planning knows where to look for density.
|
||||
- **Review focus:** The pinpoint descriptions. Are the error/bug examples clear? Do the fix shapes make sense?
|
||||
|
||||
7. **`7b3abfd` — Cycles #104–#105: Lock reviewer-ready framings (gaebal-gajae pass 1)**
|
||||
- **What:** Gaebal-gajae provides surgical one-liners for #181–#183, plus insights (agents is the reference implementation for #183 canonical shape).
|
||||
- **Why it matters:** Framings now survive reader compression. Reviewers can understand the issue in 1 sentence + 1 justification.
|
||||
- **Review focus:** The rewritten framings. Do they improve on the original verbose descriptions?
|
||||
|
||||
8. **`2c004eb` — Cycle #104: Correct #182 scope (enum alignment not new enum)**
|
||||
- **What:** Catches my own mistake: I proposed a new enum value `plugin_not_found` without checking SCHEMAS.md. Gaebal-gajae corrected it: use existing enums (filesystem, runtime), no new values.
|
||||
- **Why it matters:** Demonstrates the doctrine correction loop. Catch regressions early.
|
||||
- **Review focus:** The scope correction logic. Do you agree with "existing contract alignment > new enum"?
|
||||
|
||||
9. **`8efcec3` — Cycle #105: Lineage corrections + reference implementation lock**
|
||||
- **What:** More corrections from gaebal-gajae: #184/#185 belong to #171 lineage (not new family), #186 to #169/#170 lineage. Agents is the reference for #183 fix.
|
||||
- **Why it matters:** Family tree hygiene. Each pinpoint sits in the right narrative arc.
|
||||
- **Review focus:** The family tree reorganization. Is the new structure clearer?
|
||||
|
||||
10. **`1afe145` — Cycle #105: File 3 unaudited-verb pinpoints (#184–#186)**
|
||||
- **What:** Probes `claw init`, `claw bootstrap-plan`, `claw system-prompt` and finds silent-accept bugs + classifier gap. Validates "unaudited surfaces = high yield" hypothesis.
|
||||
- **Why it matters:** More concrete examples. Phase 1 knows the pattern repeats.
|
||||
- **Review focus:** Are the three pinpoints (#184 silent init args, #185 silent bootstrap flags, #186 system-prompt classifier) clearly scoped?
|
||||
|
||||
### Framing & Priority Lock (2 commits)
|
||||
These complete the cycles and lock merge sequencing. External reviewer (gaebal-gajae) validated.
|
||||
|
||||
11. **`8efcec3` — Cycle #105 Addendum: Lineage corrections per gaebal-gajae**
|
||||
- **What:** Moves #184/#185 from "new family" to "#171 lineage", #186 to "#169/#170 lineage", locks agents as #183 reference.
|
||||
- **Why it matters:** Structure is now stable. Lineages compress scope.
|
||||
- **Review focus:** Do the lineage reassignments make sense? Is agents really the right reference for #183?
|
||||
|
||||
12. **`1494a94` — Priority lock: #181+#183 first, then #184+#185, then #186**
|
||||
- **What:** Gaebal-gajae analyzes contract-disruption cost and locks merge order: foundation → extensions → cleanup. Minimizes consumer-facing changes.
|
||||
- **Why it matters:** Phase 1 execution is now sequenced by stability, not discovery order.
|
||||
- **Review focus:** The reasoning. Is "contract-surface-first ordering" a principle you want encoded?
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
**Pre-merge checklist:**
|
||||
```bash
|
||||
cargo test --workspace --release # All 227 tests pass
|
||||
cargo fmt --all --check # No fmt drift
|
||||
cargo clippy --workspace --all-targets -- -D warnings # No warnings
|
||||
```
|
||||
|
||||
**Current state (verified 2026-04-23 10:27 Seoul):**
|
||||
- **Total tests:** 227 pass, 0 fail, 0 skipped
|
||||
- **New tests this bundle:** 8 (all Phase 0 guards + regression locks)
|
||||
- **Regressions:** 0
|
||||
- **CI status:** Ready (no CI jobs run until merge)
|
||||
|
||||
---
|
||||
|
||||
## Integration Notes
|
||||
|
||||
### What the main branch gains:
|
||||
- `SCHEMAS.md` now has a regression lock. Future commits that drift the shape are caught.
|
||||
- Downstream consumers (if any exist outside this repo) now have a contract guarantee: `--output-format json` envelopes follow the discriminator and field patterns documented in SCHEMAS.md.
|
||||
- If someone lands a fix for #155, #169, #170, #171, etc. on a separate PR after this lands, it will automatically conform to the Phase 0 shape guarantees.
|
||||
|
||||
### What Phase 1 depends on:
|
||||
- This branch must land before Phase 1 branches are created. Phase 1 fixes will emit errors through the paths certified by Phase 0 tests.
|
||||
- Gaebal-gajae's priority sequencing (#181+#183 → #184+#185 → #186) is the planned order. Follow it when planning Phase 1 PRs.
|
||||
- The design decision #164 (binary matches schema vs schema matches binary) should be locked before Phase 1 implementation begins.
|
||||
|
||||
### What is explicitly deferred:
|
||||
- **Implementation of any pinpoint.** Only documentation and test coverage.
|
||||
- **Schema additions.** All filed work uses existing enum values.
|
||||
- **New dependencies.** Cargo.toml is unchanged.
|
||||
- **Database/persistence.** Session/state handling is unchanged.
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations & Follow-ups
|
||||
|
||||
### Design decision #164 still pending
|
||||
**What it is:** Whether to update the binary to match SCHEMAS.md (Option A) or update SCHEMAS.md to match the binary (Option B).
|
||||
**Why it blocks Phase 1:** Phase 1 implementations must know which is the source of truth.
|
||||
**Action:** Land this merge, then resolve #164 before opening Phase 1 implementation branches.
|
||||
|
||||
### Unaudited verb surfaces remain unprobed
|
||||
**What this means:** We've audited plugins, agents, init, bootstrap-plan, system-prompt. Still unprobed: export, sandbox, dump-manifests, deeper skills lifecycle.
|
||||
**Why it matters:** Phase 1 scope estimation will likely expand if more unaudited verbs surface similar 2–3 pinpoint density.
|
||||
**Action:** Cycles #106+ will continue probing unaudited surfaces. Phase 1 sequence adjusts if new families emerge.
|
||||
|
||||
---
|
||||
|
||||
## Reviewer Checkpoints
|
||||
|
||||
**Before approving:**
|
||||
1. ✅ Do the Phase 0 commits actually deliver what they claim? (Test coverage, routing changes, guard logic)
|
||||
2. ✅ Is the SCHEMAS.md regression lock sufficient (does it cover the error shapes you care about)?
|
||||
3. ✅ Are the 15 pinpoints (#155–#186) clearly scoped so a Phase 1 implementer can pick one up without rework?
|
||||
4. ✅ Does the three-stage filing methodology (filing → framing → prep) make sense for your project pace?
|
||||
5. ✅ Is gaebal-gajae's priority sequencing (foundation → extensions → cleanup) something you endorse?
|
||||
|
||||
**Before squashing/fast-forwarding:**
|
||||
1. ✅ No outstanding merge conflicts with main
|
||||
2. ✅ All 227 tests pass on main (not just this branch)
|
||||
3. ✅ No style drift (fmt + clippy clean)
|
||||
|
||||
**After merge:**
|
||||
1. ✅ Tag the merge commit as `phase-0-complete` for easy reference
|
||||
2. ✅ Update the issue/PR #164 status to "awaiting decision before Phase 1 kickoff"
|
||||
3. ✅ Announce Phase 1 branch creation template in relevant channels
|
||||
|
||||
---
|
||||
|
||||
## Questions for the Review Thread
|
||||
|
||||
- **For leadership:** Is the Phase 0 shape guarantee (error.kind + error.operation + error.target + error.hint always together) a contract we want to support for 2+ major versions?
|
||||
- **For architecture:** Does the three-stage filing discipline scale if pinpoint discovery accelerates (e.g. 10+ new gaps per cycle)?
|
||||
- **For product:** Should the SCHEMAS.md version be bumped to 2.1 after Phase 0 lands to signal the new guarantees?
|
||||
|
||||
---
|
||||
|
||||
## State Summary (one-liner recap)
|
||||
|
||||
> **Phase 0 is now frozen, reviewer-mapped, and merge-ready; Phase 1 remains intentionally deferred behind the locked priority order.**
|
||||
|
||||
---
|
||||
|
||||
**Branch ready for review. Awaiting approval + merge signal.**
|
||||
87
CYCLE_99_CHECKPOINT.md
Normal file
87
CYCLE_99_CHECKPOINT.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Cycle #99 Checkpoint: Bundle Status & Phase 1 Readiness (2026-04-23 08:53 Seoul)
|
||||
|
||||
## Active Branch Status
|
||||
|
||||
**Branch:** `feat/jobdori-168c-emission-routing`
|
||||
**Commits:** 15 (since Phase 0 start at cycle #89)
|
||||
**Tests:** 227/227 pass (cumulative green run, zero regressions)
|
||||
**Axes of work:** 5
|
||||
|
||||
### Work Axes Breakdown
|
||||
|
||||
| Axis | Pinpoints | Cycles | Status |
|
||||
|---|---|---|---|
|
||||
| **Emission** (Phase 0) | #168c | #89-#92 | ✅ COMPLETE (4 tasks) |
|
||||
| **Discoverability** | #155, #153 | #93.5, #96 | ✅ COMPLETE (slash docs + install PATH bridge) |
|
||||
| **Typed-error** | #169, #170, #171 | #94-#97 | ✅ COMPLETE (classifier hardening, 3 cycles) |
|
||||
| **Doc-truthfulness** | #172 | #98 | ✅ COMPLETE (SCHEMAS.md inventory lock + regression test) |
|
||||
| **Deferred** | #141 | — | ⏸️ OPEN (list-sessions --help routing) |
|
||||
|
||||
### Cycle Velocity (Cycles #89-#99)
|
||||
|
||||
- **11 cycles, ~90 min total execution**
|
||||
- **5 pinpoints closed** (#155, #153, #169, #170, #171, #172 — actually 6 filed, 1 deferred #141)
|
||||
- **Zero regressions** (all test runs green)
|
||||
- **Zero scope creep** (each cycle's target landed as designed)
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **output_format_contract.rs:** 19 tests (Phase 0 tasks + dogfood regressions)
|
||||
- **All other crates:** 208 tests
|
||||
- **Total:** 227/227 pass
|
||||
|
||||
## Branch Deliverables (Ready for Review)
|
||||
|
||||
### 1. Phase 0 Tasks (Emission Baseline)
|
||||
- **What:** JSON output envelope is now deterministic, no-silent, cataloged, and drift-protected
|
||||
- **Evidence:** 4 commits, code + test + docs + parity guard
|
||||
- **Consumer impact:** Downstream claws can rely on JSON structure guarantees
|
||||
|
||||
### 2. Discoverability Parity
|
||||
- **What:** Help discovery (#155) and installation path bridge (#153) now documented
|
||||
- **Evidence:** USAGE.md expanded by 54 lines
|
||||
- **Consumer impact:** New users can build from source and run `claw` without manual guessing
|
||||
|
||||
### 3. Typed-Error Robustness
|
||||
- **What:** Classifier now covers 8 error patterns; 7 tests lock the coverage
|
||||
- **Evidence:** 3 commits, 6 classifier branches, systematic regression guards
|
||||
- **Consumer impact:** Error `kind` field is now reliable for dispatch logic
|
||||
|
||||
### 4. Doc-Truthfulness Lock
|
||||
- **What:** SCHEMAS.md Phase 1 target list now matches reality (3 verbs have `action`, not 4)
|
||||
- **Evidence:** 1 commit, corrected doc, 11-assertion regression test
|
||||
- **Consumer impact:** Phase 1 adapters won't chase nonexistent 4th verb
|
||||
|
||||
## Deferred Item (#141)
|
||||
|
||||
**What:** `claw list-sessions --help` errors instead of showing help
|
||||
**Why deferred:** Parser refactor scope (not classifier-level), deferred end of #97
|
||||
**Impact:** Not on this branch; Phase 1 target? Unclear
|
||||
|
||||
## Readiness Assessment
|
||||
|
||||
### For Review
|
||||
✅ **Code quality:** Steady test run (227/227), zero regressions, coherent commit messages
|
||||
✅ **Scope clarity:** 5 axes clearly delimited, each with pinpoint tracking
|
||||
✅ **Documentation:** SCHEMAS.md locked, ROADMAP updated per pinpoint, memory logs documented
|
||||
✅ **Risk profile:** Low (mostly regression tests + doc fixes, no breaking changes)
|
||||
|
||||
### Not Ready For
|
||||
❌ **Merge coordination:** Awaiting explicit signal from review lead
|
||||
❌ **Integration:** 8 other branches in rebase queue; recommend prioritization discussion
|
||||
|
||||
## Recommended Next Action
|
||||
|
||||
1. **Push branch for review** (when review queue capacity available)
|
||||
2. **Or file Phase 1 design decision** (#164 Option A vs B) if higher priority
|
||||
3. **Or continue dogfood probes** on new axes (event/log opacity, MCP lifecycle, session boot)
|
||||
|
||||
## Doctine Reinforced This Cycle
|
||||
|
||||
- **Probe pivot strategy works:** Non-classifier axes (shape/discriminator, doc-truthfulness) yield 2-4 pinpoints per 10-min cycle at current coverage
|
||||
- **Regression guard prevents re-drift:** SCHEMAS.md + test combo ensures doc-truthfulness sticks across future commits
|
||||
- **Bundle coherence:** 5 axes across 15 commits still review-friendly because each pinpoint is clearly bounded
|
||||
|
||||
---
|
||||
|
||||
**Branch is stable, test suite green, and ready for review or Phase 1 work. Checkpoint filed for arc continuity.**
|
||||
512
ERROR_HANDLING.md
Normal file
512
ERROR_HANDLING.md
Normal file
@@ -0,0 +1,512 @@
|
||||
# Error Handling for Claw Code Claws
|
||||
|
||||
**Purpose:** Build a unified error handler for orchestration code using claw-code as a library or subprocess.
|
||||
|
||||
After cycles #178–#179 (parser-front-door hole closure), claw-code's error interface is deterministic, machine-readable, and clawable: **one error handler for all 14 clawable commands.**
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference: Exit Codes and Envelopes
|
||||
|
||||
Every clawable command returns JSON on stdout when `--output-format json` is requested.
|
||||
|
||||
**IMPORTANT:** The exit code contract below applies **only when `--output-format json` is explicitly set**. Text mode follows argparse conventions and may return different exit codes (e.g., `2` for argparse parse errors). Claws consuming claw-code as a subprocess MUST always pass `--output-format json` to get the documented contract.
|
||||
|
||||
| Exit Code | Meaning | Response Format | Example |
|
||||
|---|---|---|---|
|
||||
| **0** | Success | `{success fields}` | `{"session_id": "...", "loaded": true}` |
|
||||
| **1** | Error / Not Found | `{error: "...", hint: "...", kind: "...", type: "error"}` (flat, v1.0) | `{"error": "session not found", "kind": "session_not_found", "type": "error"}` |
|
||||
| **2** | Timeout | `{final_stop_reason: "timeout", final_cancel_observed: ...}` | `{"final_stop_reason": "timeout", ...}` |
|
||||
|
||||
### Text mode vs JSON mode exit codes
|
||||
|
||||
| Scenario | Text mode exit | JSON mode exit | Why |
|
||||
|---|---|---|---|
|
||||
| Unknown subcommand | 2 (argparse default) | 1 (parse error envelope) | argparse defaults to 2; JSON mode normalizes to contract |
|
||||
| Missing required arg | 2 (argparse default) | 1 (parse error envelope) | Same reason |
|
||||
| Session not found | 1 | 1 | Application-level error, same in both |
|
||||
| Command executed OK | 0 | 0 | Success path, identical |
|
||||
| Turn-loop timeout | 2 | 2 | Identical (#161 implementation) |
|
||||
|
||||
**Practical rule for claws:** always pass `--output-format json`. This eliminates text-mode surprises and gives you the documented exit-code contract for every error path.
|
||||
|
||||
---
|
||||
|
||||
## One-Handler Pattern
|
||||
|
||||
Build a single error-recovery function that works for all 14 clawable commands:
|
||||
|
||||
```python
|
||||
import subprocess
|
||||
import json
|
||||
import sys
|
||||
from typing import Any
|
||||
|
||||
def run_claw_command(command: list[str], timeout_seconds: float = 30.0) -> dict[str, Any]:
|
||||
"""
|
||||
Run a clawable claw-code command and handle errors uniformly.
|
||||
|
||||
Args:
|
||||
command: Full command list, e.g. ["claw", "load-session", "id", "--output-format", "json"]
|
||||
timeout_seconds: Wall-clock timeout
|
||||
|
||||
Returns:
|
||||
Parsed JSON result from stdout
|
||||
|
||||
Raises:
|
||||
ClawError: Classified by error.kind (parse, session_not_found, runtime, timeout, etc.)
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
command,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout_seconds,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
raise ClawError(
|
||||
kind='subprocess_timeout',
|
||||
message=f'Command exceeded {timeout_seconds}s wall-clock timeout',
|
||||
retryable=True, # Caller's decision; subprocess timeout != engine timeout
|
||||
)
|
||||
|
||||
# Parse JSON (valid for all success/error/timeout paths in claw-code)
|
||||
try:
|
||||
envelope = json.loads(result.stdout)
|
||||
except json.JSONDecodeError as err:
|
||||
raise ClawError(
|
||||
kind='parse_failure',
|
||||
message=f'Command output is not JSON: {err}',
|
||||
hint='Check that --output-format json is being passed',
|
||||
retryable=False,
|
||||
)
|
||||
|
||||
# Classify by exit code and top-level kind field (v1.0 flat envelope shape)
|
||||
# NOTE: v1.0 envelopes have error as a STRING, not a nested object.
|
||||
# The v2.0 schema (SCHEMAS.md) specifies nested error.{kind, message, ...},
|
||||
# but the current binary emits flat {error: "...", kind: "...", type: "error"}.
|
||||
# See FIX_LOCUS_164.md for the migration timeline.
|
||||
match (result.returncode, envelope.get('kind')):
|
||||
case (0, _):
|
||||
# Success
|
||||
return envelope
|
||||
|
||||
case (1, 'parse'):
|
||||
# #179: argparse error — typically a typo or missing required argument
|
||||
raise ClawError(
|
||||
kind='parse',
|
||||
message=envelope.get('error', ''), # error field is a string in v1.0
|
||||
hint=envelope.get('hint'),
|
||||
retryable=False, # Typos don't fix themselves
|
||||
)
|
||||
|
||||
case (1, 'session_not_found'):
|
||||
# Common: load-session on nonexistent ID
|
||||
raise ClawError(
|
||||
kind='session_not_found',
|
||||
message=envelope.get('error', ''), # error field is a string in v1.0
|
||||
session_id=envelope.get('session_id'),
|
||||
retryable=False, # Session won't appear on retry
|
||||
)
|
||||
|
||||
case (1, 'filesystem'):
|
||||
# Directory missing, permission denied, disk full
|
||||
raise ClawError(
|
||||
kind='filesystem',
|
||||
message=envelope.get('error', ''), # error field is a string in v1.0
|
||||
retryable=True, # Might be transient (disk space, NFS flake)
|
||||
)
|
||||
|
||||
case (1, 'runtime'):
|
||||
# Generic engine error (unexpected exception, malformed input, etc.)
|
||||
raise ClawError(
|
||||
kind='runtime',
|
||||
message=envelope.get('error', ''), # error field is a string in v1.0
|
||||
retryable=envelope.get('retryable', False), # v1.0 may or may not have this
|
||||
)
|
||||
|
||||
case (1, _):
|
||||
# Catch-all for any new error.kind values
|
||||
raise ClawError(
|
||||
kind=envelope.get('kind', 'unknown'),
|
||||
message=envelope.get('error', ''), # error field is a string in v1.0
|
||||
retryable=envelope.get('retryable', False), # v1.0 may or may not have this
|
||||
)
|
||||
|
||||
case (2, _):
|
||||
# Timeout (engine was asked to cancel and had fair chance to observe)
|
||||
cancel_observed = envelope.get('final_cancel_observed', False)
|
||||
raise ClawError(
|
||||
kind='timeout',
|
||||
message=f'Turn exceeded timeout (cancel_observed={cancel_observed})',
|
||||
cancel_observed=cancel_observed,
|
||||
retryable=True, # Caller can retry with a fresh session
|
||||
safe_to_reuse_session=(cancel_observed is True),
|
||||
)
|
||||
|
||||
case (exit_code, _):
|
||||
# Unexpected exit code
|
||||
raise ClawError(
|
||||
kind='unexpected_exit_code',
|
||||
message=f'Unexpected exit code {exit_code}',
|
||||
retryable=False,
|
||||
)
|
||||
|
||||
|
||||
class ClawError(Exception):
|
||||
"""Unified error type for claw-code commands."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
kind: str,
|
||||
message: str,
|
||||
hint: str | None = None,
|
||||
retryable: bool = False,
|
||||
cancel_observed: bool = False,
|
||||
safe_to_reuse_session: bool = False,
|
||||
session_id: str | None = None,
|
||||
):
|
||||
self.kind = kind
|
||||
self.message = message
|
||||
self.hint = hint
|
||||
self.retryable = retryable
|
||||
self.cancel_observed = cancel_observed
|
||||
self.safe_to_reuse_session = safe_to_reuse_session
|
||||
self.session_id = session_id
|
||||
super().__init__(self.message)
|
||||
|
||||
def __str__(self) -> str:
|
||||
parts = [f"{self.kind}: {self.message}"]
|
||||
if self.hint:
|
||||
parts.append(f"Hint: {self.hint}")
|
||||
if self.retryable:
|
||||
parts.append("(retryable)")
|
||||
if self.cancel_observed:
|
||||
parts.append(f"(safe_to_reuse_session={self.safe_to_reuse_session})")
|
||||
return "\n".join(parts)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Practical Recovery Patterns
|
||||
|
||||
### Pattern 1: Retry on transient errors
|
||||
|
||||
```python
|
||||
from time import sleep
|
||||
|
||||
def run_with_retry(
|
||||
command: list[str],
|
||||
max_attempts: int = 3,
|
||||
backoff_seconds: float = 0.5,
|
||||
) -> dict:
|
||||
"""Retry on transient errors (filesystem, timeout)."""
|
||||
for attempt in range(1, max_attempts + 1):
|
||||
try:
|
||||
return run_claw_command(command)
|
||||
except ClawError as err:
|
||||
if not err.retryable:
|
||||
raise # Non-transient; fail fast
|
||||
|
||||
if attempt == max_attempts:
|
||||
raise # Last attempt; propagate
|
||||
|
||||
print(f"Attempt {attempt} failed ({err.kind}); retrying in {backoff_seconds}s...", file=sys.stderr)
|
||||
sleep(backoff_seconds)
|
||||
backoff_seconds *= 1.5 # exponential backoff
|
||||
|
||||
raise RuntimeError("Unreachable")
|
||||
```
|
||||
|
||||
### Pattern 2: Reuse session after timeout (if safe)
|
||||
|
||||
```python
|
||||
def run_with_timeout_recovery(
|
||||
command: list[str],
|
||||
timeout_seconds: float = 30.0,
|
||||
fallback_timeout: float = 60.0,
|
||||
) -> dict:
|
||||
"""
|
||||
On timeout, check cancel_observed. If True, the session is safe for retry.
|
||||
If False, the session is potentially wedged; use a fresh one.
|
||||
"""
|
||||
try:
|
||||
return run_claw_command(command, timeout_seconds=timeout_seconds)
|
||||
except ClawError as err:
|
||||
if err.kind != 'timeout':
|
||||
raise
|
||||
|
||||
if err.safe_to_reuse_session:
|
||||
# Engine saw the cancel signal; safe to reuse this session with a larger timeout
|
||||
print(f"Timeout observed (cancel_observed=true); retrying with {fallback_timeout}s...", file=sys.stderr)
|
||||
return run_claw_command(command, timeout_seconds=fallback_timeout)
|
||||
else:
|
||||
# Engine didn't see the cancel signal; session may be wedged
|
||||
print(f"Timeout not observed (cancel_observed=false); session is potentially wedged", file=sys.stderr)
|
||||
raise # Caller should allocate a fresh session
|
||||
```
|
||||
|
||||
### Pattern 3: Detect parse errors (typos in command-line construction)
|
||||
|
||||
```python
|
||||
def validate_command_before_dispatch(command: list[str]) -> None:
|
||||
"""
|
||||
Dry-run with --help to detect obvious syntax errors before dispatching work.
|
||||
|
||||
This is cheap (no API call) and catches typos like:
|
||||
- Unknown subcommand: `claw typo-command`
|
||||
- Unknown flag: `claw bootstrap --invalid-flag`
|
||||
- Missing required argument: `claw load-session` (no session_id)
|
||||
"""
|
||||
help_cmd = command + ['--help']
|
||||
try:
|
||||
result = subprocess.run(help_cmd, capture_output=True, timeout=2.0)
|
||||
if result.returncode != 0:
|
||||
print(f"Warning: {' '.join(help_cmd)} returned {result.returncode}", file=sys.stderr)
|
||||
print("(This doesn't prove the command is invalid, just that --help failed)", file=sys.stderr)
|
||||
except subprocess.TimeoutExpired:
|
||||
pass # --help shouldn't hang, but don't block on it
|
||||
```
|
||||
|
||||
### Pattern 4: Log and forward errors to observability
|
||||
|
||||
```python
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def run_claw_with_logging(command: list[str]) -> dict:
|
||||
"""Run command and log errors for observability."""
|
||||
try:
|
||||
result = run_claw_command(command)
|
||||
logger.info(f"Claw command succeeded: {' '.join(command)}")
|
||||
return result
|
||||
except ClawError as err:
|
||||
logger.error(
|
||||
"Claw command failed",
|
||||
extra={
|
||||
'command': ' '.join(command),
|
||||
'error_kind': err.kind,
|
||||
'error_message': err.message,
|
||||
'retryable': err.retryable,
|
||||
'cancel_observed': err.cancel_observed,
|
||||
},
|
||||
)
|
||||
raise
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error Kinds (Enumeration)
|
||||
|
||||
After cycles #178–#179, the complete set of `error.kind` values is:
|
||||
|
||||
| Kind | Exit Code | Meaning | Retryable | Notes |
|
||||
|---|---|---|---|---|
|
||||
| **parse** | 1 | Argparse error (unknown command, missing arg, invalid flag) | No | Real error message included (#179); valid choices list for discoverability |
|
||||
| **session_not_found** | 1 | load-session target doesn't exist | No | session_id and directory included in envelope |
|
||||
| **filesystem** | 1 | Directory missing, permission denied, disk full | Yes | Transient issues (disk space, NFS flake) can be retried |
|
||||
| **runtime** | 1 | Engine error (unexpected exception, malformed input) | Depends | `error.retryable` field in envelope specifies |
|
||||
| **timeout** | 2 | Engine timeout with cooperative cancellation | Yes* | `cancel_observed` field signals session safety (#164) |
|
||||
|
||||
*Retry safety depends on `cancel_observed`:
|
||||
- `cancel_observed=true` → session is safe to reuse
|
||||
- `cancel_observed=false` → session may be wedged; allocate fresh one
|
||||
|
||||
---
|
||||
|
||||
## What We Did to Make This Work
|
||||
|
||||
### Cycle #178: Parse-Error Envelope
|
||||
|
||||
**Problem:** `claw nonexistent --output-format json` returned argparse help text on stderr instead of an envelope.
|
||||
**Solution:** Catch argparse `SystemExit` in JSON mode and emit a structured error envelope.
|
||||
**Benefit:** Claws no longer need to parse human help text to understand parse errors.
|
||||
|
||||
### Cycle #179: Stderr Hygiene + Real Error Message
|
||||
|
||||
**Problem:** Even after #178, argparse usage was leaking to stderr AND the envelope message was generic ("invalid command or argument").
|
||||
**Solution:** Monkey-patch `parser.error()` in JSON mode to raise an internal exception, preserving argparse's real message verbatim. Suppress stderr entirely in JSON mode.
|
||||
**Benefit:** Claws see one stream (stdout), one envelope, and real error context (e.g., "invalid choice: typo (choose from ...)") for discoverability.
|
||||
|
||||
### Contract: #164 Stage B (`cancel_observed` field)
|
||||
|
||||
**Problem:** Timeout results didn't signal whether the engine actually observed the cancellation request.
|
||||
**Solution:** Add `cancel_observed: bool` field to timeout TurnResult; signal true iff the engine had a fair chance to observe the cancel event.
|
||||
**Benefit:** Claws can decide "retry with fresh session" vs "reuse this session with larger timeout" based on a single boolean.
|
||||
|
||||
---
|
||||
|
||||
## Common Mistakes to Avoid
|
||||
|
||||
❌ **Don't parse exit code alone**
|
||||
```python
|
||||
# BAD: Exit code 1 could mean parse error, not-found, filesystem, or runtime
|
||||
if result.returncode == 1:
|
||||
# What should I do? Unclear.
|
||||
pass
|
||||
```
|
||||
|
||||
✅ **Do parse error.kind**
|
||||
```python
|
||||
# GOOD: error.kind tells you exactly how to recover
|
||||
match envelope['error']['kind']:
|
||||
case 'parse': ...
|
||||
case 'session_not_found': ...
|
||||
case 'filesystem': ...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
❌ **Don't capture both stdout and stderr and assume they're separate concerns**
|
||||
```python
|
||||
# BAD (pre-#179): Capture stdout + stderr, then parse stdout as JSON
|
||||
# But stderr might contain argparse noise that you have to string-match
|
||||
result = subprocess.run(..., capture_output=True, text=True)
|
||||
if "invalid choice" in result.stderr:
|
||||
# ... custom error handling
|
||||
```
|
||||
|
||||
✅ **Do silence stderr in JSON mode**
|
||||
```python
|
||||
# GOOD (post-#179): In JSON mode, stderr is guaranteed silent
|
||||
# Envelope on stdout is your single source of truth
|
||||
result = subprocess.run(..., capture_output=True, text=True)
|
||||
envelope = json.loads(result.stdout) # Always valid in JSON mode
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
❌ **Don't retry on parse errors**
|
||||
```python
|
||||
# BAD: Typos don't fix themselves
|
||||
error_kind = envelope['error']['kind']
|
||||
if error_kind == 'parse':
|
||||
retry() # Will fail again
|
||||
```
|
||||
|
||||
✅ **Do check retryable before retrying**
|
||||
```python
|
||||
# GOOD: Let the error tell you
|
||||
error = envelope['error']
|
||||
if error.get('retryable', False):
|
||||
retry()
|
||||
else:
|
||||
raise
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
❌ **Don't reuse a session after timeout without checking cancel_observed**
|
||||
```python
|
||||
# BAD: Reuse session = potential wedge
|
||||
result = run_claw_command(...) # times out
|
||||
# ... later, reuse same session
|
||||
result = run_claw_command(...) # might be stuck in the previous turn
|
||||
```
|
||||
|
||||
✅ **Do allocate a fresh session if cancel_observed=false**
|
||||
```python
|
||||
# GOOD: Allocate fresh session if wedge is suspected
|
||||
try:
|
||||
result = run_claw_command(...)
|
||||
except ClawError as err:
|
||||
if err.cancel_observed:
|
||||
# Safe to reuse
|
||||
result = run_claw_command(...)
|
||||
else:
|
||||
# Allocate fresh session
|
||||
fresh_session = create_session()
|
||||
result = run_claw_command_in_session(fresh_session, ...)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Testing Your Error Handler
|
||||
|
||||
```python
|
||||
def test_error_handler_parse_error():
|
||||
"""Verify parse errors are caught and classified."""
|
||||
try:
|
||||
run_claw_command(['claw', 'nonexistent', '--output-format', 'json'])
|
||||
assert False, "Should have raised ClawError"
|
||||
except ClawError as err:
|
||||
assert err.kind == 'parse'
|
||||
assert 'invalid choice' in err.message.lower()
|
||||
assert err.retryable is False
|
||||
|
||||
def test_error_handler_timeout_safe():
|
||||
"""Verify timeout with cancel_observed=true marks session as safe."""
|
||||
# Requires a live claw-code server; mock this test
|
||||
try:
|
||||
run_claw_command(
|
||||
['claw', 'turn-loop', '"x"', '--timeout-seconds', '0.0001'],
|
||||
timeout_seconds=2.0,
|
||||
)
|
||||
assert False, "Should have raised ClawError"
|
||||
except ClawError as err:
|
||||
assert err.kind == 'timeout'
|
||||
assert err.safe_to_reuse_session is True # cancel_observed=true
|
||||
|
||||
def test_error_handler_not_found():
|
||||
"""Verify session_not_found is clearly classified."""
|
||||
try:
|
||||
run_claw_command(['claw', 'load-session', 'nonexistent', '--output-format', 'json'])
|
||||
assert False, "Should have raised ClawError"
|
||||
except ClawError as err:
|
||||
assert err.kind == 'session_not_found'
|
||||
assert err.retryable is False
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Appendix A: v1.0 Error Envelope (Current Binary)
|
||||
|
||||
The actual shape emitted by the current binary (v1.0, flat):
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "session 'nonexistent' not found in .claw/sessions",
|
||||
"hint": "use 'list-sessions' to see available sessions",
|
||||
"kind": "session_not_found",
|
||||
"type": "error"
|
||||
}
|
||||
```
|
||||
|
||||
**Key differences from v2.0 schema (below):**
|
||||
- `error` field is a **string**, not a structured object
|
||||
- `kind` is at **top-level**, not nested under `error`
|
||||
- Missing: `timestamp`, `command`, `exit_code`, `output_format`, `schema_version`
|
||||
- Extra: `type: "error"` field (not in schema)
|
||||
|
||||
## Appendix B: SCHEMAS.md Target Shape (v2.0)
|
||||
|
||||
For reference, the target JSON error envelope shape (SCHEMAS.md, v2.0):
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T11:40:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 1,
|
||||
"output_format": "json",
|
||||
"schema_version": "2.0",
|
||||
"error": {
|
||||
"kind": "session_not_found",
|
||||
"operation": "session_store.load_session",
|
||||
"target": "nonexistent",
|
||||
"retryable": false,
|
||||
"message": "session 'nonexistent' not found in .port_sessions",
|
||||
"hint": "use 'list-sessions' to see available sessions"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**This is the target schema after [`FIX_LOCUS_164`](./FIX_LOCUS_164.md) is implemented.** The migration plan includes a dual-mode `--envelope-version=2.0` flag in Phase 1, default version bump in Phase 2, and deprecation in Phase 3. For now, code against v1.0 (Appendix A).
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
After cycles #178–#179, **one error handler works for all 14 clawable commands.** No more string-matching, no more stderr parsing, no more exit-code ambiguity. Just parse the JSON, check `error.kind`, and decide: retry, escalate, or reuse session (if safe).
|
||||
|
||||
The handler itself is ~80 lines of Python; the patterns are reusable across any language that can speak JSON.
|
||||
364
FIX_LOCUS_164.md
Normal file
364
FIX_LOCUS_164.md
Normal file
@@ -0,0 +1,364 @@
|
||||
# Fix-Locus #164 — JSON Envelope Contract Migration
|
||||
|
||||
**Status:** 📋 Proposed (2026-04-23, cycle #77). Updated cycle #85 (2026-04-23) with v1.5 baseline phase after fresh-dogfood discovery (#168) proved v1.0 was never coherent.
|
||||
|
||||
**Class:** Contract migration (not a patch). Affects EVERY `--output-format json` command.
|
||||
|
||||
**Bundle:** Typed-error family — joins #102 + #121 + #127 + #129 + #130 + #245 + **#164**. Contract-level implementation of §4.44 typed-error envelope.
|
||||
|
||||
---
|
||||
|
||||
## 0. CRITICAL UPDATE (Cycle #85 via #168 Evidence)
|
||||
|
||||
**Premise revision:** This locus document originally framed the problem as **"v1.0 (incoherent) → v2.0 (target schema)"** migration. **Fresh-dogfood validation in cycle #84 proved this framing was underspecified.**
|
||||
|
||||
**Actual problem (evidence from #168):**
|
||||
|
||||
- There is **no coherent v1.0 envelope contract**. Each verb has a bespoke JSON shape.
|
||||
- `claw list-sessions --output-format json` emits `{command, sessions}` — has `command` field
|
||||
- `claw doctor --output-format json` emits `{checks, kind, message, ...}` — no `command` field
|
||||
- `claw bootstrap hello --output-format json` emits **NOTHING** (silent failure with exit 0)
|
||||
- Each verb renderer was written independently with no coordinating contract
|
||||
|
||||
**Revised migration plan — three phases instead of two:**
|
||||
|
||||
1. **Phase 0 (Emergency):** Fix silent failures (#168 bootstrap JSON). Every `--output-format json` command must emit valid JSON.
|
||||
2. **Phase 1 (v1.5 Baseline):** Establish minimal JSON invariants across all 14 verbs without breaking existing consumers:
|
||||
- Every command emits valid JSON when `--output-format json` is passed
|
||||
- Every command has a top-level `kind` field identifying the verb
|
||||
- Every error envelope follows the confirmed `{error, hint, kind, type}` shape
|
||||
- Every success envelope has the verb name in a predictable location
|
||||
- **Effort:** ~3 dev-days (no new design, just fill gaps and normalize bugs)
|
||||
3. **Phase 2 (v2.0 Wrapped Envelope):** Execute the original Phase 1 plan documented below — common metadata wrapper, nested data/error objects, opt-in via `--envelope-version=2.0`.
|
||||
4. **Phase 3 (v2.0 Default):** Original Phase 2 plan below.
|
||||
5. **Phase 4 (v1.0/v1.5 Deprecation):** Original Phase 3 plan below.
|
||||
|
||||
**Why add Phase 0 + Phase 1 (v1.5)?**
|
||||
|
||||
- You can't migrate from "incoherent" to "coherent v2.0" in one jump. Intermediate coherence (v1.5 baseline) is required.
|
||||
- Consumer code built against "whatever v1 emits today" needs a stable target to transition from.
|
||||
- **Silent failures (bootstrap JSON) must be fixed BEFORE any migration** — otherwise consumers have no way to detect breakage.
|
||||
|
||||
**Blocker resolved:** The original blocker "v1.0 design vs v2.0 design" is actually "no v1 design exists; let's make one (v1.5) then migrate." This is a **clearer, lower-risk migration path**.
|
||||
|
||||
**Revised effort estimate:** ~9 dev-days total (Phase 0: 1 day + Phase 1/v1.5: 3 days + Phase 2/v2.0: 5 days) instead of ~6 dev-days for a direct v1.0→v2.0 migration (which would have failed given the incoherent baseline).
|
||||
|
||||
**Doctrine implication:** Cycles #76–#82 diagnosed "aspirational vs current" correctly but missed that "current" was never a single thing. Cycle #84 fresh-dogfood caught this. **Fresh-dogfood discipline (principle #9) prevented a 6-day migration effort from hitting an unsolvable baseline problem.**
|
||||
|
||||
---
|
||||
|
||||
## 1. Scope — What This Migration Affects
|
||||
|
||||
**Every JSON-emitting verb.** Audit across the 14 documented verbs:
|
||||
|
||||
| Verb | Current top-level keys | Schema-conformant? |
|
||||
|---|---|---|
|
||||
| `doctor` | checks, has_failures, **kind**, message, report, summary | ❌ No (kind=verb-id, flat) |
|
||||
| `status` | config_load_error, **kind**, model, ..., workspace | ❌ No |
|
||||
| `version` | git_sha, **kind**, message, target, version | ❌ No |
|
||||
| `sandbox` | active, ..., **kind**, ...supported | ❌ No |
|
||||
| `help` | **kind**, message | ❌ No (minimal) |
|
||||
| `agents` | action, agents, count, **kind**, summary, working_directory | ❌ No |
|
||||
| `mcp` | action, config_load_error, ..., **kind**, servers | ❌ No |
|
||||
| `skills` | action, **kind**, skills, summary | ❌ No |
|
||||
| `system-prompt` | **kind**, message, sections | ❌ No |
|
||||
| `dump-manifests` | error, hint, **kind**, type | ❌ No (emits error envelope for success) |
|
||||
| `bootstrap-plan` | **kind**, phases | ❌ No |
|
||||
| `acp` | aliases, ..., **kind**, ...tracking | ❌ No |
|
||||
| `export` | file, **kind**, markdown, messages, session_id | ❌ No |
|
||||
| `state` | error, hint, **kind**, type | ❌ No (emits error envelope for success) |
|
||||
|
||||
**All 14 verbs diverge from SCHEMAS.md.** The gap is 100%, not a partial drift.
|
||||
|
||||
---
|
||||
|
||||
## 2. The Two Envelope Shapes
|
||||
|
||||
### 2a. Current Binary Shape (Flat Top-Level)
|
||||
|
||||
```json
|
||||
// Success example (claw doctor --output-format json)
|
||||
{
|
||||
"kind": "doctor", // verb identity
|
||||
"checks": [...],
|
||||
"summary": {...},
|
||||
"has_failures": false,
|
||||
"report": "...",
|
||||
"message": "..."
|
||||
}
|
||||
|
||||
// Error example (claw doctor foo --output-format json)
|
||||
{
|
||||
"error": "unrecognized argument...", // string, not object
|
||||
"hint": "Run `claw --help` for usage.",
|
||||
"kind": "cli_parse", // error classification (overloaded)
|
||||
"type": "error" // not in schema
|
||||
}
|
||||
```
|
||||
|
||||
**Properties:**
|
||||
- Flat top-level
|
||||
- `kind` field is **overloaded** (verb-id in success, error-class in error)
|
||||
- No common wrapper metadata (timestamp, exit_code, schema_version)
|
||||
- `error` is a string, not a structured object
|
||||
|
||||
### 2b. Documented Schema Shape (Nested, Wrapped)
|
||||
|
||||
```json
|
||||
// Success example (per SCHEMAS.md)
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "doctor",
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "1.0",
|
||||
"data": {
|
||||
"checks": [...],
|
||||
"summary": {...},
|
||||
"has_failures": false
|
||||
}
|
||||
}
|
||||
|
||||
// Error example (per SCHEMAS.md)
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "doctor",
|
||||
"exit_code": 1,
|
||||
"output_format": "json",
|
||||
"schema_version": "1.0",
|
||||
"error": {
|
||||
"kind": "parse", // enum, nested
|
||||
"operation": "parse_args",
|
||||
"target": "subcommand `doctor`",
|
||||
"retryable": false,
|
||||
"message": "unrecognized argument...",
|
||||
"hint": "Run `claw --help` for usage."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Properties:**
|
||||
- Common metadata wrapper (timestamp, command, exit_code, output_format, schema_version)
|
||||
- `data` (payload) vs. `error` (failure) as **sibling fields**, never coexisting
|
||||
- `kind` in error is the enum from §4.44 (filesystem/auth/session/parse/runtime/mcp/delivery/usage/policy/unknown)
|
||||
- `error` is a structured object with operation/target/retryable
|
||||
|
||||
---
|
||||
|
||||
## 3. Migration Strategy — Phased Rollout
|
||||
|
||||
**Principle:** Don't break downstream consumers mid-migration. Support both shapes during overlap, then deprecate.
|
||||
|
||||
### Phase 1 — Dual-Envelope Mode (Opt-In)
|
||||
|
||||
**Deliverables:**
|
||||
- New flag: `--envelope-version=2.0` (or `--schema-version=2.0`)
|
||||
- When flag set: emit new (schema-conformant) envelope
|
||||
- When flag absent: emit current (flat) envelope
|
||||
- SCHEMAS.md: add "Legacy (v1.0)" section documenting current flat shape alongside v2.0
|
||||
|
||||
**Implementation:**
|
||||
- Single `envelope_version` parameter in `CliOutputFormat` enum
|
||||
- Every verb's JSON writer checks version, branches accordingly
|
||||
- Shared wrapper helper: `wrap_v2(payload, command, exit_code)`
|
||||
|
||||
**Consumer impact:** Opt-in. Existing consumers unchanged. New consumers can opt in.
|
||||
|
||||
**Timeline estimate:** ~2 days for 14 verbs + shared wrapper + tests.
|
||||
|
||||
### Phase 2 — Default Version Bump
|
||||
|
||||
**Deliverables:**
|
||||
- Default changes from v1.0 → v2.0
|
||||
- New flag: `--legacy-envelope` to opt back into flat shape
|
||||
- Migration guide added to SCHEMAS.md and CHANGELOG
|
||||
- Release notes: "Breaking change in envelope, pre-migration opt-in available via --legacy-envelope"
|
||||
|
||||
**Consumer impact:** Existing consumers must add `--legacy-envelope` OR update to v2.0 schema. Grace period = "until Phase 3."
|
||||
|
||||
**Timeline estimate:** Immediately after Phase 1 ships.
|
||||
|
||||
### Phase 3 — Flat-Shape Deprecation
|
||||
|
||||
**Deliverables:**
|
||||
- `--legacy-envelope` flag prints deprecation warning to stderr
|
||||
- SCHEMAS.md "Legacy v1.0" section marked DEPRECATED
|
||||
- v3.0 release (future): remove flag entirely, binary only emits v2.0
|
||||
|
||||
**Consumer impact:** Full migration required by v3.0.
|
||||
|
||||
**Timeline estimate:** Phase 3 after ~6 months of Phase 2 usage.
|
||||
|
||||
---
|
||||
|
||||
## 4. Implementation Details
|
||||
|
||||
### 4a. Shared Wrapper Helper
|
||||
|
||||
```rust
|
||||
// rust/crates/rusty-claude-cli/src/json_envelope.rs (new file)
|
||||
|
||||
pub fn wrap_v2_success<T: Serialize>(command: &str, data: T) -> Value {
|
||||
serde_json::json!({
|
||||
"timestamp": chrono::Utc::now().to_rfc3339_opts(chrono::SecondsFormat::Secs, true),
|
||||
"command": command,
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "2.0",
|
||||
"data": data,
|
||||
})
|
||||
}
|
||||
|
||||
pub fn wrap_v2_error(command: &str, error: StructuredError) -> Value {
|
||||
serde_json::json!({
|
||||
"timestamp": chrono::Utc::now().to_rfc3339_opts(chrono::SecondsFormat::Secs, true),
|
||||
"command": command,
|
||||
"exit_code": 1,
|
||||
"output_format": "json",
|
||||
"schema_version": "2.0",
|
||||
"error": {
|
||||
"kind": error.kind,
|
||||
"operation": error.operation,
|
||||
"target": error.target,
|
||||
"retryable": error.retryable,
|
||||
"message": error.message,
|
||||
"hint": error.hint,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
pub struct StructuredError {
|
||||
pub kind: &'static str, // enum from §4.44
|
||||
pub operation: String,
|
||||
pub target: String,
|
||||
pub retryable: bool,
|
||||
pub message: String,
|
||||
pub hint: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
### 4b. Per-Verb Migration Pattern
|
||||
|
||||
```rust
|
||||
// Before (current flat shape):
|
||||
match output_format {
|
||||
CliOutputFormat::Json => {
|
||||
serde_json::to_string_pretty(&DoctorOutput {
|
||||
kind: "doctor",
|
||||
checks,
|
||||
summary,
|
||||
has_failures,
|
||||
message,
|
||||
report,
|
||||
})
|
||||
}
|
||||
CliOutputFormat::Text => render_text(&data),
|
||||
}
|
||||
|
||||
// After (v2.0 with v1.0 fallback):
|
||||
match (output_format, envelope_version) {
|
||||
(CliOutputFormat::Json, 2) => {
|
||||
json_envelope::wrap_v2_success("doctor", DoctorData { checks, summary, has_failures })
|
||||
}
|
||||
(CliOutputFormat::Json, 1) => {
|
||||
// Legacy flat shape (with deprecation warning at Phase 3)
|
||||
serde_json::to_value(&LegacyDoctorOutput { kind: "doctor", ...})
|
||||
}
|
||||
(CliOutputFormat::Text, _) => render_text(&data),
|
||||
}
|
||||
```
|
||||
|
||||
### 4c. Error Classification Migration
|
||||
|
||||
Current error `kind` values (found in binary):
|
||||
- `cli_parse`, `no_managed_sessions`, `unknown`, `missing_credentials`, `session_not_found`
|
||||
|
||||
Target v2.0 enum (per §4.44):
|
||||
- `filesystem`, `auth`, `session`, `parse`, `runtime`, `mcp`, `delivery`, `usage`, `policy`, `unknown`
|
||||
|
||||
**Migration table:**
|
||||
| Current kind | v2.0 error.kind |
|
||||
|---|---|
|
||||
| `cli_parse` | `parse` |
|
||||
| `no_managed_sessions` | `session` (with operation: "list_sessions") |
|
||||
| `missing_credentials` | `auth` |
|
||||
| `session_not_found` | `session` (with operation: "resolve_session") |
|
||||
| `unknown` | `unknown` |
|
||||
|
||||
---
|
||||
|
||||
## 5. Acceptance Criteria
|
||||
|
||||
1. **Schema parity:** Every `--output-format json` command emits v2.0 envelope shape exactly per SCHEMAS.md
|
||||
2. **Success/error symmetry:** Success envelopes have `data` field; error envelopes have `error` object; never both
|
||||
3. **kind semantic unification:** `data.kind` = verb identity (when present); `error.kind` = enum from §4.44. No overloading.
|
||||
4. **Common metadata:** `timestamp`, `command`, `exit_code`, `output_format`, `schema_version` present in ALL envelopes
|
||||
5. **Dual-mode support:** `--envelope-version=1|2` flag allows opt-in/opt-out during migration
|
||||
6. **Tests:** Per-verb golden test fixtures for both v1.0 and v2.0 envelopes
|
||||
7. **Documentation:** SCHEMAS.md documents both versions with deprecation timeline
|
||||
|
||||
---
|
||||
|
||||
## 6. Risks
|
||||
|
||||
### 6a. Breaking Change Risk
|
||||
|
||||
Phase 2 (default version bump) WILL break consumers that depend on flat-shape envelope. Mitigations:
|
||||
- Dual-mode flag allows opt-in testing before default change
|
||||
- Long grace period (Phase 3 deprecation ~6 months post-Phase 2)
|
||||
- Clear migration guide + example consumer code
|
||||
|
||||
### 6b. Implementation Risk
|
||||
|
||||
14 verbs to migrate. Each verb has its own success shape (`checks`, `agents`, `phases`, etc.). Payload structure stays the same; only the wrapper changes. Mechanical but high-volume.
|
||||
|
||||
**Estimated diff size:** ~200 lines per verb × 14 verbs = ~2,800 lines (mostly boilerplate).
|
||||
|
||||
**Mitigation:** Start with doctor, status, version as pilot. If pattern works, batch remaining 11.
|
||||
|
||||
### 6c. Error Classification Remapping Risk
|
||||
|
||||
Changing `kind: "cli_parse"` to `error.kind: "parse"` is a breaking change even within the error envelope. Consumers doing `response["kind"] == "cli_parse"` will break.
|
||||
|
||||
**Mitigation:** Document explicitly in migration guide. Provide sed script if needed.
|
||||
|
||||
---
|
||||
|
||||
## 7. Deliverables Summary
|
||||
|
||||
| Item | Phase | Effort |
|
||||
|---|---|---|
|
||||
| `json_envelope.rs` shared helper | Phase 1 | 1 day |
|
||||
| 14 verb migrations (pilot 3 + batch 11) | Phase 1 | 2 days |
|
||||
| `--envelope-version` flag | Phase 1 | 0.5 day |
|
||||
| Dual-mode tests (golden fixtures) | Phase 1 | 1 day |
|
||||
| SCHEMAS.md updates (v1.0 + v2.0) | Phase 1 | 0.5 day |
|
||||
| Default version bump | Phase 2 | 0.5 day |
|
||||
| Deprecation warnings | Phase 3 | 0.5 day |
|
||||
| Migration guide doc | Phase 1 | 0.5 day |
|
||||
|
||||
**Total estimate:** ~6 developer-days for Phase 1 (the core work). Phases 2/3 are cheap follow-ups.
|
||||
|
||||
---
|
||||
|
||||
## 8. Rollout Timeline (Proposed)
|
||||
|
||||
- **Week 1:** Phase 1 — dual-mode support + pilot migration (3 verbs)
|
||||
- **Week 2:** Phase 1 completion — remaining 11 verbs + full test coverage
|
||||
- **Week 3:** Stabilization period, gather consumer feedback
|
||||
- **Month 2:** Phase 2 — default version bump
|
||||
- **Month 8:** Phase 3 — deprecation warnings
|
||||
- **v3.0 release:** Remove `--legacy-envelope` flag, v1.0 shape no longer supported
|
||||
|
||||
---
|
||||
|
||||
## 9. Related
|
||||
|
||||
- **ROADMAP #164:** The originating pinpoint (this document is its fix-locus)
|
||||
- **ROADMAP §4.44:** Typed-error contract (defines the error.kind enum this migration uses)
|
||||
- **SCHEMAS.md:** The envelope schema this migration makes reality
|
||||
- **Typed-error family:** #102, #121, #127, #129, #130, #245, **#164**
|
||||
|
||||
---
|
||||
|
||||
**Cycle #77 locus doc. Ready for author review + pilot implementation decision.**
|
||||
208
MERGE_CHECKLIST.md
Normal file
208
MERGE_CHECKLIST.md
Normal file
@@ -0,0 +1,208 @@
|
||||
# Merge Checklist — claw-code
|
||||
|
||||
**Purpose:** Streamline merging of the 17 review-ready branches by grouping them into safe clusters and providing per-cluster merge order + validation steps.
|
||||
|
||||
**Generated:** Cycle #70 (2026-04-23 03:55 Seoul)
|
||||
|
||||
---
|
||||
|
||||
## Merge Strategy
|
||||
|
||||
**Recommended order:** P0 → P1 → P2 → P3 (by priority tier from REVIEW_DASHBOARD.md).
|
||||
|
||||
**Batch strategy:** Merge by cluster, not individual branches. Each cluster shares the same fix pattern, so reviewers can validate one cluster and merge all members together.
|
||||
|
||||
**Estimated throughput:** 2-3 clusters per merge session. At current cycle velocity (~1 cluster per 15 min), full queue → merged main in ~2 hours.
|
||||
|
||||
---
|
||||
|
||||
## Cluster Merge Order
|
||||
|
||||
### Cluster 1: Typed-Error Threading (P0) — 3 branches
|
||||
|
||||
**Members:**
|
||||
- `feat/jobdori-249-resumed-slash-kind` (commit `eb4b1eb`, 61 lines)
|
||||
- `feat/jobdori-248-unknown-verb-option-classify` (commit `6c09172`)
|
||||
- `feat/jobdori-251-session-dispatch` (commit `dc274a0`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] All three branches built and tested locally (181 tests pass)
|
||||
- [ ] All three have only changes in `rust/crates/rusty-claude-cli/src/main.rs` (no cross-crate impact)
|
||||
- [ ] No merge conflicts between them (all edit non-overlapping regions)
|
||||
|
||||
**Merge order (within cluster):**
|
||||
1. #249 (smallest, lowest risk)
|
||||
2. #248 (medium)
|
||||
3. #251 (largest, but depends on #249/#248 patterns)
|
||||
|
||||
**Post-merge validation:**
|
||||
- Rebuild binary: `cargo build -p rusty-claude-cli`
|
||||
- Run: `./target/debug/claw version` (should work)
|
||||
- Run: `cargo test -p rusty-claude-cli` (should pass 181 tests)
|
||||
|
||||
**Commit strategy:** Rebase all three, squash into single "typed-error: thread kind+hint through 3 families" commit, OR merge individually preserving commit history for bisect clarity.
|
||||
|
||||
---
|
||||
|
||||
### Cluster 2: Diagnostic-Strictness (P1) — 3 branches
|
||||
|
||||
**Members:**
|
||||
- `feat/jobdori-122-doctor-stale-base` (commit `5bb9eba`)
|
||||
- `feat/jobdori-122b-doctor-broad-cwd` (commit `0aa0d3f`)
|
||||
- `fix/jobdori-161-worktree-git-sha` (commit `c5b6fa5`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] #122 and #122b are binary-level changes, #161 is build-system change
|
||||
- [ ] All three pass `cargo build`
|
||||
- [ ] No cross-crate merge conflicts
|
||||
|
||||
**Why these three together:** All share the diagnostic-strictness principle. #122 and #122b extend `doctor`, #161 fixes `version`. Merging as a cluster signals the principle to future reviewers.
|
||||
|
||||
**Post-merge validation:**
|
||||
- Rebuild binary
|
||||
- Run: `claw doctor` (should now check stale-base + broad-cwd)
|
||||
- Run: `claw version` (should report correct SHA even in worktrees)
|
||||
- Run: `cargo test` (full suite)
|
||||
|
||||
**Commit strategy:** Merge individually preserving history, then add ROADMAP commit explaining the cluster principle. This makes the doctrine visible in git log.
|
||||
|
||||
---
|
||||
|
||||
### Cluster 3: Help-Parity (P1) — 4 branches
|
||||
|
||||
**Members:**
|
||||
- `feat/jobdori-130b-filesystem-context` (commit `d49a75c`)
|
||||
- `feat/jobdori-130c-diff-help` (commit `83f744a`)
|
||||
- `feat/jobdori-130d-config-help` (commit `19638a0`)
|
||||
- `feat/jobdori-130e-dispatch-help` + `feat/jobdori-130e-surface-help` (commits `0ca0344`, `9dd7e79`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] All four branches edit help-topic routing in the same regions
|
||||
- [ ] Verify no merge conflicts (should be sequential, non-overlapping edits)
|
||||
- [ ] `cargo build` passes
|
||||
|
||||
**Why these four together:** All address help-parity (verbs in `--help` → correct help topics). This cluster is the most "batch-like" — identical fix pattern repeated.
|
||||
|
||||
**Post-merge validation:**
|
||||
- Rebuild binary
|
||||
- Run: `claw diff --help` (should route to help topic, not crash)
|
||||
- Run: `claw config --help` (ditto)
|
||||
- Run: `claw --help` (should list all verbs)
|
||||
|
||||
**Merge strategy:** Can be fast-forwarded or squashed as a unit since they're all the same pattern.
|
||||
|
||||
---
|
||||
|
||||
### Cluster 4: Suffix-Guard (P2) — 2 branches
|
||||
|
||||
**Members:**
|
||||
- `feat/jobdori-152-init-suffix-guard` (commit `860f285`)
|
||||
- `feat/jobdori-152-bootstrap-plan-suffix-guard` (commit `3a533ce`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] Both branches add `rest.len() > 1` check to no-arg verbs
|
||||
- [ ] No conflicts
|
||||
|
||||
**Post-merge validation:**
|
||||
- `claw init extra-arg` (should reject)
|
||||
- `claw bootstrap-plan extra-arg` (should reject)
|
||||
|
||||
**Merge strategy:** Merge together.
|
||||
|
||||
---
|
||||
|
||||
### Cluster 5: Verb-Classification (P2) — 1 branch
|
||||
|
||||
**Member:**
|
||||
- `feat/jobdori-160-verb-classification` (commit `5538934`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] Binary tested (23-line change to parser)
|
||||
- [ ] `cargo test` passes 181 tests
|
||||
|
||||
**Post-merge validation:**
|
||||
- `claw resume bogus-id` (should emit slash-command guidance, not missing_credentials)
|
||||
- `claw explain this` (should still route to Prompt)
|
||||
|
||||
**Note:** Can merge solo or batch with #4. No dependencies.
|
||||
|
||||
---
|
||||
|
||||
### Cluster 6: Doc-Truthfulness (P3) — 2 branches
|
||||
|
||||
**Members:**
|
||||
- `docs/parity-update-2026-04-23` (commit `92a79b5`)
|
||||
- `docs/jobdori-162-usage-verb-parity` (commit `48da190`)
|
||||
|
||||
**Merge prerequisites:**
|
||||
- [ ] Both are doc-only (no code risk)
|
||||
- [ ] USAGE.md sections match verbs in `--help`
|
||||
- [ ] PARITY.md stats are current
|
||||
|
||||
**Post-merge validation:**
|
||||
- `claw --help` (all verbs listed)
|
||||
- `grep "dump-manifests\|bootstrap-plan" USAGE.md` (should find sections)
|
||||
- Read PARITY.md (should cite current date + stats)
|
||||
|
||||
**Merge strategy:** Can merge in any order.
|
||||
|
||||
---
|
||||
|
||||
## Merge Conflict Risk Assessment
|
||||
|
||||
**High-risk clusters (potential conflicts):**
|
||||
- Cluster 1 (Typed-error) — all edit `main.rs` dispatch/error arms, but in different methods (likely non-overlapping)
|
||||
- Cluster 3 (Help-parity) — all edit help-routing, but different verbs (should sequence cleanly)
|
||||
|
||||
**Low-risk clusters (isolated changes):**
|
||||
- Cluster 2 (Diagnostic-strictness) — #122 and #122b both edit `check_workspace_health()`, could conflict. #161 edits `build.rs` (no overlap).
|
||||
- Cluster 4 (Suffix-guard) — two independent verbs, no conflict
|
||||
- Cluster 5 (Verb-classification) — solo, no conflict
|
||||
- Cluster 6 (Doc-truthfulness) — doc-only, no conflict
|
||||
|
||||
**Conflict mitigation:** Merge Cluster 2 sub-groups: (#122 → #122b → #161) to avoid simultaneous edits to `check_workspace_health()`.
|
||||
|
||||
---
|
||||
|
||||
## Post-Merge Validation Checklist
|
||||
|
||||
**After all clusters are merged to main:**
|
||||
|
||||
- [ ] `cargo build --all` (full workspace build)
|
||||
- [ ] `cargo test -p rusty-claude-cli` (181 tests pass)
|
||||
- [ ] `cargo fmt --all --check` (no formatting regressions)
|
||||
- [ ] `./target/debug/claw version` (correct SHA, not stale)
|
||||
- [ ] `./target/debug/claw doctor` (stale-base + broad-cwd warnings work)
|
||||
- [ ] `./target/debug/claw --help` (all verbs listed)
|
||||
- [ ] `grep -c "### \`" USAGE.md` (all 12 verbs documented, not 8)
|
||||
- [ ] Fresh dogfood run: `./target/debug/claw prompt "test"` (works)
|
||||
|
||||
---
|
||||
|
||||
## Timeline Estimate
|
||||
|
||||
| Phase | Time | Action |
|
||||
|---|---|---|
|
||||
| Merge Cluster 1 (P0 typed-error) | ~15 min | Merge 3 branches, test, validate |
|
||||
| Merge Cluster 2 (P1 diagnostic-strictness) | ~15 min | Merge 3 branches (mind #122/#122b conflict) |
|
||||
| Merge Cluster 3 (P1 help-parity) | ~20 min | Merge 4 branches (batch-friendly) |
|
||||
| Merge Cluster 4–6 (P2–P3, low-risk) | ~10 min | Fast merges |
|
||||
| **Total** | **~60 min** | **All 17 branches → main** |
|
||||
|
||||
---
|
||||
|
||||
## Notes for Reviewer
|
||||
|
||||
**Branch-last protocol validation:** All 17 branches here represent work that was:
|
||||
1. Pinpoint filed (with repro + fix shape)
|
||||
2. Implemented in scratch/worktree (not directly on main)
|
||||
3. Verified to build + pass tests
|
||||
4. Only then branched for review
|
||||
|
||||
This artifact provides the final step: **validated merge order + per-cluster risks.**
|
||||
|
||||
**Integration-support artifact:** This checklist reduces reviewer cognitive load by pre-answering "which merge order is safest?" and "what could go wrong?" questions.
|
||||
|
||||
---
|
||||
|
||||
**Checklist source:** Cycle #70 (2026-04-23 03:55 Seoul)
|
||||
151
OPT_OUT_AUDIT.md
Normal file
151
OPT_OUT_AUDIT.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# OPT_OUT Surface Audit Roadmap
|
||||
|
||||
**Status:** Pre-audit (decision table ready, survey pending)
|
||||
|
||||
This document governs the audit and potential promotion of 12 OPT_OUT surfaces (commands that currently do **not** support `--output-format json`).
|
||||
|
||||
## OPT_OUT Classification Rationale
|
||||
|
||||
A surface is classified as OPT_OUT when:
|
||||
1. **Human-first by nature:** Rich Markdown prose / diagrams / structured text where JSON would be information loss
|
||||
2. **Query-filtered alternative exists:** Commands with internal `--query` / `--limit` don't need JSON (users already have escape hatch)
|
||||
3. **Simulation/debug only:** Not meant for production orchestration (e.g., mode simulators)
|
||||
4. **Future JSON work is planned:** Documented in ROADMAP with clear upgrade path
|
||||
|
||||
---
|
||||
|
||||
## OPT_OUT Surfaces (12 Total)
|
||||
|
||||
### Group A: Rich-Markdown Reports (4 commands)
|
||||
|
||||
**Rationale:** These emit structured narrative prose. JSON would require lossy serialization.
|
||||
|
||||
| Command | Output | Current use | JSON case |
|
||||
|---|---|---|---|
|
||||
| `summary` | Multi-section workspace summary (Markdown) | Human readability | Not applicable; Markdown is the output |
|
||||
| `manifest` | Workspace manifest with project tree (Markdown) | Human readability | Not applicable; Markdown is the output |
|
||||
| `parity-audit` | TypeScript/Python port comparison report (Markdown) | Human readability | Not applicable; Markdown is the output |
|
||||
| `setup-report` | Preflight + startup diagnostics (Markdown) | Human readability | Not applicable; Markdown is the output |
|
||||
|
||||
**Audit decision:** These likely remain OPT_OUT long-term (Markdown-as-output is intentional). If JSON version needed in future, would be a separate `--output-format json` path generating structured data (project summary object, manifest array, audit deltas, setup checklist) — but that's a **new contract**, not an addition to existing Markdown surfaces.
|
||||
|
||||
**Pinpoint:** #175 (deferred) — audit whether `summary`/`manifest` should emit JSON structured versions *in parallel* with Markdown, or if Markdown-only is the right UX.
|
||||
|
||||
---
|
||||
|
||||
### Group B: List Commands with Query Filters (3 commands)
|
||||
|
||||
**Rationale:** These already support `--query` and `--limit` for filtering. JSON output would be redundant; users can pipe to `jq`.
|
||||
|
||||
| Command | Filtering | Current output | JSON case |
|
||||
|---|---|---|---|
|
||||
| `subsystems` | `--limit` | Human-readable list | Use `--query` to filter, users can parse if needed |
|
||||
| `commands` | `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` | Human-readable list | Use `--query` to filter, users can parse if needed |
|
||||
| `tools` | `--query`, `--limit`, `--simple-mode` | Human-readable list | Use `--query` to filter, users can parse if needed |
|
||||
|
||||
**Audit decision:** `--query` / `--limit` are already the machine-friendly escape hatch. These commands are **intentionally** list-filter-based (not orchestration-primary). Promoting to CLAWABLE would require:
|
||||
1. Formalizing what the structured output *is* (command array? tool array?)
|
||||
2. Versioning the schema per command
|
||||
3. Updating tests to validate per-command schemas
|
||||
|
||||
**Cost-benefit:** Low. Users who need structured data can already use `--query` to narrow results, then parse. Effort to promote > value.
|
||||
|
||||
**Pinpoint:** #176 (backlog) — audit `--query` UX; consider if a `--query-json` escape hatch (output JSON of matching items) is worth the schema tax.
|
||||
|
||||
---
|
||||
|
||||
### Group C: Simulation / Debug Surfaces (5 commands)
|
||||
|
||||
**Rationale:** These are intentionally **not production-orchestrated**. They simulate behavior, test modes, or debug scenarios. JSON output doesn't add value.
|
||||
|
||||
| Command | Purpose | Output | Use case |
|
||||
|---|---|---|---|
|
||||
| `remote-mode` | Simulate remote execution | Text (mock session) | Testing harness behavior under remote constraints |
|
||||
| `ssh-mode` | Simulate SSH execution | Text (mock SSH session) | Testing harness behavior over SSH-like transport |
|
||||
| `teleport-mode` | Simulate teleport hop | Text (mock hop session) | Testing harness behavior with teleport bouncing |
|
||||
| `direct-connect-mode` | Simulate direct network | Text (mock session) | Testing harness behavior with direct connectivity |
|
||||
| `deep-link-mode` | Simulate deep-link invocation | Text (mock deep-link) | Testing harness behavior from URL/deeplink |
|
||||
|
||||
**Audit decision:** These are **intentionally simulation-only**. Promoting to CLAWABLE means:
|
||||
1. "This simulated mode is now a valid orchestration surface"
|
||||
2. Need to define what JSON output *means* (mock session state? simulation log?)
|
||||
3. Need versioning + test coverage
|
||||
|
||||
**Cost-benefit:** Very low. These are debugging tools, not orchestration endpoints. Effort to promote >> value.
|
||||
|
||||
**Pinpoint:** #177 (backlog) — decide if mode simulators should ever be CLAWABLE (probably no).
|
||||
|
||||
---
|
||||
|
||||
## Audit Workflow (Future Cycles)
|
||||
|
||||
### For each surface:
|
||||
1. **Survey:** Check if any external claw actually uses --output-format with this surface
|
||||
2. **Cost estimate:** How much schema work + testing?
|
||||
3. **Value estimate:** How much demand for JSON version?
|
||||
4. **Decision:** CLAWABLE, remain OPT_OUT, or new pinpoint?
|
||||
|
||||
### Promotion criteria (if promoting to CLAWABLE):
|
||||
|
||||
A surface moves from OPT_OUT → CLAWABLE **only if**:
|
||||
- ✅ Clear use case for JSON (not just "hypothetically could be JSON")
|
||||
- ✅ Schema is simple and stable (not 20+ fields)
|
||||
- ✅ At least one external claw has requested it
|
||||
- ✅ Tests can be added without major refactor
|
||||
- ✅ Maintainability burden is worth the value
|
||||
|
||||
### Demote criteria (if staying OPT_OUT):
|
||||
|
||||
A surface stays OPT_OUT **if**:
|
||||
- ✅ JSON would be information loss (Markdown reports)
|
||||
- ✅ Equivalent filtering already exists (`--query` / `--limit`)
|
||||
- ✅ Use case is simulation/debug, not production
|
||||
- ✅ Promotion effort > value to users
|
||||
|
||||
---
|
||||
|
||||
## Post-Audit Outcomes
|
||||
|
||||
### Likely scenario (high confidence)
|
||||
|
||||
**Group A (Markdown reports):** Remain OPT_OUT
|
||||
- `summary`, `manifest`, `parity-audit`, `setup-report` are **intentionally** human-first
|
||||
- If JSON-like structure is needed in future, would be separate `*-json` commands or distinct `--output-format`, not added to Markdown surfaces
|
||||
|
||||
**Group B (List filters):** Remain OPT_OUT
|
||||
- `subsystems`, `commands`, `tools` have `--query` / `--limit` as query layer
|
||||
- Users who need structured data already have escape hatch
|
||||
|
||||
**Group C (Mode simulators):** Remain OPT_OUT
|
||||
- `remote-mode`, `ssh-mode`, etc. are debug tools, not orchestration endpoints
|
||||
- No demand for JSON version; promotion would be forced, not driven
|
||||
|
||||
**Result:** OPT_OUT audit concludes that 12/12 surfaces should **remain OPT_OUT** (no promotions).
|
||||
|
||||
### If demand emerges
|
||||
|
||||
If external claws report needing JSON from any OPT_OUT surface:
|
||||
1. File pinpoint with use case + rationale
|
||||
2. Estimate cost + value
|
||||
3. If value > cost, promote to CLAWABLE with full test coverage
|
||||
4. Update SCHEMAS.md
|
||||
5. Update CLAUDE.md
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
- **Post-#174 (now):** OPT_OUT audit documented (this file)
|
||||
- **Cycles #19–#21 (deferred):** Survey period — collect data on external demand
|
||||
- **Cycle #22 (deferred):** Final audit decision + any promotions
|
||||
- **Post-audit:** Move to protocol maintenance mode (new commands/fields/surfaces)
|
||||
|
||||
---
|
||||
|
||||
## Related
|
||||
|
||||
- **OPT_OUT_DEMAND_LOG.md** — Active survey recording real demand signals (evidentiary base for any promotion decision)
|
||||
- **SCHEMAS.md** — Clawable surface contracts
|
||||
- **CLAUDE.md** — Development guidance
|
||||
- **test_cli_parity_audit.py** — Parametrized tests for CLAWABLE_SURFACES enforcement
|
||||
- **ROADMAP.md** — Macro phases (this audit is Phase 3 before Phase 2 closure)
|
||||
167
OPT_OUT_DEMAND_LOG.md
Normal file
167
OPT_OUT_DEMAND_LOG.md
Normal file
@@ -0,0 +1,167 @@
|
||||
# OPT_OUT Demand Log
|
||||
|
||||
**Purpose:** Record real demand signals for promoting OPT_OUT surfaces to CLAWABLE. Without this log, the audit criteria in `OPT_OUT_AUDIT.md` have no evidentiary base.
|
||||
|
||||
**Status:** Active survey window (post-#178/#179, cycles #21+)
|
||||
|
||||
## How to file a demand signal
|
||||
|
||||
When any external claw, operator, or downstream consumer actually needs JSON output from one of the 12 OPT_OUT surfaces, add an entry below. **Speculation, "could be useful someday," and internal hypotheticals do NOT count.**
|
||||
|
||||
A valid signal requires:
|
||||
- **Source:** Who/what asked (human, automation, agent session, external tool)
|
||||
- **Surface:** Which OPT_OUT command (from the 12)
|
||||
- **Use case:** The concrete orchestration problem they're trying to solve
|
||||
- **Would-parse-Markdown alternative checked?** Why the existing OPT_OUT output is insufficient
|
||||
- **Date:** When the signal was received
|
||||
|
||||
## Promotion thresholds
|
||||
|
||||
Per `OPT_OUT_AUDIT.md` criteria:
|
||||
- **2+ independent signals** for the same surface within a survey window → file promotion pinpoint
|
||||
- **1 signal + existing stable schema** → file pinpoint for discussion
|
||||
- **0 signals** → surface stays OPT_OUT (documented rationale in audit file)
|
||||
|
||||
The threshold is intentionally high. Single-use hacks can be served via one-off Markdown parsing; schema promotion is expensive (docs, tests, maintenance).
|
||||
|
||||
---
|
||||
|
||||
## Demand Signals Received
|
||||
|
||||
### Group A: Rich-Markdown Reports
|
||||
|
||||
#### `summary`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: No demand recorded. Markdown output is intentional and useful for human review.
|
||||
|
||||
#### `manifest`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: No demand recorded.
|
||||
|
||||
#### `parity-audit`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: No demand recorded. Report consumers are humans reviewing porting progress, not automation.
|
||||
|
||||
#### `setup-report`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: No demand recorded.
|
||||
|
||||
---
|
||||
|
||||
### Group B: List Commands with Query Filters
|
||||
|
||||
#### `subsystems`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: `--limit` already provides filtering. No claws requesting JSON.
|
||||
|
||||
#### `commands`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` already allow filtering. No demand recorded.
|
||||
|
||||
#### `tools`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: `--query`, `--limit`, `--simple-mode` provide filtering. No demand recorded.
|
||||
|
||||
---
|
||||
|
||||
### Group C: Simulation / Debug Surfaces
|
||||
|
||||
#### `remote-mode`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: Simulation-only. No production orchestration need.
|
||||
|
||||
#### `ssh-mode`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: Simulation-only.
|
||||
|
||||
#### `teleport-mode`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: Simulation-only.
|
||||
|
||||
#### `direct-connect-mode`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: Simulation-only.
|
||||
|
||||
#### `deep-link-mode`
|
||||
**Signals received: 0**
|
||||
|
||||
Notes: Simulation-only.
|
||||
|
||||
---
|
||||
|
||||
## Survey Window Status
|
||||
|
||||
| Cycle | Date | New Signals | Running Total | Action |
|
||||
|---|---|---|---|---|
|
||||
| #21 | 2026-04-22 | 0 | 0 | Survey opened; log established |
|
||||
|
||||
**Current assessment:** Zero demand for any OPT_OUT surface promotion. This is consistent with `OPT_OUT_AUDIT.md` prediction that all 12 likely stay OPT_OUT long-term.
|
||||
|
||||
---
|
||||
|
||||
## Signal Entry Template
|
||||
|
||||
```
|
||||
### <surface-name>
|
||||
**Signal received: [N]**
|
||||
|
||||
Entry N (YYYY-MM-DD):
|
||||
- Source: <who/what>
|
||||
- Use case: <concrete orchestration problem>
|
||||
- Markdown-alternative-checked: <yes/no + why insufficient>
|
||||
- Follow-up: <filed pinpoint / discussion thread / closed>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Decision Framework
|
||||
|
||||
At cycle #22 (or whenever survey window closes):
|
||||
|
||||
### If 0 signals total (likely):
|
||||
- Move all 12 surfaces to `PERMANENTLY_OPT_OUT` or similar
|
||||
- Remove `OPT_OUT_SURFACES` from `test_cli_parity_audit.py` (everything is explicitly non-goal)
|
||||
- Update `CLAUDE.md` to reflect maintainership mode
|
||||
- Close `OPT_OUT_AUDIT.md` with "audit complete, no promotions"
|
||||
|
||||
### If 1–2 signals on isolated surfaces:
|
||||
- File individual promotion pinpoints per surface with demand evidence
|
||||
- Each goes through standard #171/#172/#173 loop (parity audit, SCHEMAS.md, consistency test)
|
||||
|
||||
### If high demand (3+ signals):
|
||||
- Reopen audit: is the OPT_OUT classification actually correct?
|
||||
- Review whether protocol expansion is warranted
|
||||
|
||||
---
|
||||
|
||||
## Related Files
|
||||
|
||||
- **`OPT_OUT_AUDIT.md`** — Audit criteria, decision table, rationale by group
|
||||
- **`SCHEMAS.md`** — JSON contract for the 14 CLAWABLE surfaces
|
||||
- **`tests/test_cli_parity_audit.py`** — Machine enforcement of CLAWABLE/OPT_OUT classification
|
||||
- **`CLAUDE.md`** — Development posture (maintainership mode)
|
||||
|
||||
---
|
||||
|
||||
## Philosophy
|
||||
|
||||
**Prevent speculative expansion.** The discipline of requiring real signals before promotion protects the protocol from schema bloat. Every new CLAWABLE surface adds:
|
||||
- A SCHEMAS.md section (maintenance burden)
|
||||
- Test coverage (test suite tax)
|
||||
- Documentation (cognitive load for new developers)
|
||||
- Version compatibility (schema_version bump risk)
|
||||
|
||||
If a claw can't articulate *why* it needs JSON for `summary` beyond "it would be nice," then JSON for `summary` is not needed. The Markdown output is a feature, not a gap.
|
||||
|
||||
The audit log closes the loop on "governed non-goals": OPT_OUT surfaces are intentionally not clawable until proven otherwise by evidence.
|
||||
@@ -1,13 +1,14 @@
|
||||
# Parity Status — claw-code Rust Port
|
||||
|
||||
Last updated: 2026-04-03
|
||||
Last updated: 2026-04-23
|
||||
|
||||
## Summary
|
||||
|
||||
- Canonical document: this top-level `PARITY.md` is the file consumed by `rust/scripts/run_mock_parity_diff.py`.
|
||||
- Requested 9-lane checkpoint: **All 9 lanes merged on `main`.**
|
||||
- Current `main` HEAD: `ee31e00` (stub implementations replaced with real AskUserQuestion + RemoteTrigger).
|
||||
- Repository stats at this checkpoint: **292 commits on `main` / 293 across all branches**, **9 crates**, **48,599 tracked Rust LOC**, **2,568 test LOC**, **3 authors**, date range **2026-03-31 → 2026-04-03**.
|
||||
- Current `main` HEAD: `ad1cf92` (doctrine loop canonical example).
|
||||
- Repository stats at this checkpoint: **979 commits on `main`**, **9 crates**, **80,789 tracked Rust LOC**, **4,533 test LOC**, **3 authors**, date **2026-04-23**.
|
||||
- **Growth since last PARITY update (2026-04-03):** Rust LOC +66% (48,599 → 80,789), Test LOC +76% (2,568 → 4,533), Commits +235% (292 → 979). Current phase: 13 branches awaiting review/integration.
|
||||
- Mock parity harness stats: **10 scripted scenarios**, **19 captured `/v1/messages` requests** in `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs`.
|
||||
|
||||
## Mock parity harness — milestone 1
|
||||
|
||||
192
PHASE_1_KICKOFF.md
Normal file
192
PHASE_1_KICKOFF.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# Phase 1 Kickoff — Classifier Sweeps + Doc-Truth + Design Decisions
|
||||
|
||||
**Status:** Ready for execution once Phase 0 (`feat/jobdori-168c-emission-routing`) merges.
|
||||
|
||||
**Date prepared:** 2026-04-23 11:47 Seoul (cycles #104–#108 complete, all unaudited surfaces probed)
|
||||
|
||||
---
|
||||
|
||||
## What Got Done (Phase 0)
|
||||
|
||||
- ✅ JSON output shape routing (no-silent test, SCHEMAS baseline, parity guard)
|
||||
- ✅ 7 dogfood filings (#155, #169, #170, #171, #172, #153, checkpoint)
|
||||
- ✅ 9 probe cycles (plugins, agents, init, bootstrap-plan, system-prompt, export, sandbox, dump-manifests, skills)
|
||||
- ✅ 82 pinpoints filed, 67 genuinely open
|
||||
- ✅ 227/227 tests pass, 0 regressions
|
||||
- ✅ Review guide + priority queue locked
|
||||
- ✅ Doctrine: 28 principles accumulated
|
||||
|
||||
---
|
||||
|
||||
## What Phase 1 Will Do (Confirmed via Gaebal-Gajae)
|
||||
|
||||
Execute priority-ordered fixes in 6 bundles + independents:
|
||||
|
||||
### Priority 1: Error Envelope Contract Drift
|
||||
|
||||
**Bundle:** `feat/jobdori-181-error-envelope-contract-drift` (#181 + #183)
|
||||
|
||||
**What it fixes:**
|
||||
- #181: `plugins bogus-subcommand` returns success-shaped envelope (no `type: "error"`, error buried in message)
|
||||
- #183: `plugins` and `mcp` emit different shapes on unknown subcommand
|
||||
|
||||
**Why it's Priority 1:** Foundation layer. Error envelope is the root contract. All downstream fixes assume correct envelope shape.
|
||||
|
||||
**Implementation:** Align `plugins` unknown-subcommand handler to `agents` canonical reference. Ensure both emit `type: "error"` + correct `kind`.
|
||||
|
||||
**Risk profile:** HIGH (touches error routing, breaks if consumers depend on old shape) → but gated by Phase 0 freeze + comprehensive tests
|
||||
|
||||
---
|
||||
|
||||
### Priority 2: CLI Contract Hygiene Sweep
|
||||
|
||||
**Bundle:** `feat/jobdori-184-cli-contract-hygiene-sweep` (#184 + #185)
|
||||
|
||||
**What it fixes:**
|
||||
- #184: `claw init` silently accepts unknown positional arguments (should reject)
|
||||
- #185: `claw bootstrap-plan` silently accepts unknown flags (should reject)
|
||||
|
||||
**Why it's Priority 2:** Extensions. Guard clauses on existing envelope shape. Uses envelope from Priority 1.
|
||||
|
||||
**Implementation:** Add trailing-args rejection to `init` and unknown-flag rejection to `bootstrap-plan`. Pattern: match existing guard in #171 (extra-args classifier).
|
||||
|
||||
**Risk profile:** MEDIUM (adds guards, no shape changes)
|
||||
|
||||
---
|
||||
|
||||
### Priority 3: Classifier Sweep (4 Verbs)
|
||||
|
||||
**Bundle:** `feat/jobdori-186-192-classifier-sweep` (#186 + #187 + #189 + #192)
|
||||
|
||||
**What it fixes:**
|
||||
- #186: `system-prompt --<unknown>` classified as `unknown` → should be `cli_parse`
|
||||
- #187: `export --<unknown>` classified as `unknown` → should be `cli_parse`
|
||||
- #189: `dump-manifests --<unknown>` classified as `unknown` → should be `cli_parse`
|
||||
- #192: `skills install --<unknown>` classified as `unknown` → should be `cli_parse`
|
||||
|
||||
**Why it's Priority 3:** Cleanup. Classifier additions, same envelope, one unified pattern across 4 verbs.
|
||||
|
||||
**Implementation:** Add 4 classifier branches (one per verb) to the unknown-option handler. Same test pattern for all.
|
||||
|
||||
**Risk profile:** LOW (classifier-only, no routing changes)
|
||||
|
||||
---
|
||||
|
||||
### Priority 4: USAGE.md Standalone Surface Audit
|
||||
|
||||
**Bundle:** `feat/jobdori-180-usage-standalone-surface` (#180)
|
||||
|
||||
**What it fixes:**
|
||||
- #180: USAGE.md incomplete verb coverage (doc-truthfulness audit-flow)
|
||||
|
||||
**Why it's Priority 4:** Doc audit. Prerequisite for #188 (help-text gaps).
|
||||
|
||||
**Implementation:** Audit USAGE.md against all verbs (compare against `claw --help` verb list). Add missing verb documentation.
|
||||
|
||||
**Risk profile:** LOW (docs-only)
|
||||
|
||||
---
|
||||
|
||||
### Priority 5: Dump-Manifests Help-Text Fix
|
||||
|
||||
**Bundle:** `feat/jobdori-188-dump-manifests-help-prerequisite` (#188)
|
||||
|
||||
**What it fixes:**
|
||||
- #188: `dump-manifests --help` omits prerequisite (env var or flag required)
|
||||
|
||||
**Why it's Priority 5:** Doc-truth probe-flow. Comes after audit-flow (#180).
|
||||
|
||||
**Implementation:** Update help text to show required alternatives and environment variable.
|
||||
|
||||
**Risk profile:** LOW (help-text only)
|
||||
|
||||
---
|
||||
|
||||
### Priority 6+: Independent Fixes
|
||||
|
||||
- #190: Design decision (help-routing for no-args install) — needs architecture review
|
||||
- #191: `skills install` filesystem classifier gap — can bundle with #177/#178/#179 or standalone
|
||||
- #182: Plugin classifier alignment (unknown → filesystem/runtime) — depends on #181 resolution
|
||||
- #177/#178/#179: Install-surface taxonomy (possible 4-verb bundle)
|
||||
- #173: Config hint field (consumer-parity)
|
||||
- #174: Resume trailing classifier (closed? verify)
|
||||
- #175: CI fmt/test decoupling (gaebal-gajae owned)
|
||||
|
||||
---
|
||||
|
||||
## Concrete Next Steps (Once Phase 0 Merges)
|
||||
|
||||
1. **Create branch 1:** `feat/jobdori-181-error-envelope-contract-drift`
|
||||
- Files: error router, tests for #181 + #183
|
||||
- PR against main
|
||||
- Expected: 2 commits, 5 new tests, 0 regressions
|
||||
|
||||
2. **Create branch 2:** `feat/jobdori-184-cli-contract-hygiene-sweep`
|
||||
- Files: init guard, bootstrap-plan guard
|
||||
- PR against main
|
||||
- Expected: 2 commits, 3 new tests
|
||||
|
||||
3. **Create branch 3:** `feat/jobdori-186-192-classifier-sweep`
|
||||
- Files: unknown-option handler (4 verbs)
|
||||
- PR against main
|
||||
- Expected: 1 commit, 4 new tests
|
||||
|
||||
4. **Create branch 4:** `feat/jobdori-180-usage-standalone-surface`
|
||||
- Files: USAGE.md additions
|
||||
- PR against main
|
||||
- Expected: 1 commit, 0 tests
|
||||
|
||||
5. **Create branch 5:** `feat/jobdori-188-dump-manifests-help-prerequisite`
|
||||
- Files: help text update (string change)
|
||||
- PR against main
|
||||
- Expected: 1 commit, 0 tests
|
||||
|
||||
6. **Triage independents:** #190 requires architecture discussion; others can follow once above merges.
|
||||
|
||||
---
|
||||
|
||||
## Hypothesis Validation (Codified for Future Probes)
|
||||
|
||||
**Multi-flag verbs (install, enable, init, bootstrap-plan, system-prompt, export, dump-manifests):** 3–4 classifier gaps each.
|
||||
|
||||
**Single-issue verbs (list, show, sandbox, agents):** 0–1 gaps.
|
||||
|
||||
**Future probe strategy:** Prioritize multi-flag verbs; single-issue verbs are mostly clean.
|
||||
|
||||
---
|
||||
|
||||
## Doctrine Points Relevant to Phase 1 Execution
|
||||
|
||||
- **Doctrine #22:** Schema baseline check before enum proposal
|
||||
- **Doctrine #25:** Contract-surface-first ordering (foundation → extensions → cleanup)
|
||||
- **Doctrine #27:** Same-pattern pinpoints should bundle into one classifier sweep PR
|
||||
- **Doctrine #28:** First observation is hypothesis, not filing (verify before classifying)
|
||||
|
||||
---
|
||||
|
||||
## Known Blockers & Risks
|
||||
|
||||
1. **Phase 0 merge gating:** Can't create Phase 1 branches until Phase 0 lands (28 base + 37 new = 65 total pending)
|
||||
2. **#190 design decision:** help-routing behavior needs architectural consensus (intentional vs inconsistency)
|
||||
3. **Cross-family dependencies:** #182 depends on #181 (plugin error envelope must be correct first)
|
||||
|
||||
---
|
||||
|
||||
## Testing Strategy for Phase 1
|
||||
|
||||
- **Priority 1–3 bundles:** Existing test framework (`output_format_contract.rs`, classifier tests). Comprehensive coverage per bundle.
|
||||
- **Priority 4–5 bundles:** Light doc verification (grep USAGE.md, spot-check help text).
|
||||
- **Independent fixes:** Case-by-case once prioritized.
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- ✅ All Priority 1–5 bundles merge to main
|
||||
- ✅ 0 regressions (227+ tests pass across all merges)
|
||||
- ✅ CI green on all PRs
|
||||
- ✅ Reviewer sign-offs on all bundles
|
||||
|
||||
---
|
||||
|
||||
**Phase 1 is ready to execute. Awaiting Phase 0 merge approval.**
|
||||
29
README.md
29
README.md
@@ -1,18 +1,12 @@
|
||||
# Claw Code
|
||||
|
||||
<p align="center">
|
||||
<strong>188K GitHub stars and climbing.</strong>
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
<strong>Rust-native agent execution for people who want speed, control, and a real terminal.</strong>
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://github.com/ultraworkers/claw-code">ultraworkers/claw-code</a>
|
||||
·
|
||||
<a href="./USAGE.md">Usage</a>
|
||||
·
|
||||
<a href="./ERROR_HANDLING.md">Error Handling</a>
|
||||
·
|
||||
<a href="./rust/README.md">Rust workspace</a>
|
||||
·
|
||||
<a href="./PARITY.md">Parity</a>
|
||||
@@ -36,21 +30,8 @@
|
||||
<img src="assets/claw-hero.jpeg" alt="Claw Code" width="300" />
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
Claw Code just crossed <strong>188,000 GitHub stars</strong>. This repo is the public Rust implementation of the <code>claw</code> CLI agent harness, built in the open with the UltraWorkers community.
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
The canonical implementation lives in <a href="./rust/">rust/</a>, and the current source of truth for this repository is <strong>ultraworkers/claw-code</strong>.
|
||||
</p>
|
||||
|
||||
## 188K and climbing
|
||||
|
||||
Thanks to everyone who starred, tested, reviewed, and pushed the project forward. Claw Code is focused on a straightforward promise: a fast local-first CLI agent runtime with native tools, inspectable behavior, and a Rust workspace that stays close to the metal.
|
||||
|
||||
- Native Rust workspace and CLI binary under [`rust/`](./rust)
|
||||
- Local-first workflows for prompts, sessions, tooling, and parity validation
|
||||
- Open development across the broader UltraWorkers ecosystem
|
||||
Claw Code is the public Rust implementation of the `claw` CLI agent harness.
|
||||
The canonical implementation lives in [`rust/`](./rust), and the current source of truth for this repository is **ultraworkers/claw-code**.
|
||||
|
||||
> [!IMPORTANT]
|
||||
> Start with [`USAGE.md`](./USAGE.md) for build, auth, CLI, session, and parity-harness workflows. Make `claw doctor` your first health check after building, use [`rust/README.md`](./rust/README.md) for crate-level details, read [`PARITY.md`](./PARITY.md) for the current Rust-port checkpoint, and see [`docs/container.md`](./docs/container.md) for the container-first workflow.
|
||||
@@ -61,9 +42,11 @@ Thanks to everyone who starred, tested, reviewed, and pushed the project forward
|
||||
|
||||
- **`rust/`** — canonical Rust workspace and the `claw` CLI binary
|
||||
- **`USAGE.md`** — task-oriented usage guide for the current product surface
|
||||
- **`ERROR_HANDLING.md`** — unified error-handling pattern for orchestration code
|
||||
- **`PARITY.md`** — Rust-port parity status and migration notes
|
||||
- **`ROADMAP.md`** — active roadmap and cleanup backlog
|
||||
- **`PHILOSOPHY.md`** — project intent and system-design framing
|
||||
- **`SCHEMAS.md`** — JSON protocol contract (Python harness reference)
|
||||
- **`src/` + `tests/`** — companion Python/reference workspace and audit helpers; not the primary runtime surface
|
||||
|
||||
## Quick start
|
||||
|
||||
191
REVIEW_DASHBOARD.md
Normal file
191
REVIEW_DASHBOARD.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# Review Dashboard — claw-code
|
||||
|
||||
**Last updated:** 2026-04-23 03:34 Seoul
|
||||
**Queue state:** 14 review-ready branches
|
||||
**Main HEAD:** `f18f45c` (ROADMAP #161 filed)
|
||||
|
||||
This is an integration support artifact (per cycle #64 doctrine). Its purpose: let reviewers see all queued branches, cluster membership, and merge priorities without re-deriving from git log.
|
||||
|
||||
---
|
||||
|
||||
## At-A-Glance
|
||||
|
||||
| Priority | Cluster | Branches | Complexity | Status |
|
||||
|---|---|---|---|---|
|
||||
| P0 | Typed-error threading | #248, #249, #251 | S–M | Merge-ready |
|
||||
| P1 | Diagnostic-strictness | #122, #122b | S | Merge-ready |
|
||||
| P1 | Help-parity | #130b-#130e | S each | Merge-ready (batch) |
|
||||
| P2 | Suffix-guard | #152-init, #152-bootstrap-plan | XS each | Merge-ready (batch) |
|
||||
| P2 | Verb-classification | #160 | S | Merge-ready (just shipped) |
|
||||
| P3 | Doc truthfulness | docs/parity-update | XS | Merge-ready |
|
||||
|
||||
**Suggested merge order:** P0 → P1 → P2 → P3. Within P0, start with #249 (smallest diff).
|
||||
|
||||
---
|
||||
|
||||
## Detailed Branch Inventory
|
||||
|
||||
### P0: Typed-Error Threading (3 branches)
|
||||
|
||||
#### `feat/jobdori-249-resumed-slash-kind` — **SMALLEST. START HERE.**
|
||||
- **Commit:** `eb4b1eb`
|
||||
- **Diff:** 61 lines in `rust/crates/rusty-claude-cli/src/main.rs`
|
||||
- **Scope:** Two Err arms in `resume_session()` at lines 2745, 2782 now emit `kind` + `hint`
|
||||
- **Cluster:** Completes #247 parent's typed-error family
|
||||
- **Tests:** 181 binary tests pass (no regressions)
|
||||
- **Reviewer checklist:** see `/tmp/pr-summary-249.md`
|
||||
- **Expected merge time:** ~5 minutes
|
||||
|
||||
#### `feat/jobdori-248-unknown-verb-option-classify`
|
||||
- **Commit:** `6c09172`
|
||||
- **Scope:** Unknown verb + option classifier family
|
||||
- **Cluster:** #247 parent's typed-error family (sibling of #249)
|
||||
|
||||
#### `feat/jobdori-251-session-dispatch`
|
||||
- **Commit:** `dc274a0`
|
||||
- **Scope:** Intercepts session-management verbs (`list-sessions`, `load-session`, `delete-session`, `flush-transcript`) at top-level parser
|
||||
- **Cluster:** #247 parent's typed-error family
|
||||
- **Note:** Larger change than #248/#249 — prefer merging those first
|
||||
|
||||
### P1: Diagnostic-Strictness (2 branches)
|
||||
|
||||
#### `feat/jobdori-122-doctor-stale-base`
|
||||
- **Commit:** `5bb9eba`
|
||||
- **Scope:** `claw doctor` now warns on stale-base (same check as prompt preflight)
|
||||
- **Cluster:** Diagnostic surfaces reflect runtime reality (cycle #57 principle)
|
||||
|
||||
#### `feat/jobdori-122b-doctor-broad-cwd`
|
||||
- **Commit:** `0aa0d3f`
|
||||
- **Scope:** `claw doctor` now warns when cwd is broad path (home/root)
|
||||
- **Cluster:** Same as #122 (direct sibling)
|
||||
- **Batch suggestion:** Review together with #122
|
||||
|
||||
### P1: Help-Parity (4 branches, batch-reviewable)
|
||||
|
||||
All four implement uniform `--help` flag handling. Related by fix locus (help-topic routing).
|
||||
|
||||
#### `feat/jobdori-130b-filesystem-context`
|
||||
- **Commit:** `d49a75c`
|
||||
- **Scope:** Filesystem I/O errors enriched with operation + path context
|
||||
|
||||
#### `feat/jobdori-130c-diff-help`
|
||||
- **Commit:** `83f744a`
|
||||
- **Scope:** `claw diff --help` routes to help topic
|
||||
|
||||
#### `feat/jobdori-130d-config-help`
|
||||
- **Commit:** `19638a0`
|
||||
- **Scope:** `claw config --help` routes to help topic
|
||||
|
||||
#### `feat/jobdori-130e-dispatch-help` + `feat/jobdori-130e-surface-help`
|
||||
- **Commits:** `0ca0344`, `9dd7e79`
|
||||
- **Scope:** Category A (dispatch-order) + Category B (surface) help-anomaly fixes from systematic sweep
|
||||
- **Batch suggestion:** Review #130c, #130d, #130e-dispatch, #130e-surface as one unit — all use same pattern (add help flag guard before action)
|
||||
|
||||
### P2: Suffix-Guard (2 branches, batch-reviewable)
|
||||
|
||||
#### `feat/jobdori-152-init-suffix-guard`
|
||||
- **Commit:** `860f285`
|
||||
- **Scope:** `claw init` rejects trailing args
|
||||
- **Cluster:** Uniform no-arg verb suffix guards
|
||||
|
||||
#### `feat/jobdori-152-bootstrap-plan-suffix-guard`
|
||||
- **Commit:** `3a533ce`
|
||||
- **Scope:** `claw bootstrap-plan` rejects trailing args
|
||||
- **Cluster:** Same as above (direct sibling)
|
||||
- **Batch suggestion:** Review together
|
||||
|
||||
### P2: Verb-Classification (1 branch, just shipped cycle #63)
|
||||
|
||||
#### `feat/jobdori-160-verb-classification`
|
||||
- **Commit:** `5538934`
|
||||
- **Scope:** Reserved-semantic verbs (resume, compact, memory, commit, pr, issue, bughunter) with positional args now emit slash-command guidance
|
||||
- **Cluster:** Sibling of #251 (dispatch leak family), applied to promptable/reserved split
|
||||
- **Design closure note:** Investigation in cycle #61 revealed verb-classification was the actual need; cycle #63 implemented the class table
|
||||
|
||||
### P3: Doc Truthfulness (1 branch, just shipped cycle #64)
|
||||
|
||||
#### `docs/parity-update-2026-04-23`
|
||||
- **Commit:** `92a79b5`
|
||||
- **Scope:** PARITY.md stats refreshed (Rust LOC +66%, Test LOC +76%, Commits +235% since 2026-04-03)
|
||||
- **Risk:** Near-zero (4-line diff, doc-only)
|
||||
- **Merge time:** ~1 minute
|
||||
|
||||
---
|
||||
|
||||
## Batch Review Patterns
|
||||
|
||||
For reviewer efficiency, these groups share the same fix-locus or pattern:
|
||||
|
||||
| Batch | Branches | Shared pattern |
|
||||
|---|---|---|
|
||||
| Help-parity bundle | #130c, #130d, #130e-dispatch, #130e-surface | All add help-flag guard before action in dispatch |
|
||||
| Suffix-guard bundle | #152-init, #152-bootstrap-plan | Both add `rest.len() > 1` check to no-arg verbs |
|
||||
| Diagnostic-strictness bundle | #122, #122b | Both extend `check_workspace_health()` with new preflights |
|
||||
| Typed-error bundle | #248, #249, #251 | All thread `classify_error_kind` + `split_error_hint` into specific Err arms |
|
||||
|
||||
If reviewer has limited time, batch review saves context switches.
|
||||
|
||||
---
|
||||
|
||||
## Review Friction Map
|
||||
|
||||
**Lowest friction (safe start):**
|
||||
- docs/parity-update (4 lines, doc-only)
|
||||
- #249 (61 lines, 2 Err arms, 181 tests pass)
|
||||
- #160 (23 lines, new helper + pre-check)
|
||||
|
||||
**Medium friction:**
|
||||
- #122, #122b (each ~100 lines, diagnostic extensions)
|
||||
- #248 (classifier family)
|
||||
- #152-* branches (XS each)
|
||||
|
||||
**Highest friction:**
|
||||
- #251 (broader parser changes, multi-verb coverage)
|
||||
- #130e bundle (help-parity systematic sweep)
|
||||
|
||||
---
|
||||
|
||||
## Open Pinpoints Awaiting Implementation
|
||||
|
||||
| # | Title | Priority | Est. diff | Notes |
|
||||
|---|---|---|---|---|
|
||||
| #157 | Auth remediation registry | S-M | 50-80 lines | Cycle #59 audit pre-fill |
|
||||
| #158 | Hook validation at worker boot | S | 30-50 lines | Cycle #59 audit pre-fill |
|
||||
| #159 | Plugin manifest validation at worker boot | S | 30-50 lines | Cycle #59 audit pre-fill |
|
||||
| #161 | Stale Git SHA in worktree builds | S | ~15 lines in build.rs | Cycle #65 just filed |
|
||||
|
||||
None of these should be implemented while current queue is 14. Prioritize merging queue first.
|
||||
|
||||
---
|
||||
|
||||
## Merge Throughput Notes
|
||||
|
||||
**Target throughput:** 2-3 branches per review session. At current cycle velocity (cycles #39–#65 = 27 cycles in ~3 hours), 2-3 merges unblock:
|
||||
- 3+ cluster closures (typed-error, diagnostic-strictness, help-parity)
|
||||
- 1 doctrine loop closure (verb-classification → #160)
|
||||
- 1 doc freshness (PARITY.md)
|
||||
|
||||
**Post-merge expected state:** ~10 branches remaining, queue shifts from saturated (14) to manageable (10), velocity cycles can resume in safe zone.
|
||||
|
||||
---
|
||||
|
||||
## For The Reviewer
|
||||
|
||||
**Reviewing checklist (per-branch):**
|
||||
- [ ] Diff matches pinpoint description
|
||||
- [ ] Tests pass (cite count: should be 181+ for branches that touched main.rs)
|
||||
- [ ] Backward compatibility verified (check-list in commit message)
|
||||
- [ ] No related cluster branches yet to land (check cluster column above)
|
||||
|
||||
**Reviewer shortcut for #249** (recommended first-merge):
|
||||
```bash
|
||||
cd /tmp/jobdori-249
|
||||
git log --oneline -1 # eb4b1eb
|
||||
git diff main..HEAD -- rust/crates/rusty-claude-cli/src/main.rs | head -50
|
||||
```
|
||||
|
||||
Or skip straight to: `/tmp/pr-summary-249.md` (pre-prepared PR-ready artifact).
|
||||
|
||||
---
|
||||
|
||||
**Dashboard source:** Cycle #66 (2026-04-23 03:34 Seoul). Updates should be re-run when branches merge or new pinpoints land.
|
||||
9160
ROADMAP.md
9160
ROADMAP.md
File diff suppressed because one or more lines are too long
708
SCHEMAS.md
Normal file
708
SCHEMAS.md
Normal file
@@ -0,0 +1,708 @@
|
||||
# JSON Envelope Schemas — Clawable CLI Contract
|
||||
|
||||
> **⚠️ CRITICAL: This document describes the TARGET v2.0 envelope schema, not the current v1.0 binary behavior.** The Rust binary currently emits a **flat v1.0 envelope** that does NOT include `timestamp`, `command`, `exit_code`, `output_format`, or `schema_version` fields. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan and timeline. **Do not build automation against the field shapes below without first testing against the actual binary output.** Use `claw <command> --output-format json` to inspect what your binary version actually emits.
|
||||
|
||||
This document locks the **target** field-level contract for all clawable-surface commands. After the v1.0→v2.0 migration (FIX_LOCUS_164 Phase 2), every command accepting `--output-format json` will conform to the envelope shapes documented here.
|
||||
|
||||
**Target audience:** Claws planning v2.0 migration, reference implementers, contract validators.
|
||||
|
||||
**Current v1.0 reality:** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) Appendix A for the flat envelope shape the binary actually emits today.
|
||||
|
||||
---
|
||||
|
||||
## Common Fields (All Envelopes) — TARGET v2.0 SCHEMA
|
||||
|
||||
**This section describes the v2.0 target schema. The current v1.0 binary does NOT emit these fields.** See FIX_LOCUS_164.md for the migration timeline.
|
||||
|
||||
After v2.0 migration, every command response, success or error, will carry:
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "list-sessions",
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "2.0"
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `timestamp` | ISO 8601 UTC | Yes | Time command completed |
|
||||
| `command` | string | Yes | argv[1] (e.g. "list-sessions") |
|
||||
| `exit_code` | int (0/1/2) | Yes | 0=success, 1=error/not-found, 2=timeout |
|
||||
| `output_format` | string | Yes | Always "json" (for symmetry with text mode) |
|
||||
| `schema_version` | string | Yes | "1.0" (bump for breaking changes) |
|
||||
|
||||
---
|
||||
|
||||
## Turn Result Fields (Multi-Turn Sessions)
|
||||
|
||||
When a command's response includes a `turn` object (e.g., in `bootstrap` or `turn-loop`), it carries:
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `prompt` | string | Yes | User input for this turn |
|
||||
| `output` | string | Yes | Assistant response |
|
||||
| `stop_reason` | enum | Yes | One of: `completed`, `timeout`, `cancelled`, `max_budget_reached`, `max_turns_reached` |
|
||||
| `cancel_observed` | bool | Yes | #164 Stage B: cancellation was signaled and observed (#161/#164) |
|
||||
|
||||
---
|
||||
|
||||
## Error Envelope
|
||||
|
||||
When a command fails (exit code 1), responses carry:
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 1,
|
||||
"error": {
|
||||
"kind": "filesystem",
|
||||
"operation": "write",
|
||||
"target": "/tmp/nonexistent/out.md",
|
||||
"retryable": true,
|
||||
"message": "No such file or directory",
|
||||
"hint": "intermediate directory does not exist; try mkdir -p /tmp/nonexistent"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `error.kind` | enum | Yes | One of: `filesystem`, `auth`, `session`, `parse`, `runtime`, `mcp`, `delivery`, `usage`, `policy`, `unknown` |
|
||||
| `error.operation` | string | Yes | Syscall/method that failed (e.g. "write", "open", "resolve_session") |
|
||||
| `error.target` | string | Yes | Resource that failed (path, session-id, server-name, etc.) |
|
||||
| `error.retryable` | bool | Yes | Whether caller can safely retry without intervention |
|
||||
| `error.message` | string | Yes | Platform error message (e.g. errno text) |
|
||||
| `error.hint` | string | No | Optional actionable next step |
|
||||
|
||||
---
|
||||
|
||||
## Not-Found Envelope
|
||||
|
||||
When an entity does not exist (exit code 1, but not a failure):
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 1,
|
||||
"name": "does-not-exist",
|
||||
"found": false,
|
||||
"error": {
|
||||
"kind": "session_not_found",
|
||||
"message": "session 'does-not-exist' not found in .claw/sessions/",
|
||||
"retryable": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `name` | string | Yes | Entity name/id that was looked up |
|
||||
| `found` | bool | Yes | Always `false` for not-found |
|
||||
| `error.kind` | enum | Yes | One of: `command_not_found`, `tool_not_found`, `session_not_found` |
|
||||
| `error.message` | string | Yes | User-visible explanation |
|
||||
| `error.retryable` | bool | Yes | Usually `false` (entity will not magically appear) |
|
||||
|
||||
---
|
||||
|
||||
## Per-Command Success Schemas
|
||||
|
||||
### `list-sessions`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "list-sessions",
|
||||
"sessions": [
|
||||
{
|
||||
"id": "session-1775777421902-1",
|
||||
"path": "/path/to/.claw/sessions/session-1775777421902-1.jsonl",
|
||||
"updated_at_ms": 1775777421902,
|
||||
"message_count": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "list-sessions",
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "1.0",
|
||||
"directory": ".claw/sessions",
|
||||
"sessions_count": 2,
|
||||
"sessions": [
|
||||
{
|
||||
"session_id": "sess_abc123",
|
||||
"created_at": "2026-04-21T15:30:00Z",
|
||||
"last_modified": "2026-04-22T09:45:00Z",
|
||||
"prompt_count": 5,
|
||||
"stopped": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl lacks `timestamp`, `exit_code`, `output_format`, `schema_version`, `directory`, `sessions_count` (derivable), and the session object uses `id`/`updated_at_ms`/`message_count` instead of `session_id`/`last_modified`/`prompt_count`. Follow-up #250 Option B to align field names and add common-envelope fields.
|
||||
|
||||
### `delete-session`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "delete-session",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. The stub ensures the verb does NOT fall through to Prompt/auth (the #251 fix), but the actual delete operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "delete-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"deleted": true,
|
||||
"directory": ".claw/sessions"
|
||||
}
|
||||
```
|
||||
|
||||
### `load-session`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "load-session",
|
||||
"session": {
|
||||
"id": "session-abc123",
|
||||
"path": "/path/to/.claw/sessions/session-abc123.jsonl",
|
||||
"messages": 5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For nonexistent sessions, emits a local `session_not_found` error (NOT `missing_credentials`):
|
||||
```json
|
||||
{
|
||||
"error": "session not found: nonexistent",
|
||||
"kind": "session_not_found",
|
||||
"type": "error",
|
||||
"hint": "Hint: managed sessions live in .claw/sessions/<hash>/ ..."
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"loaded": true,
|
||||
"directory": ".claw/sessions",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl"
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl uses nested `session: {...}` instead of flat fields, and omits common-envelope fields. Follow-up #250 Option B to align.
|
||||
|
||||
### `flush-transcript`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "flush-transcript",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. Like `delete-session`, this stub resolves the #251 dispatch-order bug but the actual flush operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "flush-transcript",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl",
|
||||
"flushed": true,
|
||||
"messages_count": 12,
|
||||
"input_tokens": 4500,
|
||||
"output_tokens": 1200
|
||||
}
|
||||
```
|
||||
|
||||
### `show-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"found": true,
|
||||
"source_hint": "commands/add-dir/add-dir.tsx",
|
||||
"responsibility": "creates a new directory in the worktree"
|
||||
}
|
||||
```
|
||||
|
||||
### `show-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"found": true,
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"prompt": "create src/util/",
|
||||
"handled": true,
|
||||
"message": "created directory",
|
||||
"source_hint": "commands/add-dir/add-dir.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"payload": "cargo build",
|
||||
"handled": true,
|
||||
"message": "exit code 0",
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `route`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "route",
|
||||
"exit_code": 0,
|
||||
"prompt": "add a test",
|
||||
"limit": 10,
|
||||
"match_count": 3,
|
||||
"matches": [
|
||||
{
|
||||
"kind": "command",
|
||||
"name": "add-file",
|
||||
"score": 0.92,
|
||||
"source_hint": "commands/add-file/add-file.tsx"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### `bootstrap`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "bootstrap",
|
||||
"exit_code": 0,
|
||||
"prompt": "hello",
|
||||
"setup": {
|
||||
"python_version": "3.13.12",
|
||||
"implementation": "CPython",
|
||||
"platform_name": "darwin",
|
||||
"test_command": "pytest"
|
||||
},
|
||||
"routed_matches": [
|
||||
{"kind": "command", "name": "init", "score": 0.85, "source_hint": "..."}
|
||||
],
|
||||
"turn": {
|
||||
"prompt": "hello",
|
||||
"output": "...",
|
||||
"stop_reason": "completed"
|
||||
},
|
||||
"persisted_session_path": ".claw/sessions/sess_abc.jsonl"
|
||||
}
|
||||
```
|
||||
|
||||
### `command-graph`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "command-graph",
|
||||
"exit_code": 0,
|
||||
"builtins_count": 185,
|
||||
"plugin_like_count": 20,
|
||||
"skill_like_count": 2,
|
||||
"total_count": 207,
|
||||
"builtins": [
|
||||
{"name": "add-dir", "source_hint": "commands/add-dir/add-dir.tsx"}
|
||||
],
|
||||
"plugin_like": [],
|
||||
"skill_like": []
|
||||
}
|
||||
```
|
||||
|
||||
### `tool-pool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "tool-pool",
|
||||
"exit_code": 0,
|
||||
"simple_mode": false,
|
||||
"include_mcp": true,
|
||||
"tool_count": 184,
|
||||
"tools": [
|
||||
{"name": "BashTool", "source_hint": "tools/BashTool/BashTool.tsx"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### `bootstrap-graph`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "bootstrap-graph",
|
||||
"exit_code": 0,
|
||||
"stages": ["stage 1", "stage 2", "..."],
|
||||
"note": "bootstrap-graph is markdown-only in this version"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Versioning & Compatibility
|
||||
|
||||
- **schema_version = "1.0":** Current as of 2026-04-22. Covers all 13 clawable commands.
|
||||
- **Breaking changes** (e.g. renaming a field) bump schema_version to "2.0".
|
||||
- **Additive changes** (e.g. new optional field) stay at "1.0" and are backward compatible.
|
||||
- Downstream claws **must** check `schema_version` before relying on field presence.
|
||||
|
||||
---
|
||||
|
||||
## Regression Testing
|
||||
|
||||
Each command is covered by:
|
||||
1. **Fixture file** (golden JSON snapshot under `tests/fixtures/json/<command>.json`)
|
||||
2. **Parametrised test** in `test_cli_parity_audit.py::TestJsonOutputContractEndToEnd`
|
||||
3. **Field consistency test** (new, tracked as ROADMAP #172)
|
||||
|
||||
To update a fixture after a intentional schema change:
|
||||
```bash
|
||||
claw <command> --output-format json <args> > tests/fixtures/json/<command>.json
|
||||
# Review the diff, commit
|
||||
git add tests/fixtures/json/<command>.json
|
||||
```
|
||||
|
||||
To verify no regressions:
|
||||
```bash
|
||||
cargo test --release test_json_envelope_field_consistency
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Design Notes
|
||||
|
||||
**Why common fields on every response?**
|
||||
- Downstream claws can build one error handler that works for all commands
|
||||
- Timestamp + command + exit_code give context without scraping argv or timestamps from command output
|
||||
- `schema_version` signals compatibility for future upgrades
|
||||
|
||||
**Why both "found" and "error" on not-found?**
|
||||
- Exit code 1 covers both "entity missing" and "operation failed"
|
||||
- `found=false` distinguishes not-found from error without string matching
|
||||
- `error.kind` and `error.retryable` let automation decide: retry a temporary miss vs escalate a permanent refusal
|
||||
|
||||
**Why "operation" and "target" in error?**
|
||||
- Claws can aggregate failures by operation type (e.g. "how many `write` ops failed?")
|
||||
- Claws can implement per-target retry policy (e.g. "skip missing files, retry networking")
|
||||
- Pure text errors ("No such file") do not provide enough structure for pattern matching
|
||||
|
||||
**Why "handled" vs "found"?**
|
||||
- `show-command` reports `found: bool` (inventory signal: "does this exist?")
|
||||
- `exec-command` reports `handled: bool` (operational signal: "was this work performed?")
|
||||
- The names matter: a command can be found but not handled (e.g. too large for context window), or handled silently (no output message)
|
||||
|
||||
---
|
||||
|
||||
## Appendix: Current v1.0 vs. Target v2.0 Envelope Shapes
|
||||
|
||||
### ⚠️ IMPORTANT: Binary Reality vs. This Document
|
||||
|
||||
**This entire SCHEMAS.md document describes the TARGET v2.0 schema.** The actual Rust binary currently emits v1.0 (flat) envelopes.
|
||||
|
||||
**Do not assume the fields documented above are in the binary right now.** They are not.
|
||||
|
||||
### Current v1.0 Envelope (What the Rust Binary Actually Emits)
|
||||
|
||||
The Rust binary in `rust/` currently emits a **flat v1.0 envelope** without common metadata wrapper:
|
||||
|
||||
#### v1.0 Success Envelope Example
|
||||
|
||||
```json
|
||||
{
|
||||
"kind": "list-sessions",
|
||||
"sessions": [
|
||||
{"id": "abc123", "created": "2026-04-22T10:00:00Z", "turns": 5}
|
||||
],
|
||||
"type": "success"
|
||||
}
|
||||
```
|
||||
|
||||
**Key differences from v2.0 above:**
|
||||
- NO `timestamp`, `command`, `exit_code`, `output_format`, `schema_version` fields
|
||||
- `kind` field contains the verb name (or is entirely absent for success)
|
||||
- `type: "success"` flag at top level
|
||||
- Verb-specific fields (`sessions`, `turn`, etc.) at top level
|
||||
|
||||
#### v1.0 Error Envelope Example
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "session 'xyz789' not found in .claw/sessions",
|
||||
"hint": "use 'list-sessions' to see available sessions",
|
||||
"kind": "session_not_found",
|
||||
"type": "error"
|
||||
}
|
||||
```
|
||||
|
||||
**Key differences from v2.0 error above:**
|
||||
- `error` field is a **STRING**, not a nested object
|
||||
- NO `error.operation`, `error.target`, `error.retryable` structured fields
|
||||
- `kind` is at top-level, not nested
|
||||
- NO `timestamp`, `command`, `exit_code`, `output_format`, `schema_version`
|
||||
- Extra `type: "error"` flag
|
||||
|
||||
### Migration Timeline (FIX_LOCUS_164)
|
||||
|
||||
See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full phased migration:
|
||||
|
||||
- **Phase 1 (Opt-in):** `claw <cmd> --output-format json --envelope-version=2.0` emits v2.0 shape
|
||||
- **Phase 2 (Default):** v2.0 becomes default; `--legacy-envelope` flag opts into v1.0
|
||||
- **Phase 3 (Deprecation):** v1.0 warnings, then removal
|
||||
|
||||
### Building Automation Against v1.0 (Current)
|
||||
|
||||
**For claws building automation today** (against the real binary, not this schema):
|
||||
|
||||
1. **Check `type` field first** (string: "success" or "error")
|
||||
2. **For success:** verb-specific fields are at top level. Use `jq .kind` for verb ID (if present)
|
||||
3. **For error:** access `error` (string), `hint` (string), `kind` (string) all at top level
|
||||
4. **Do not expect:** `timestamp`, `command`, `exit_code`, `output_format`, `schema_version` — they don't exist yet
|
||||
5. **Test your code** against `claw <cmd> --output-format json` output to verify assumptions before deploying
|
||||
|
||||
### Example: Python Consumer Code (v1.0)
|
||||
|
||||
**Correct pattern for v1.0 (current binary):**
|
||||
|
||||
```python
|
||||
import json
|
||||
import subprocess
|
||||
|
||||
result = subprocess.run(
|
||||
["claw", "list-sessions", "--output-format", "json"],
|
||||
capture_output=True,
|
||||
text=True
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
|
||||
# v1.0: type is at top level
|
||||
if envelope.get("type") == "error":
|
||||
error_msg = envelope.get("error", "unknown error") # error is a STRING
|
||||
error_kind = envelope.get("kind") # kind is at TOP LEVEL
|
||||
print(f"Error: {error_kind} — {error_msg}")
|
||||
else:
|
||||
# Success path: verb-specific fields at top level
|
||||
sessions = envelope.get("sessions", [])
|
||||
for session in sessions:
|
||||
print(f"Session: {session['id']}")
|
||||
```
|
||||
|
||||
**After v2.0 migration, this code will break.** Claws building for v2.0 compatibility should:
|
||||
|
||||
1. Check `schema_version` field
|
||||
2. Parse differently based on version
|
||||
3. Or wait until Phase 2 default bump is announced, then migrate
|
||||
|
||||
### Why This Mismatch Exists
|
||||
|
||||
SCHEMAS.md was written as the **target design** for v2.0. The Rust binary is still on v1.0. The migration (FIX_LOCUS_164) will bring the binary in line with this schema, but it hasn't happened yet.
|
||||
|
||||
**This mismatch is the root cause of doc-truthfulness issues #78, #79, #165.** All three docs were documenting the v2.0 target as if it were current reality.
|
||||
|
||||
### Questions?
|
||||
|
||||
- **"Is v2.0 implemented?"** No. The binary is v1.0. See FIX_LOCUS_164.md for the implementation roadmap.
|
||||
- **"Should I build against v2.0 schema?"** No. Build against v1.0 (current). Test your code with `claw` to verify.
|
||||
- **"When does v2.0 ship?"** See FIX_LOCUS_164.md Phase 1 estimate: ~6 dev-days. Not scheduled yet.
|
||||
- **"Can I use v2.0 now?"** Only if you explicitly pass `--envelope-version=2.0` (which doesn't exist yet in v1.0 binary).
|
||||
|
||||
---
|
||||
|
||||
## v1.5 Emission Baseline — Per-Verb Shape Catalog (Cycle #91, Phase 0 Task 3)
|
||||
|
||||
**Status:** 📸 Snapshot of actual binary behavior as of cycle #91 (2026-04-23). Anchored by controlled matrix `/tmp/cycle87-audit/matrix.json` + Phase 0 tests in `output_format_contract.rs`.
|
||||
|
||||
### Purpose
|
||||
|
||||
This section documents **what each verb actually emits under `--output-format json`** as of the v1.5 emission baseline (post-cycle #89 emission routing fix, pre-Phase 1 shape normalization).
|
||||
|
||||
This is a **reference artifact**, not a target schema. It describes the reality that:
|
||||
|
||||
1. `--output-format json` exists and emits JSON (enforced by Phase 0 Task 2)
|
||||
2. All output goes to stdout (enforced by #168c fix, cycle #89)
|
||||
3. Each verb has a bespoke top-level shape (documented below; to be normalized in Phase 1)
|
||||
|
||||
### Emission Contract (v1.5 Baseline)
|
||||
|
||||
| Property | Rule | Enforced By |
|
||||
|---|---|---|
|
||||
| Exit 0 + stdout empty (silent success) | **Forbidden** | Test: `emission_contract_no_silent_success_under_output_format_json_168c_task2` |
|
||||
| Exit 0 + stdout contains valid JSON | Required | Test: same (parses each safe-success verb) |
|
||||
| Exit != 0 + JSON envelope on stdout | Required | Test: same + `error_envelope_emitted_to_stdout_under_output_format_json_168c` |
|
||||
| Error envelope on stderr under `--output-format json` | **Forbidden** | Test: #168c regression test |
|
||||
| Text mode routes errors to stderr | Preserved | Backward compat; not changed by cycle #89 |
|
||||
|
||||
### Per-Verb Shape Catalog
|
||||
|
||||
Captured from controlled matrix (cycle #87) and verified against post-#168c binary (cycle #91).
|
||||
|
||||
#### Verbs with `kind` top-level field (12/13)
|
||||
|
||||
| Verb | Top-level keys | Notes |
|
||||
|---|---|---|
|
||||
| `help` | `kind, message` | Minimal shape |
|
||||
| `version` | `git_sha, kind, message, target, version` | Build metadata |
|
||||
| `doctor` | `checks, has_failures, kind, message, report, summary` | Diagnostic results |
|
||||
| `mcp` | `action, config_load_error, configured_servers, kind, servers, status, working_directory` | MCP state |
|
||||
| `skills` | `action, kind, skills, summary` | Skills inventory |
|
||||
| `agents` | `action, agents, count, kind, summary, working_directory` | Agent inventory |
|
||||
| `sandbox` | `active, active_namespace, active_network, allowed_mounts, enabled, fallback_reason, filesystem_active, filesystem_mode, in_container, kind, markers, requested_namespace, requested_network, supported` | Sandbox state (14 keys) |
|
||||
| `status` | `config_load_error, kind, model, model_raw, model_source, permission_mode, sandbox, status, usage, workspace` | Runtime status |
|
||||
| `system-prompt` | `kind, message, sections` | Prompt sections |
|
||||
| `bootstrap-plan` | `kind, phases` | Bootstrap phases |
|
||||
| `export` | `file, kind, message, messages, session_id` | Export metadata |
|
||||
| `acp` | `aliases, discoverability_tracking, kind, launch_command, message, recommended_workflows, serve_alias_only, status, supported, tracking` | ACP discoverability |
|
||||
|
||||
#### Verb with `command` top-level field (1/13) — Phase 1 normalization target
|
||||
|
||||
| Verb | Top-level keys | Notes |
|
||||
|---|---|---|
|
||||
| `list-sessions` | `command, sessions` | **Deviation:** uses `command` instead of `kind`. Target Phase 1 fix. |
|
||||
|
||||
#### Verbs with error-only emission in test env (exit != 0)
|
||||
|
||||
These verbs require external state (credentials, session fixtures, manifests) and return error envelopes in clean test environments:
|
||||
|
||||
| Verb | Error envelope keys | Notes |
|
||||
|---|---|---|
|
||||
| `bootstrap` | `error, hint, kind, type` | Requires `ANTHROPIC_AUTH_TOKEN` for success path |
|
||||
| `dump-manifests` | `error, hint, kind, type` | Requires upstream manifest source |
|
||||
| `state` | `error, hint, kind, type` | Requires worker state file |
|
||||
|
||||
**Common error envelope shape (all verbs):** `{error, hint, kind, type}` — this is the one consistently-shaped part of v1.5.
|
||||
|
||||
### Standard Error Envelope (v1.5)
|
||||
|
||||
Error envelopes are the **only** part of v1.5 with a guaranteed consistent shape across all verbs:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"error": "short human-readable reason",
|
||||
"kind": "snake_case_machine_readable_classification",
|
||||
"hint": "optional remediation hint (may be null)"
|
||||
}
|
||||
```
|
||||
|
||||
**Classification kinds** (from `classify_error_kind` in `main.rs`):
|
||||
- `cli_parse` — argument parsing error
|
||||
- `missing_credentials` — auth token/key missing
|
||||
- `session_not_found` — load-session target missing
|
||||
- `session_load_failed` — persisted session unreadable
|
||||
- `no_managed_sessions` — no sessions exist to list
|
||||
- `missing_manifests` — upstream manifest sources absent
|
||||
- `filesystem_io_error` — file operation failure
|
||||
- `api_http_error` — upstream API returned non-2xx
|
||||
- `unknown` — classifier fallthrough
|
||||
|
||||
### How This Differs from v2.0 Target
|
||||
|
||||
| Aspect | v1.5 (this doc) | v2.0 Target (SCHEMAS.md top) |
|
||||
|---|---|---|
|
||||
| Top-level verb ID | 12 use `kind`, 1 uses `command` | Common `command` field |
|
||||
| Common metadata | None (no `timestamp`, `exit_code`, etc.) | `timestamp`, `command`, `exit_code`, `output_format`, `schema_version` |
|
||||
| Error envelope | `{error, hint, kind, type}` flat | `{error: {message, kind, operation, target, retryable}, ...}` nested |
|
||||
| Success shape | Verb-specific (13 bespoke) | Common wrapper with `data` field |
|
||||
|
||||
### Consumer Guidance (Against v1.5 Baseline)
|
||||
|
||||
**For claws consuming v1.5 today:**
|
||||
|
||||
1. **Always use `--output-format json`** — text format has no stability contract (#167)
|
||||
2. **Check `type` field first** — "error" or absent/other (treat as success)
|
||||
3. **For errors:** access `error` (string), `kind` (string), `hint` (nullable string)
|
||||
4. **For success:** use verb-specific keys per catalog above
|
||||
5. **Do NOT assume** `kind` field exists on success path — `list-sessions` uses `command` instead
|
||||
6. **Do NOT assume** metadata fields (`timestamp`, `exit_code`, etc.) — they are v2.0 target only
|
||||
7. **Check exit code** for pass/fail; don't infer from payload alone
|
||||
|
||||
### Phase 1 Normalization Targets (After This Baseline Locks)
|
||||
|
||||
Phase 1 (shape stabilization) will normalize these divergences:
|
||||
|
||||
- `list-sessions`: `command` → `kind` (align with 12/13 convention)
|
||||
- Potentially: unify where `message` field appears (9/13 have it, inconsistently populated)
|
||||
- Potentially: unify where `action` field appears (only in 3 inventory verbs: `mcp`, `skills`, `agents`)
|
||||
|
||||
Phase 1 does **not** add common metadata (`timestamp`, `exit_code`) — that's Phase 2 (v2.0 wrapper).
|
||||
|
||||
### Regenerating This Catalog
|
||||
|
||||
The catalog is derived from running the controlled matrix. Phase 0 Task 4 will add a deterministic script; for now, reproduce with:
|
||||
|
||||
```
|
||||
for verb in help version list-sessions doctor mcp skills agents sandbox status system-prompt bootstrap-plan export acp; do
|
||||
echo "=== $verb ==="
|
||||
claw $verb --output-format json | jq 'keys'
|
||||
done
|
||||
```
|
||||
|
||||
This matches what the Phase 0 Task 2 test enforces programmatically.
|
||||
|
||||
224
USAGE.md
224
USAGE.md
@@ -2,6 +2,9 @@
|
||||
|
||||
This guide covers the current Rust workspace under `rust/` and the `claw` CLI binary. If you are brand new, make the doctor health check your first run: start `claw`, then run `/doctor`.
|
||||
|
||||
> [!TIP]
|
||||
> **Building orchestration code that calls `claw` as a subprocess?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern (one handler for all 14 clawable commands, exit codes, JSON envelope contract, and recovery strategies).
|
||||
|
||||
## Quick-start health check
|
||||
|
||||
Run this before prompts, sessions, or automation:
|
||||
@@ -33,6 +36,60 @@ cargo build --workspace
|
||||
|
||||
The CLI binary is available at `rust/target/debug/claw` after a debug build. Make the doctor check above your first post-build step.
|
||||
|
||||
### Add binary to PATH
|
||||
|
||||
To run `claw` from anywhere without typing the full path:
|
||||
|
||||
**Option 1: Symlink to a directory already in your PATH**
|
||||
|
||||
```bash
|
||||
# Find a PATH directory (usually ~/.local/bin or /usr/local/bin)
|
||||
echo $PATH
|
||||
|
||||
# Create symlink (adjust path and PATH-dir as needed)
|
||||
ln -s /Users/yeongyu/clawd/claw-code/rust/target/debug/claw ~/.local/bin/claw
|
||||
|
||||
# Verify it's in PATH
|
||||
which claw
|
||||
```
|
||||
|
||||
**Option 2: Add the binary directory to PATH directly**
|
||||
|
||||
Add this to your shell rc file (`~/.bashrc`, `~/.zshrc`, etc.):
|
||||
|
||||
```bash
|
||||
export PATH="$PATH:/Users/yeongyu/clawd/claw-code/rust/target/debug"
|
||||
```
|
||||
|
||||
Then reload:
|
||||
|
||||
```bash
|
||||
source ~/.zshrc # or ~/.bashrc
|
||||
```
|
||||
|
||||
### Verify install
|
||||
|
||||
After adding to PATH, verify the binary works:
|
||||
|
||||
```bash
|
||||
# Should print version and exit successfully
|
||||
claw version
|
||||
|
||||
# Should run health check (shows which components are initialized)
|
||||
claw doctor
|
||||
|
||||
# Should show available commands
|
||||
claw --help
|
||||
```
|
||||
|
||||
If `claw: command not found`, the PATH addition didn't take. Re-check:
|
||||
|
||||
```bash
|
||||
echo $PATH # verify your PATH directory is listed
|
||||
which claw # should show full path to binary
|
||||
ls -la ~/.local/bin/claw # if using symlink, verify it exists and points to target/debug/claw
|
||||
```
|
||||
|
||||
## Quick start
|
||||
|
||||
### First-run doctor check
|
||||
@@ -95,11 +152,69 @@ cd rust
|
||||
|
||||
### JSON output for scripting
|
||||
|
||||
All clawable commands support `--output-format json` for machine-readable output.
|
||||
|
||||
**IMPORTANT SCHEMA VERSION NOTICE:**
|
||||
|
||||
The JSON envelope is currently in **v1.0 (flat shape)** and is scheduled to migrate to **v2.0 (nested schema)** in a future release. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan.
|
||||
|
||||
#### Current (v1.0) envelope shape
|
||||
|
||||
**Success envelope** — verb-specific fields + `kind: "<verb-name>"`:
|
||||
```json
|
||||
{
|
||||
"kind": "doctor",
|
||||
"checks": [...],
|
||||
"summary": {...},
|
||||
"has_failures": false,
|
||||
"report": "...",
|
||||
"message": "..."
|
||||
}
|
||||
```
|
||||
|
||||
**Error envelope** — flat error fields at top level:
|
||||
```json
|
||||
{
|
||||
"error": "unrecognized argument `foo`",
|
||||
"hint": "Run `claw --help` for usage.",
|
||||
"kind": "cli_parse",
|
||||
"type": "error"
|
||||
}
|
||||
```
|
||||
|
||||
**Known issues with v1.0:**
|
||||
- Missing `exit_code`, `command`, `timestamp`, `output_format`, `schema_version` fields
|
||||
- `error` is a string, not a structured object with operation/target/retryable/message/hint
|
||||
- `kind` field is semantically overloaded (verb identity in success, error classification in error)
|
||||
- See [`SCHEMAS.md`](./SCHEMAS.md) for documented (v2.0 target) schema and [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for migration details
|
||||
|
||||
#### Using v1.0 envelopes in your code
|
||||
|
||||
**Success path:** Check for absence of `type: "error"`, then access verb-specific fields:
|
||||
```bash
|
||||
cd rust
|
||||
./target/debug/claw doctor --output-format json | jq '.kind, .has_failures'
|
||||
```
|
||||
|
||||
**Error path:** Check for `type == "error"`, then access `error` (string) and `kind` (error classification):
|
||||
```bash
|
||||
cd rust
|
||||
./target/debug/claw doctor invalid-arg --output-format json | jq '.error, .kind'
|
||||
```
|
||||
|
||||
**Do NOT rely on `kind` alone for dispatching** — it has different meanings in success vs. error. Always check `type == "error"` first.
|
||||
|
||||
```bash
|
||||
cd rust
|
||||
./target/debug/claw --output-format json prompt "status"
|
||||
./target/debug/claw --output-format json load-session my-session-id
|
||||
./target/debug/claw --output-format json turn-loop "analyze logs" --max-turns 1
|
||||
```
|
||||
|
||||
**Building a dispatcher or orchestration script?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern. One code example works for all 14 clawable commands: parse the exit code, classify by `error.kind`, apply recovery strategies (retry, timeout recovery, validation, logging). Use that pattern instead of reimplementing error handling per command.
|
||||
|
||||
**Migrating to v2.0?** Check back after [`FIX_LOCUS_164`](./FIX_LOCUS_164.md) is implemented. Phase 1 will add a `--envelope-version=2.0` flag for opt-in access to the structured envelope schema. Phase 2 will make v2.0 the default. Phase 3 will deprecate v1.0.
|
||||
|
||||
### Inspect worker state
|
||||
|
||||
The `claw state` command reads `.claw/worker-state.json`, which is written by the interactive REPL or a one-shot prompt when a worker executes a task. This file contains the worker ID, session reference, model, and permission mode.
|
||||
@@ -413,6 +528,93 @@ cd rust
|
||||
./target/debug/claw system-prompt --cwd .. --date 2026-04-04
|
||||
```
|
||||
|
||||
### `dump-manifests` — Export upstream plugin/MCP manifests
|
||||
|
||||
**Purpose:** Dump built-in tool and plugin manifests to stdout as JSON, for parity comparison against the upstream Claude Code TypeScript implementation.
|
||||
|
||||
**Prerequisite:** This command requires access to upstream source files (`src/commands.ts`, `src/tools.ts`, `src/entrypoints/cli.tsx`). Set `CLAUDE_CODE_UPSTREAM` env var or pass `--manifests-dir`.
|
||||
|
||||
```bash
|
||||
# Via env var
|
||||
CLAUDE_CODE_UPSTREAM=/path/to/upstream claw dump-manifests
|
||||
|
||||
# Via flag
|
||||
claw dump-manifests --manifests-dir /path/to/upstream
|
||||
```
|
||||
|
||||
**When to use:** Parity work (comparing the Rust port's tool/plugin surface against the canonical TypeScript implementation). Not needed for normal operation.
|
||||
|
||||
**Error mode:** If upstream sources are missing, exits with `error-kind: missing_manifests` and a hint about how to provide them.
|
||||
|
||||
### `bootstrap-plan` — Show startup component graph
|
||||
|
||||
**Purpose:** Print the ordered list of startup components that are initialized when `claw` begins a session. Useful for debugging startup issues or verifying that fast-path optimizations are in place.
|
||||
|
||||
```bash
|
||||
claw bootstrap-plan
|
||||
```
|
||||
|
||||
**Sample output:**
|
||||
```
|
||||
- CliEntry
|
||||
- FastPathVersion
|
||||
- StartupProfiler
|
||||
- SystemPromptFastPath
|
||||
- ChromeMcpFastPath
|
||||
```
|
||||
|
||||
**When to use:**
|
||||
- Debugging why startup is slow (compare your plan to the expected one)
|
||||
- Verifying that fast-path components are registered
|
||||
- Understanding the load order before customizing hooks or plugins
|
||||
|
||||
**Related:** See `claw doctor` for health checks against these startup components.
|
||||
|
||||
### `acp` — Agent Context Protocol / Zed editor integration status
|
||||
|
||||
**Purpose:** Report the current state of the ACP (Agent Context Protocol) / Zed editor integration. Currently **discoverability only** — no editor daemon is available yet.
|
||||
|
||||
```bash
|
||||
claw acp
|
||||
claw acp serve # same output; `serve` is accepted but not yet launchable
|
||||
claw --acp # alias
|
||||
claw -acp # alias
|
||||
```
|
||||
|
||||
**Sample output:**
|
||||
```
|
||||
ACP / Zed
|
||||
Status discoverability only
|
||||
Launch `claw acp serve` / `claw --acp` / `claw -acp` report status only; no editor daemon is available yet
|
||||
Today use `claw prompt`, the REPL, or `claw doctor` for local verification
|
||||
Tracking ROADMAP #76
|
||||
```
|
||||
|
||||
**When to use:** Check whether ACP/Zed integration is ready in your current build. Plan around its availability (track ROADMAP #76 for status).
|
||||
|
||||
**Today's alternatives:** Use `claw prompt` for one-shot runs, the interactive REPL for iterative work, or `claw doctor` for local verification.
|
||||
|
||||
### `export` — Export session transcript
|
||||
|
||||
**Purpose:** Export a managed session's transcript to a file or stdout. Operates on the currently-resumed session (requires `--resume`).
|
||||
|
||||
```bash
|
||||
# Export latest session
|
||||
claw --resume latest export
|
||||
|
||||
# Export specific session
|
||||
claw --resume <session-id> export
|
||||
```
|
||||
|
||||
**Prerequisite:** A managed session must exist under `.claw/sessions/<workspace-fingerprint>/`. If no sessions exist, the command exits with `error-kind: no_managed_sessions` and a hint to start a session first.
|
||||
|
||||
**When to use:**
|
||||
- Archive session transcripts for review
|
||||
- Share session context with teammates
|
||||
- Feed session history into downstream tooling
|
||||
|
||||
**Related:** Inside the REPL, `/export` is also available as a slash command for the active session.
|
||||
|
||||
## Session management
|
||||
|
||||
REPL turns are persisted under `.claw/sessions/` in the current workspace.
|
||||
@@ -423,7 +625,27 @@ cd rust
|
||||
./target/debug/claw --resume latest /status /diff
|
||||
```
|
||||
|
||||
Useful interactive commands include `/help`, `/status`, `/cost`, `/config`, `/session`, `/model`, `/permissions`, and `/export`.
|
||||
### Interactive slash commands (inside the REPL)
|
||||
|
||||
Useful interactive commands include:
|
||||
|
||||
- `/help` — Show help for all available commands
|
||||
- `/status` — Display current session and workspace status
|
||||
- `/cost` — Show token usage and cost estimates for the session
|
||||
- `/config` — Display current configuration and environment state
|
||||
- `/session` — Show session ID, creation time, and persisted metadata
|
||||
- `/model` — Display or switch the active model
|
||||
- `/permissions` — Check sandbox permissions and capability grants
|
||||
- `/export [file]` — Export the current conversation to a file (or resume from backup)
|
||||
- `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning (good for complex refactoring tasks)
|
||||
- `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace (IDE-like navigation)
|
||||
- `/bughunter [scope]` — Inspect the codebase for likely bugs in an optional scope (e.g., `src/runtime`)
|
||||
- `/commit` — Generate a commit message and create a git commit from the conversation
|
||||
- `/pr [context]` — Draft or create a pull request from the conversation
|
||||
- `/issue [context]` — Draft or create a GitHub issue from the conversation
|
||||
- `/diff` — Show unified diff of changes made in the current session
|
||||
- `/plugin [list|install|enable|disable|uninstall|update]` — Manage Claw Code plugins
|
||||
- `/agents [list|help]` — List configured agents or get help on agent commands
|
||||
|
||||
## Config file resolution order
|
||||
|
||||
|
||||
@@ -753,14 +753,14 @@ mod tests {
|
||||
#[test]
|
||||
fn returns_context_window_metadata_for_kimi_models() {
|
||||
// kimi-k2.5
|
||||
let k25_limit =
|
||||
model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have token limit metadata");
|
||||
let k25_limit = model_token_limit("kimi-k2.5")
|
||||
.expect("kimi-k2.5 should have token limit metadata");
|
||||
assert_eq!(k25_limit.max_output_tokens, 16_384);
|
||||
assert_eq!(k25_limit.context_window_tokens, 256_000);
|
||||
|
||||
// kimi-k1.5
|
||||
let k15_limit =
|
||||
model_token_limit("kimi-k1.5").expect("kimi-k1.5 should have token limit metadata");
|
||||
let k15_limit = model_token_limit("kimi-k1.5")
|
||||
.expect("kimi-k1.5 should have token limit metadata");
|
||||
assert_eq!(k15_limit.max_output_tokens, 16_384);
|
||||
assert_eq!(k15_limit.context_window_tokens, 256_000);
|
||||
}
|
||||
@@ -768,13 +768,11 @@ mod tests {
|
||||
#[test]
|
||||
fn kimi_alias_resolves_to_kimi_k25_token_limits() {
|
||||
// The "kimi" alias resolves to "kimi-k2.5" via resolve_model_alias()
|
||||
let alias_limit =
|
||||
model_token_limit("kimi").expect("kimi alias should resolve to kimi-k2.5 limits");
|
||||
let direct_limit = model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have limits");
|
||||
assert_eq!(
|
||||
alias_limit.max_output_tokens,
|
||||
direct_limit.max_output_tokens
|
||||
);
|
||||
let alias_limit = model_token_limit("kimi")
|
||||
.expect("kimi alias should resolve to kimi-k2.5 limits");
|
||||
let direct_limit = model_token_limit("kimi-k2.5")
|
||||
.expect("kimi-k2.5 should have limits");
|
||||
assert_eq!(alias_limit.max_output_tokens, direct_limit.max_output_tokens);
|
||||
assert_eq!(
|
||||
alias_limit.context_window_tokens,
|
||||
direct_limit.context_window_tokens
|
||||
|
||||
@@ -2195,16 +2195,9 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn provider_specific_size_limits_are_correct() {
|
||||
assert_eq!(
|
||||
OpenAiCompatConfig::dashscope().max_request_body_bytes,
|
||||
6_291_456
|
||||
); // 6MB
|
||||
assert_eq!(
|
||||
OpenAiCompatConfig::openai().max_request_body_bytes,
|
||||
104_857_600
|
||||
); // 100MB
|
||||
assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800);
|
||||
// 50MB
|
||||
assert_eq!(OpenAiCompatConfig::dashscope().max_request_body_bytes, 6_291_456); // 6MB
|
||||
assert_eq!(OpenAiCompatConfig::openai().max_request_body_bytes, 104_857_600); // 100MB
|
||||
assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800); // 50MB
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
@@ -2623,8 +2623,10 @@ fn render_mcp_report_json_for(
|
||||
// runs, the existing serializer adds `status: "ok"` below.
|
||||
match loader.load() {
|
||||
Ok(runtime_config) => {
|
||||
let mut value =
|
||||
render_mcp_summary_report_json(cwd, runtime_config.mcp().servers());
|
||||
let mut value = render_mcp_summary_report_json(
|
||||
cwd,
|
||||
runtime_config.mcp().servers(),
|
||||
);
|
||||
if let Some(map) = value.as_object_mut() {
|
||||
map.insert("status".to_string(), Value::String("ok".to_string()));
|
||||
map.insert("config_load_error".to_string(), Value::Null);
|
||||
|
||||
@@ -172,7 +172,7 @@ async fn execute_bash_async(
|
||||
) -> io::Result<BashCommandOutput> {
|
||||
// Detect and emit ship provenance for git push operations
|
||||
detect_and_emit_ship_prepared(&input.command);
|
||||
|
||||
|
||||
let mut command = prepare_tokio_command(&input.command, &cwd, &sandbox_status, true);
|
||||
|
||||
let output_result = if let Some(timeout_ms) = input.timeout {
|
||||
|
||||
@@ -405,10 +405,7 @@ pub enum BlockedSubphase {
|
||||
#[serde(rename = "blocked.branch_freshness")]
|
||||
BranchFreshness { behind_main: u32 },
|
||||
#[serde(rename = "blocked.test_hang")]
|
||||
TestHang {
|
||||
elapsed_secs: u32,
|
||||
test_name: Option<String>,
|
||||
},
|
||||
TestHang { elapsed_secs: u32, test_name: Option<String> },
|
||||
#[serde(rename = "blocked.report_pending")]
|
||||
ReportPending { since_secs: u32 },
|
||||
}
|
||||
@@ -546,8 +543,7 @@ impl LaneEvent {
|
||||
.with_failure_class(blocker.failure_class)
|
||||
.with_detail(blocker.detail.clone());
|
||||
if let Some(ref subphase) = blocker.subphase {
|
||||
event =
|
||||
event.with_data(serde_json::to_value(subphase).expect("subphase should serialize"));
|
||||
event = event.with_data(serde_json::to_value(subphase).expect("subphase should serialize"));
|
||||
}
|
||||
event
|
||||
}
|
||||
@@ -558,8 +554,7 @@ impl LaneEvent {
|
||||
.with_failure_class(blocker.failure_class)
|
||||
.with_detail(blocker.detail.clone());
|
||||
if let Some(ref subphase) = blocker.subphase {
|
||||
event =
|
||||
event.with_data(serde_json::to_value(subphase).expect("subphase should serialize"));
|
||||
event = event.with_data(serde_json::to_value(subphase).expect("subphase should serialize"));
|
||||
}
|
||||
event
|
||||
}
|
||||
@@ -567,12 +562,8 @@ impl LaneEvent {
|
||||
/// Ship prepared — §4.44.5
|
||||
#[must_use]
|
||||
pub fn ship_prepared(emitted_at: impl Into<String>, provenance: &ShipProvenance) -> Self {
|
||||
Self::new(
|
||||
LaneEventName::ShipPrepared,
|
||||
LaneEventStatus::Ready,
|
||||
emitted_at,
|
||||
)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
Self::new(LaneEventName::ShipPrepared, LaneEventStatus::Ready, emitted_at)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
}
|
||||
|
||||
/// Ship commits selected — §4.44.5
|
||||
@@ -582,34 +573,22 @@ impl LaneEvent {
|
||||
commit_count: u32,
|
||||
commit_range: impl Into<String>,
|
||||
) -> Self {
|
||||
Self::new(
|
||||
LaneEventName::ShipCommitsSelected,
|
||||
LaneEventStatus::Ready,
|
||||
emitted_at,
|
||||
)
|
||||
.with_detail(format!("{} commits: {}", commit_count, commit_range.into()))
|
||||
Self::new(LaneEventName::ShipCommitsSelected, LaneEventStatus::Ready, emitted_at)
|
||||
.with_detail(format!("{} commits: {}", commit_count, commit_range.into()))
|
||||
}
|
||||
|
||||
/// Ship merged — §4.44.5
|
||||
#[must_use]
|
||||
pub fn ship_merged(emitted_at: impl Into<String>, provenance: &ShipProvenance) -> Self {
|
||||
Self::new(
|
||||
LaneEventName::ShipMerged,
|
||||
LaneEventStatus::Completed,
|
||||
emitted_at,
|
||||
)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
Self::new(LaneEventName::ShipMerged, LaneEventStatus::Completed, emitted_at)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
}
|
||||
|
||||
/// Ship pushed to main — §4.44.5
|
||||
#[must_use]
|
||||
pub fn ship_pushed_main(emitted_at: impl Into<String>, provenance: &ShipProvenance) -> Self {
|
||||
Self::new(
|
||||
LaneEventName::ShipPushedMain,
|
||||
LaneEventStatus::Completed,
|
||||
emitted_at,
|
||||
)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
Self::new(LaneEventName::ShipPushedMain, LaneEventStatus::Completed, emitted_at)
|
||||
.with_data(serde_json::to_value(provenance).expect("ship provenance should serialize"))
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
|
||||
@@ -58,8 +58,8 @@ impl SessionStore {
|
||||
let workspace_root = workspace_root.as_ref();
|
||||
// #151: canonicalize workspace_root for consistent fingerprinting
|
||||
// across equivalent path representations.
|
||||
let canonical_workspace =
|
||||
fs::canonicalize(workspace_root).unwrap_or_else(|_| workspace_root.to_path_buf());
|
||||
let canonical_workspace = fs::canonicalize(workspace_root)
|
||||
.unwrap_or_else(|_| workspace_root.to_path_buf());
|
||||
let sessions_root = data_dir
|
||||
.as_ref()
|
||||
.join("sessions")
|
||||
@@ -158,9 +158,10 @@ impl SessionStore {
|
||||
}
|
||||
|
||||
pub fn latest_session(&self) -> Result<ManagedSessionSummary, SessionControlError> {
|
||||
self.list_sessions()?.into_iter().next().ok_or_else(|| {
|
||||
SessionControlError::Format(format_no_managed_sessions(&self.sessions_root))
|
||||
})
|
||||
self.list_sessions()?
|
||||
.into_iter()
|
||||
.next()
|
||||
.ok_or_else(|| SessionControlError::Format(format_no_managed_sessions(&self.sessions_root)))
|
||||
}
|
||||
|
||||
pub fn load_session(
|
||||
|
||||
@@ -1,6 +1,24 @@
|
||||
use std::env;
|
||||
use std::path::Path;
|
||||
use std::process::Command;
|
||||
|
||||
fn resolve_git_head_path() -> Option<String> {
|
||||
let git_path = Path::new(".git");
|
||||
if git_path.is_file() {
|
||||
// Worktree: .git is a pointer file containing "gitdir: /path/to/real/.git/worktrees/<name>"
|
||||
if let Ok(content) = std::fs::read_to_string(git_path) {
|
||||
if let Some(gitdir) = content.strip_prefix("gitdir:") {
|
||||
let gitdir = gitdir.trim();
|
||||
return Some(format!("{}/HEAD", gitdir));
|
||||
}
|
||||
}
|
||||
} else if git_path.is_dir() {
|
||||
// Regular repo: .git is a directory
|
||||
return Some(".git/HEAD".to_string());
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
fn main() {
|
||||
// Get git SHA (short hash)
|
||||
let git_sha = Command::new("git")
|
||||
@@ -52,6 +70,12 @@ fn main() {
|
||||
println!("cargo:rustc-env=BUILD_DATE={build_date}");
|
||||
|
||||
// Rerun if git state changes
|
||||
println!("cargo:rerun-if-changed=.git/HEAD");
|
||||
// In worktrees, .git is a pointer file, so watch the actual HEAD location
|
||||
if let Some(head_path) = resolve_git_head_path() {
|
||||
println!("cargo:rerun-if-changed={}", head_path);
|
||||
} else {
|
||||
// Fallback to .git/HEAD for regular repos (won't trigger in worktrees, but prevents silent failure)
|
||||
println!("cargo:rerun-if-changed=.git/HEAD");
|
||||
}
|
||||
println!("cargo:rerun-if-changed=.git/refs");
|
||||
}
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -172,10 +172,7 @@ stderr:
|
||||
);
|
||||
let stdout = String::from_utf8(output.stdout).expect("stdout should be utf8");
|
||||
let parsed: Value = serde_json::from_str(&stdout).expect("compact json stdout should parse");
|
||||
assert_eq!(
|
||||
parsed["message"],
|
||||
"Mock streaming says hello from the parity harness."
|
||||
);
|
||||
assert_eq!(parsed["message"], "Mock streaming says hello from the parity harness.");
|
||||
assert_eq!(parsed["compact"], true);
|
||||
assert_eq!(parsed["model"], "claude-sonnet-4-6");
|
||||
assert!(parsed["usage"].is_object());
|
||||
|
||||
@@ -388,6 +388,484 @@ fn assert_json_command(current_dir: &Path, args: &[&str]) -> Value {
|
||||
assert_json_command_with_env(current_dir, args, &[])
|
||||
}
|
||||
|
||||
/// #247 regression helper: run claw expecting a non-zero exit and return
|
||||
/// the JSON error envelope parsed from stdout. Asserts exit != 0 and that
|
||||
/// the envelope includes `type: "error"` at the very least.
|
||||
///
|
||||
/// #168c: Error envelopes under --output-format json are now emitted to
|
||||
/// STDOUT (not stderr). This matches the emission contract that stdout
|
||||
/// carries the contractual envelope (success OR error) while stderr is
|
||||
/// reserved for non-contractual diagnostics.
|
||||
fn assert_json_error_envelope(current_dir: &Path, args: &[&str]) -> Value {
|
||||
let output = run_claw(current_dir, args, &[]);
|
||||
assert!(
|
||||
!output.status.success(),
|
||||
"command unexpectedly succeeded; stdout:\n{}\nstderr:\n{}",
|
||||
String::from_utf8_lossy(&output.stdout),
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
// #168c: The JSON envelope is written to STDOUT for error cases under
|
||||
// --output-format json (see main.rs). Previously was stderr.
|
||||
let envelope: Value = serde_json::from_slice(&output.stdout).unwrap_or_else(|err| {
|
||||
panic!(
|
||||
"stdout should be a JSON error envelope but failed to parse: {err}\nstdout bytes:\n{}\nstderr bytes:\n{}",
|
||||
String::from_utf8_lossy(&output.stdout),
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
)
|
||||
});
|
||||
assert_eq!(
|
||||
envelope["type"], "error",
|
||||
"envelope should carry type=error"
|
||||
);
|
||||
envelope
|
||||
}
|
||||
|
||||
/// #168c regression test: under `--output-format json`, error envelopes
|
||||
/// must be emitted to STDOUT (not stderr). This is the emission contract:
|
||||
/// stdout carries the JSON envelope regardless of success/error; stderr
|
||||
/// is reserved for non-contractual diagnostics.
|
||||
///
|
||||
/// Refutes cycle #84's "bootstrap silent failure" claim (cycle #87 controlled
|
||||
/// matrix showed errors were on stderr, not silent; cycle #88 locked the
|
||||
/// emission contract to require stdout).
|
||||
#[test]
|
||||
fn error_envelope_emitted_to_stdout_under_output_format_json_168c() {
|
||||
let root = unique_temp_dir("168c-emission-stdout");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
// Trigger an error via `prompt` without arg (known cli_parse error).
|
||||
let output = run_claw(&root, &["--output-format", "json", "prompt"], &[]);
|
||||
|
||||
// Exit code must be non-zero (error).
|
||||
assert!(
|
||||
!output.status.success(),
|
||||
"prompt without arg must fail; stdout:\n{}\nstderr:\n{}",
|
||||
String::from_utf8_lossy(&output.stdout),
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
|
||||
// #168c primary assertion: stdout carries the JSON envelope.
|
||||
let stdout_text = String::from_utf8_lossy(&output.stdout);
|
||||
assert!(
|
||||
!stdout_text.trim().is_empty(),
|
||||
"stdout must contain JSON envelope under --output-format json (#168c emission contract). stderr was:\n{}",
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
let envelope: Value = serde_json::from_slice(&output.stdout).unwrap_or_else(|err| {
|
||||
panic!(
|
||||
"stdout should be valid JSON under --output-format json (#168c): {err}\nstdout bytes:\n{stdout_text}"
|
||||
)
|
||||
});
|
||||
assert_eq!(envelope["type"], "error", "envelope must be typed error");
|
||||
assert!(
|
||||
envelope["kind"].as_str().is_some(),
|
||||
"envelope must carry machine-readable kind"
|
||||
);
|
||||
|
||||
// #168c secondary assertion: stderr should NOT carry the JSON envelope
|
||||
// (it may be empty or contain non-JSON diagnostics, but the envelope
|
||||
// belongs on stdout under --output-format json).
|
||||
let stderr_text = String::from_utf8_lossy(&output.stderr);
|
||||
let stderr_trimmed = stderr_text.trim();
|
||||
if !stderr_trimmed.is_empty() {
|
||||
// If stderr has content, it must NOT be the JSON envelope.
|
||||
let stderr_is_json: Result<Value, _> = serde_json::from_slice(&output.stderr);
|
||||
assert!(
|
||||
stderr_is_json.is_err(),
|
||||
"stderr must not duplicate the JSON envelope (#168c); stderr was:\n{stderr_trimmed}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247() {
|
||||
// #247: `claw prompt` with no argument must classify as `cli_parse`
|
||||
// (not `unknown`) and the JSON envelope must carry the same actionable
|
||||
// `Run claw --help for usage.` hint that text-mode stderr appends.
|
||||
let root = unique_temp_dir("247-prompt-no-arg");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
let envelope = assert_json_error_envelope(&root, &["--output-format", "json", "prompt"]);
|
||||
assert_eq!(
|
||||
envelope["kind"], "cli_parse",
|
||||
"prompt subcommand without arg should classify as cli_parse, envelope: {envelope}"
|
||||
);
|
||||
assert_eq!(
|
||||
envelope["error"], "prompt subcommand requires a prompt string",
|
||||
"short reason should match the raw error, envelope: {envelope}"
|
||||
);
|
||||
assert_eq!(
|
||||
envelope["hint"],
|
||||
"Run `claw --help` for usage.",
|
||||
"JSON envelope must carry the same help-runbook hint as text mode, envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_positional_arg_emits_cli_parse_envelope_247() {
|
||||
// #247: `claw ""` must classify as `cli_parse`, not `unknown`. The
|
||||
// message itself embeds a ``run `claw --help`` pointer so the explicit
|
||||
// hint field is allowed to remain null to avoid duplication — what
|
||||
// matters for the typed-error contract is that `kind == cli_parse`.
|
||||
let root = unique_temp_dir("247-empty-arg");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
let envelope = assert_json_error_envelope(&root, &["--output-format", "json", ""]);
|
||||
assert_eq!(
|
||||
envelope["kind"], "cli_parse",
|
||||
"empty-prompt error should classify as cli_parse, envelope: {envelope}"
|
||||
);
|
||||
let short = envelope["error"]
|
||||
.as_str()
|
||||
.expect("error field should be a string");
|
||||
assert!(
|
||||
short.starts_with("empty prompt:"),
|
||||
"short reason should preserve the original empty-prompt message, got: {short}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn whitespace_only_positional_arg_emits_cli_parse_envelope_247() {
|
||||
// #247: same rule for `claw " "` — any whitespace-only prompt must
|
||||
// flow through the empty-prompt path and classify as `cli_parse`.
|
||||
let root = unique_temp_dir("247-whitespace-arg");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
let envelope = assert_json_error_envelope(&root, &["--output-format", "json", " "]);
|
||||
assert_eq!(
|
||||
envelope["kind"], "cli_parse",
|
||||
"whitespace-only prompt should classify as cli_parse, envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
|
||||
/// #168c Phase 0 Task 2: No-silent guarantee.
|
||||
///
|
||||
/// Under `--output-format json`, every verb must satisfy the emission contract:
|
||||
/// either emit a valid JSON envelope to stdout (with exit 0 for success, or
|
||||
/// exit != 0 for error), OR exit with an error code. Silent success (exit 0
|
||||
/// with empty stdout) is forbidden under the JSON contract because consumers
|
||||
/// cannot distinguish success from broken emission.
|
||||
///
|
||||
/// This test iterates a catalog of clawable verbs and asserts:
|
||||
/// 1. Each verb produces stdout output when exit == 0 (no silent success)
|
||||
/// 2. The stdout output parses as JSON (emission contract integrity)
|
||||
/// 3. Error cases (exit != 0) produce JSON on stdout (#168c routing fix)
|
||||
///
|
||||
/// Phase 0 Task 2 deliverable: prevents regressions in the emission contract
|
||||
/// for the full set of discoverable verbs.
|
||||
#[test]
|
||||
fn emission_contract_no_silent_success_under_output_format_json_168c_task2() {
|
||||
let root = unique_temp_dir("168c-task2-no-silent");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
// Verbs expected to succeed (exit 0) with non-empty JSON on stdout.
|
||||
// Covers the discovery-safe subset — verbs that don't require external
|
||||
// credentials or network and should be safely invokable in CI.
|
||||
let safe_success_verbs: &[(&str, &[&str])] = &[
|
||||
("help", &["help"]),
|
||||
("version", &["version"]),
|
||||
("list-sessions", &["list-sessions"]),
|
||||
("doctor", &["doctor"]),
|
||||
("mcp", &["mcp"]),
|
||||
("skills", &["skills"]),
|
||||
("agents", &["agents"]),
|
||||
("sandbox", &["sandbox"]),
|
||||
("status", &["status"]),
|
||||
("system-prompt", &["system-prompt"]),
|
||||
("bootstrap-plan", &["bootstrap-plan", "test"]),
|
||||
("acp", &["acp"]),
|
||||
];
|
||||
|
||||
for (verb, args) in safe_success_verbs {
|
||||
let mut full_args = vec!["--output-format", "json"];
|
||||
full_args.extend_from_slice(args);
|
||||
let output = run_claw(&root, &full_args, &[]);
|
||||
|
||||
// Emission contract clause 1: if exit == 0, stdout must be non-empty.
|
||||
if output.status.success() {
|
||||
let stdout_text = String::from_utf8_lossy(&output.stdout);
|
||||
assert!(
|
||||
!stdout_text.trim().is_empty(),
|
||||
"#168c Task 2 emission contract violation: `{verb}` exit 0 with empty stdout (silent success). stderr was:\n{}",
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
|
||||
// Emission contract clause 2: stdout must be valid JSON.
|
||||
let envelope: Result<Value, _> = serde_json::from_slice(&output.stdout);
|
||||
assert!(
|
||||
envelope.is_ok(),
|
||||
"#168c Task 2 emission contract violation: `{verb}` stdout is not valid JSON:\n{stdout_text}"
|
||||
);
|
||||
}
|
||||
// If exit != 0, it's an error path; #168c primary test covers error routing.
|
||||
}
|
||||
|
||||
// Verbs expected to fail (exit != 0) in test env (require external state).
|
||||
// Emission contract clause 3: error paths must still emit JSON on stdout.
|
||||
let safe_error_verbs: &[(&str, &[&str])] = &[
|
||||
("prompt-no-arg", &["prompt"]),
|
||||
("doctor-bad-arg", &["doctor", "--foo"]),
|
||||
];
|
||||
|
||||
for (label, args) in safe_error_verbs {
|
||||
let mut full_args = vec!["--output-format", "json"];
|
||||
full_args.extend_from_slice(args);
|
||||
let output = run_claw(&root, &full_args, &[]);
|
||||
|
||||
assert!(
|
||||
!output.status.success(),
|
||||
"{label} was expected to fail but exited 0"
|
||||
);
|
||||
|
||||
// #168c: error envelopes must be on stdout.
|
||||
let stdout_text = String::from_utf8_lossy(&output.stdout);
|
||||
assert!(
|
||||
!stdout_text.trim().is_empty(),
|
||||
"#168c Task 2 emission contract violation: {label} failed with empty stdout. stderr was:\n{}",
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
|
||||
let envelope: Result<Value, _> = serde_json::from_slice(&output.stdout);
|
||||
assert!(
|
||||
envelope.is_ok(),
|
||||
"#168c Task 2 emission contract violation: {label} stdout not valid JSON:\n{stdout_text}"
|
||||
);
|
||||
let envelope = envelope.unwrap();
|
||||
assert_eq!(
|
||||
envelope["type"], "error",
|
||||
"{label} error envelope must carry type=error, got: {envelope}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
/// #168c Phase 0 Task 4: Shape parity / regression guard.
|
||||
///
|
||||
/// Locks the v1.5 emission baseline (documented in SCHEMAS.md § v1.5 Emission
|
||||
/// Baseline) so any future PR that introduces shape drift in a documented
|
||||
/// verb fails this test at PR time.
|
||||
///
|
||||
/// This complements Task 2 (no-silent guarantee) by asserting the SPECIFIC
|
||||
/// top-level key sets documented in the catalog. If a verb adds/removes a
|
||||
/// top-level field, this test fails — forcing the PR author to:
|
||||
/// (a) update SCHEMAS.md § v1.5 Emission Baseline with the new shape, and
|
||||
/// (b) acknowledge the v1.5 baseline is changing.
|
||||
///
|
||||
/// Phase 0 Task 4 deliverable: prevents undocumented shape drift in v1.5
|
||||
/// baseline before Phase 1 (shape normalization) begins.
|
||||
///
|
||||
/// Note: This test intentionally asserts the CURRENT (possibly imperfect)
|
||||
/// shape, NOT the target. Phase 1 will update these expectations as shapes
|
||||
/// normalize.
|
||||
#[test]
|
||||
fn v1_5_emission_baseline_shape_parity_168c_task4() {
|
||||
let root = unique_temp_dir("168c-task4-shape-parity");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
// v1.5 baseline per-verb shape catalog (from SCHEMAS.md § v1.5 Emission Baseline).
|
||||
// Each entry: (verb, args, expected_top_level_keys_sorted).
|
||||
//
|
||||
// This catalog was captured by the cycle #87 controlled matrix and is
|
||||
// enforced by SCHEMAS.md § v1.5 Emission Baseline documentation.
|
||||
let baseline: &[(&str, &[&str], &[&str])] = &[
|
||||
// Verbs using `kind` field (12 of 13 success paths)
|
||||
("help", &["help"], &["kind", "message"]),
|
||||
(
|
||||
"version",
|
||||
&["version"],
|
||||
&["git_sha", "kind", "message", "target", "version"],
|
||||
),
|
||||
(
|
||||
"doctor",
|
||||
&["doctor"],
|
||||
&["checks", "has_failures", "kind", "message", "report", "summary"],
|
||||
),
|
||||
(
|
||||
"skills",
|
||||
&["skills"],
|
||||
&["action", "kind", "skills", "summary"],
|
||||
),
|
||||
(
|
||||
"agents",
|
||||
&["agents"],
|
||||
&["action", "agents", "count", "kind", "summary", "working_directory"],
|
||||
),
|
||||
(
|
||||
"system-prompt",
|
||||
&["system-prompt"],
|
||||
&["kind", "message", "sections"],
|
||||
),
|
||||
(
|
||||
"bootstrap-plan",
|
||||
&["bootstrap-plan", "test"],
|
||||
&["kind", "phases"],
|
||||
),
|
||||
// Verb using `command` field (the 1-of-13 deviation — Phase 1 target)
|
||||
(
|
||||
"list-sessions",
|
||||
&["list-sessions"],
|
||||
&["command", "sessions"],
|
||||
),
|
||||
];
|
||||
|
||||
for (verb, args, expected_keys) in baseline {
|
||||
let mut full_args = vec!["--output-format", "json"];
|
||||
full_args.extend_from_slice(args);
|
||||
let output = run_claw(&root, &full_args, &[]);
|
||||
|
||||
assert!(
|
||||
output.status.success(),
|
||||
"#168c Task 4: `{verb}` expected success path but exited with {:?}. stdout:\n{}\nstderr:\n{}",
|
||||
output.status.code(),
|
||||
String::from_utf8_lossy(&output.stdout),
|
||||
String::from_utf8_lossy(&output.stderr)
|
||||
);
|
||||
|
||||
let envelope: Value = serde_json::from_slice(&output.stdout).unwrap_or_else(|err| {
|
||||
panic!(
|
||||
"#168c Task 4: `{verb}` stdout not valid JSON: {err}\nstdout:\n{}",
|
||||
String::from_utf8_lossy(&output.stdout)
|
||||
)
|
||||
});
|
||||
|
||||
let actual_keys: Vec<String> = envelope
|
||||
.as_object()
|
||||
.unwrap_or_else(|| panic!("#168c Task 4: `{verb}` envelope not a JSON object"))
|
||||
.keys()
|
||||
.cloned()
|
||||
.collect();
|
||||
let mut actual_sorted = actual_keys.clone();
|
||||
actual_sorted.sort();
|
||||
|
||||
let mut expected_sorted: Vec<String> = expected_keys.iter().map(|s| s.to_string()).collect();
|
||||
expected_sorted.sort();
|
||||
|
||||
assert_eq!(
|
||||
actual_sorted, expected_sorted,
|
||||
"#168c Task 4: shape drift detected in `{verb}`!\n\
|
||||
Expected top-level keys (v1.5 baseline): {expected_sorted:?}\n\
|
||||
Actual top-level keys: {actual_sorted:?}\n\
|
||||
If this is intentional, update:\n\
|
||||
1. SCHEMAS.md § v1.5 Emission Baseline catalog\n\
|
||||
2. This test's `baseline` array\n\
|
||||
Envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
|
||||
// Error envelope shape parity (all error paths).
|
||||
// Standard v1.5 error envelope: {error, hint, kind, type} (always 4 keys).
|
||||
let error_cases: &[(&str, &[&str])] = &[
|
||||
("prompt-no-arg", &["prompt"]),
|
||||
("doctor-bad-arg", &["doctor", "--foo"]),
|
||||
];
|
||||
|
||||
let expected_error_keys = ["error", "hint", "kind", "type"];
|
||||
let mut expected_error_sorted: Vec<String> =
|
||||
expected_error_keys.iter().map(|s| s.to_string()).collect();
|
||||
expected_error_sorted.sort();
|
||||
|
||||
for (label, args) in error_cases {
|
||||
let mut full_args = vec!["--output-format", "json"];
|
||||
full_args.extend_from_slice(args);
|
||||
let output = run_claw(&root, &full_args, &[]);
|
||||
|
||||
assert!(
|
||||
!output.status.success(),
|
||||
"{label}: expected error exit, got success"
|
||||
);
|
||||
|
||||
let envelope: Value = serde_json::from_slice(&output.stdout).unwrap_or_else(|err| {
|
||||
panic!(
|
||||
"#168c Task 4: {label} stdout not valid JSON: {err}\nstdout:\n{}",
|
||||
String::from_utf8_lossy(&output.stdout)
|
||||
)
|
||||
});
|
||||
|
||||
let actual_keys: Vec<String> = envelope
|
||||
.as_object()
|
||||
.unwrap_or_else(|| panic!("#168c Task 4: {label} envelope not a JSON object"))
|
||||
.keys()
|
||||
.cloned()
|
||||
.collect();
|
||||
let mut actual_sorted = actual_keys.clone();
|
||||
actual_sorted.sort();
|
||||
|
||||
assert_eq!(
|
||||
actual_sorted, expected_error_sorted,
|
||||
"#168c Task 4: error envelope shape drift detected in {label}!\n\
|
||||
Expected v1.5 error envelope keys: {expected_error_sorted:?}\n\
|
||||
Actual keys: {actual_sorted:?}\n\
|
||||
If this is intentional, update SCHEMAS.md § Standard Error Envelope (v1.5).\n\
|
||||
Envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard() {
|
||||
// #247 regression guard: the new empty-prompt / prompt-subcommand
|
||||
// patterns must NOT hijack the existing #77 unrecognized-argument
|
||||
// classification. `claw doctor --foo` must still surface as cli_parse
|
||||
// with the runbook hint present.
|
||||
let root = unique_temp_dir("247-unrecognized-arg");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
let envelope =
|
||||
assert_json_error_envelope(&root, &["--output-format", "json", "doctor", "--foo"]);
|
||||
assert_eq!(
|
||||
envelope["kind"], "cli_parse",
|
||||
"unrecognized-argument must remain cli_parse, envelope: {envelope}"
|
||||
);
|
||||
assert_eq!(
|
||||
envelope["hint"],
|
||||
"Run `claw --help` for usage.",
|
||||
"unrecognized-argument hint should stay intact, envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn v1_5_action_field_appears_only_in_3_inventory_verbs_172() {
|
||||
// #172: SCHEMAS.md v1.5 Emission Baseline claims `action` field appears
|
||||
// only in 3 inventory verbs: mcp, skills, agents. This test is a
|
||||
// regression guard for that truthfulness claim. If a new verb adds
|
||||
// `action`, or one of the 3 removes it, this test fails and forces
|
||||
// the SCHEMAS.md documentation to stay in sync with reality.
|
||||
//
|
||||
// Discovered during cycle #98 probe: earlier SCHEMAS.md draft said
|
||||
// "only in 4 inventory verbs" but reality was only 3 (list-sessions
|
||||
// uses `command` instead of `action`). Doc was corrected; this test
|
||||
// locks the 3-verb invariant.
|
||||
let root = unique_temp_dir("172-action-inventory");
|
||||
fs::create_dir_all(&root).expect("temp dir should exist");
|
||||
|
||||
let verbs_with_action: &[&str] = &["mcp", "skills", "agents"];
|
||||
let verbs_without_action: &[&str] = &[
|
||||
"help",
|
||||
"version",
|
||||
"doctor",
|
||||
"status",
|
||||
"sandbox",
|
||||
"system-prompt",
|
||||
"bootstrap-plan",
|
||||
"list-sessions",
|
||||
];
|
||||
|
||||
for verb in verbs_with_action {
|
||||
let envelope = assert_json_command(&root, &["--output-format", "json", verb]);
|
||||
assert!(
|
||||
envelope.get("action").is_some(),
|
||||
"#172: `{verb}` should have `action` field per v1.5 baseline, but envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
|
||||
for verb in verbs_without_action {
|
||||
let envelope = assert_json_command(&root, &["--output-format", "json", verb]);
|
||||
assert!(
|
||||
envelope.get("action").is_none(),
|
||||
"#172: `{verb}` should NOT have `action` field per v1.5 baseline (only 3 inventory verbs: mcp/skills/agents should have it), but envelope: {envelope}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
fn assert_json_command_with_env(current_dir: &Path, args: &[&str], envs: &[(&str, &str)]) -> Value {
|
||||
let output = run_claw(current_dir, args, envs);
|
||||
assert!(
|
||||
|
||||
@@ -5,7 +5,16 @@ from .parity_audit import ParityAuditResult, run_parity_audit
|
||||
from .port_manifest import PortManifest, build_port_manifest
|
||||
from .query_engine import QueryEnginePort, TurnResult
|
||||
from .runtime import PortRuntime, RuntimeSession
|
||||
from .session_store import StoredSession, load_session, save_session
|
||||
from .session_store import (
|
||||
SessionDeleteError,
|
||||
SessionNotFoundError,
|
||||
StoredSession,
|
||||
delete_session,
|
||||
list_sessions,
|
||||
load_session,
|
||||
save_session,
|
||||
session_exists,
|
||||
)
|
||||
from .system_init import build_system_init_message
|
||||
from .tools import PORTED_TOOLS, build_tool_backlog
|
||||
|
||||
@@ -15,6 +24,8 @@ __all__ = [
|
||||
'PortRuntime',
|
||||
'QueryEnginePort',
|
||||
'RuntimeSession',
|
||||
'SessionDeleteError',
|
||||
'SessionNotFoundError',
|
||||
'StoredSession',
|
||||
'TurnResult',
|
||||
'PORTED_COMMANDS',
|
||||
@@ -23,7 +34,10 @@ __all__ = [
|
||||
'build_port_manifest',
|
||||
'build_system_init_message',
|
||||
'build_tool_backlog',
|
||||
'delete_session',
|
||||
'list_sessions',
|
||||
'load_session',
|
||||
'run_parity_audit',
|
||||
'save_session',
|
||||
'session_exists',
|
||||
]
|
||||
|
||||
593
src/main.py
593
src/main.py
@@ -12,22 +12,48 @@ from .port_manifest import build_port_manifest
|
||||
from .query_engine import QueryEnginePort
|
||||
from .remote_runtime import run_remote_mode, run_ssh_mode, run_teleport_mode
|
||||
from .runtime import PortRuntime
|
||||
from .session_store import load_session
|
||||
from .session_store import (
|
||||
SessionDeleteError,
|
||||
SessionNotFoundError,
|
||||
delete_session,
|
||||
list_sessions,
|
||||
load_session,
|
||||
session_exists,
|
||||
)
|
||||
from .setup import run_setup
|
||||
from .tool_pool import assemble_tool_pool
|
||||
from .tools import execute_tool, get_tool, get_tools, render_tool_index
|
||||
|
||||
|
||||
def wrap_json_envelope(data: dict, command: str, exit_code: int = 0) -> dict:
|
||||
"""Wrap command output in canonical JSON envelope per SCHEMAS.md."""
|
||||
from datetime import datetime, timezone
|
||||
now_utc = datetime.now(timezone.utc).isoformat(timespec='seconds').replace('+00:00', 'Z')
|
||||
return {
|
||||
'timestamp': now_utc,
|
||||
'command': command,
|
||||
'exit_code': exit_code,
|
||||
'output_format': 'json',
|
||||
'schema_version': '1.0',
|
||||
**data,
|
||||
}
|
||||
|
||||
|
||||
def build_parser() -> argparse.ArgumentParser:
|
||||
parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
|
||||
# #180: Add --version flag to match canonical CLI contract
|
||||
parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
|
||||
subparsers = parser.add_subparsers(dest='command', required=True)
|
||||
subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
|
||||
subparsers.add_parser('manifest', help='print the current Python workspace manifest')
|
||||
subparsers.add_parser('parity-audit', help='compare the Python workspace against the local ignored TypeScript archive when available')
|
||||
subparsers.add_parser('setup-report', help='render the startup/prefetch setup report')
|
||||
subparsers.add_parser('command-graph', help='show command graph segmentation')
|
||||
subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
|
||||
subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
|
||||
command_graph_parser = subparsers.add_parser('command-graph', help='show command graph segmentation')
|
||||
command_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
tool_pool_parser = subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
|
||||
tool_pool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
bootstrap_graph_parser = subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
|
||||
bootstrap_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
|
||||
list_parser.add_argument('--limit', type=int, default=32)
|
||||
|
||||
@@ -48,22 +74,104 @@ def build_parser() -> argparse.ArgumentParser:
|
||||
route_parser = subparsers.add_parser('route', help='route a prompt across mirrored command/tool inventories')
|
||||
route_parser.add_argument('prompt')
|
||||
route_parser.add_argument('--limit', type=int, default=5)
|
||||
# #168: parity with show-command/show-tool/session-lifecycle CLI family
|
||||
route_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
|
||||
bootstrap_parser = subparsers.add_parser('bootstrap', help='build a runtime-style session report from the mirrored inventories')
|
||||
bootstrap_parser.add_argument('prompt')
|
||||
bootstrap_parser.add_argument('--limit', type=int, default=5)
|
||||
# #168: parity with CLI family
|
||||
bootstrap_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
|
||||
loop_parser = subparsers.add_parser('turn-loop', help='run a small stateful turn loop for the mirrored runtime')
|
||||
loop_parser.add_argument('prompt')
|
||||
loop_parser.add_argument('--limit', type=int, default=5)
|
||||
loop_parser.add_argument('--max-turns', type=int, default=3)
|
||||
loop_parser.add_argument('--structured-output', action='store_true')
|
||||
loop_parser.add_argument(
|
||||
'--timeout-seconds',
|
||||
type=float,
|
||||
default=None,
|
||||
help='total wall-clock budget across all turns (#161). Default: unbounded.',
|
||||
)
|
||||
loop_parser.add_argument(
|
||||
'--continuation-prompt',
|
||||
default=None,
|
||||
help=(
|
||||
'prompt to submit on turns after the first (#163). Default: None '
|
||||
'(loop stops after turn 0). Replaces the deprecated implicit "[turn N]" '
|
||||
'suffix that used to pollute the transcript.'
|
||||
),
|
||||
)
|
||||
loop_parser.add_argument(
|
||||
'--output-format',
|
||||
choices=['text', 'json'],
|
||||
default='text',
|
||||
help='output format (#164 Stage B: JSON includes cancel_observed per turn)',
|
||||
)
|
||||
|
||||
flush_parser = subparsers.add_parser('flush-transcript', help='persist and flush a temporary session transcript')
|
||||
flush_parser = subparsers.add_parser(
|
||||
'flush-transcript',
|
||||
help='persist and flush a temporary session transcript (#160/#166: claw-native session API)',
|
||||
)
|
||||
flush_parser.add_argument('prompt')
|
||||
flush_parser.add_argument(
|
||||
'--directory', help='session storage directory (default: .port_sessions)'
|
||||
)
|
||||
flush_parser.add_argument(
|
||||
'--output-format',
|
||||
choices=['text', 'json'],
|
||||
default='text',
|
||||
help='output format',
|
||||
)
|
||||
flush_parser.add_argument(
|
||||
'--session-id',
|
||||
help='deterministic session ID (default: auto-generated UUID)',
|
||||
)
|
||||
|
||||
load_session_parser = subparsers.add_parser('load-session', help='load a previously persisted session')
|
||||
load_session_parser = subparsers.add_parser(
|
||||
'load-session',
|
||||
help='load a previously persisted session (#160/#165: claw-native session API)',
|
||||
)
|
||||
load_session_parser.add_argument('session_id')
|
||||
load_session_parser.add_argument(
|
||||
'--directory', help='session storage directory (default: .port_sessions)'
|
||||
)
|
||||
load_session_parser.add_argument(
|
||||
'--output-format',
|
||||
choices=['text', 'json'],
|
||||
default='text',
|
||||
help='output format',
|
||||
)
|
||||
|
||||
list_sessions_parser = subparsers.add_parser(
|
||||
'list-sessions',
|
||||
help='enumerate stored session IDs (#160: claw-native session API)',
|
||||
)
|
||||
list_sessions_parser.add_argument(
|
||||
'--directory', help='session storage directory (default: .port_sessions)'
|
||||
)
|
||||
list_sessions_parser.add_argument(
|
||||
'--output-format',
|
||||
choices=['text', 'json'],
|
||||
default='text',
|
||||
help='output format',
|
||||
)
|
||||
|
||||
delete_session_parser = subparsers.add_parser(
|
||||
'delete-session',
|
||||
help='delete a persisted session (#160: idempotent, race-safe)',
|
||||
)
|
||||
delete_session_parser.add_argument('session_id')
|
||||
delete_session_parser.add_argument(
|
||||
'--directory', help='session storage directory (default: .port_sessions)'
|
||||
)
|
||||
delete_session_parser.add_argument(
|
||||
'--output-format',
|
||||
choices=['text', 'json'],
|
||||
default='text',
|
||||
help='output format',
|
||||
)
|
||||
|
||||
remote_parser = subparsers.add_parser('remote-mode', help='simulate remote-control runtime branching')
|
||||
remote_parser.add_argument('target')
|
||||
@@ -78,22 +186,112 @@ def build_parser() -> argparse.ArgumentParser:
|
||||
|
||||
show_command = subparsers.add_parser('show-command', help='show one mirrored command entry by exact name')
|
||||
show_command.add_argument('name')
|
||||
show_command.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
show_tool = subparsers.add_parser('show-tool', help='show one mirrored tool entry by exact name')
|
||||
show_tool.add_argument('name')
|
||||
show_tool.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
|
||||
exec_command_parser = subparsers.add_parser('exec-command', help='execute a mirrored command shim by exact name')
|
||||
exec_command_parser.add_argument('name')
|
||||
exec_command_parser.add_argument('prompt')
|
||||
# #168: parity with CLI family
|
||||
exec_command_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
|
||||
exec_tool_parser = subparsers.add_parser('exec-tool', help='execute a mirrored tool shim by exact name')
|
||||
exec_tool_parser.add_argument('name')
|
||||
exec_tool_parser.add_argument('payload')
|
||||
# #168: parity with CLI family
|
||||
exec_tool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
|
||||
return parser
|
||||
|
||||
|
||||
class _ArgparseError(Exception):
|
||||
"""#179: internal exception capturing argparse's real error message.
|
||||
|
||||
Subclassed ArgumentParser raises this instead of printing + exiting,
|
||||
so JSON mode can preserve the actual error (e.g. 'the following arguments
|
||||
are required: session_id') in the envelope.
|
||||
"""
|
||||
def __init__(self, message: str) -> None:
|
||||
super().__init__(message)
|
||||
self.message = message
|
||||
|
||||
|
||||
def _emit_parse_error_envelope(argv: list[str], message: str) -> None:
|
||||
"""#178/#179: emit JSON envelope for argparse-level errors when --output-format json is requested.
|
||||
|
||||
Pre-scans argv for --output-format json. If found, prints a parse-error envelope
|
||||
to stdout (per SCHEMAS.md 'error' envelope shape) instead of letting argparse
|
||||
dump help text to stderr. This preserves the JSON contract for claws that can't
|
||||
parse argparse usage messages.
|
||||
|
||||
#179 update: `message` now carries argparse's actual error text, not a generic
|
||||
rejection string. Stderr is fully suppressed in JSON mode.
|
||||
"""
|
||||
import json
|
||||
# Extract the attempted command (argv[0] is the first positional)
|
||||
attempted = argv[0] if argv and not argv[0].startswith('-') else '<missing>'
|
||||
envelope = wrap_json_envelope(
|
||||
{
|
||||
'error': {
|
||||
'kind': 'parse',
|
||||
'operation': 'argparse',
|
||||
'target': attempted,
|
||||
'retryable': False,
|
||||
'message': message,
|
||||
'hint': 'run with no arguments to see available subcommands',
|
||||
},
|
||||
},
|
||||
command=attempted,
|
||||
exit_code=1,
|
||||
)
|
||||
print(json.dumps(envelope))
|
||||
|
||||
|
||||
def _wants_json_output(argv: list[str]) -> bool:
|
||||
"""#178: check if argv contains --output-format json anywhere (for parse-error routing)."""
|
||||
for i, arg in enumerate(argv):
|
||||
if arg == '--output-format' and i + 1 < len(argv) and argv[i + 1] == 'json':
|
||||
return True
|
||||
if arg == '--output-format=json':
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
import sys
|
||||
if argv is None:
|
||||
argv = sys.argv[1:]
|
||||
parser = build_parser()
|
||||
args = parser.parse_args(argv)
|
||||
json_mode = _wants_json_output(argv)
|
||||
# #178/#179: capture argparse errors with real message and emit JSON envelope
|
||||
# when --output-format json is requested. In JSON mode, stderr is silenced
|
||||
# so claws only see the envelope on stdout.
|
||||
if json_mode:
|
||||
# Monkey-patch parser.error to raise instead of print+exit. This preserves
|
||||
# the original error message text (e.g. 'argument X: invalid choice: ...').
|
||||
original_error = parser.error
|
||||
def _json_mode_error(message: str) -> None:
|
||||
raise _ArgparseError(message)
|
||||
parser.error = _json_mode_error # type: ignore[method-assign]
|
||||
# Also patch all subparsers
|
||||
for action in parser._actions:
|
||||
if hasattr(action, 'choices') and isinstance(action.choices, dict):
|
||||
for subp in action.choices.values():
|
||||
subp.error = _json_mode_error # type: ignore[method-assign]
|
||||
try:
|
||||
args = parser.parse_args(argv)
|
||||
except _ArgparseError as err:
|
||||
_emit_parse_error_envelope(argv, err.message)
|
||||
return 1
|
||||
except SystemExit as exc:
|
||||
# Defensive: if argparse exits via some other path (e.g. --help in JSON mode)
|
||||
if exc.code != 0:
|
||||
_emit_parse_error_envelope(argv, 'argparse exited with non-zero code')
|
||||
return 1
|
||||
raise
|
||||
else:
|
||||
args = parser.parse_args(argv)
|
||||
manifest = build_port_manifest()
|
||||
if args.command == 'summary':
|
||||
print(QueryEnginePort(manifest).render_summary())
|
||||
@@ -108,13 +306,44 @@ def main(argv: list[str] | None = None) -> int:
|
||||
print(run_setup().as_markdown())
|
||||
return 0
|
||||
if args.command == 'command-graph':
|
||||
print(build_command_graph().as_markdown())
|
||||
graph = build_command_graph()
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
envelope = {
|
||||
'builtins_count': len(graph.builtins),
|
||||
'plugin_like_count': len(graph.plugin_like),
|
||||
'skill_like_count': len(graph.skill_like),
|
||||
'total_count': len(graph.flattened()),
|
||||
'builtins': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.builtins],
|
||||
'plugin_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.plugin_like],
|
||||
'skill_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.skill_like],
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command)))
|
||||
else:
|
||||
print(graph.as_markdown())
|
||||
return 0
|
||||
if args.command == 'tool-pool':
|
||||
print(assemble_tool_pool().as_markdown())
|
||||
pool = assemble_tool_pool()
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
envelope = {
|
||||
'simple_mode': pool.simple_mode,
|
||||
'include_mcp': pool.include_mcp,
|
||||
'tool_count': len(pool.tools),
|
||||
'tools': [{'name': t.name, 'source_hint': t.source_hint} for t in pool.tools],
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command)))
|
||||
else:
|
||||
print(pool.as_markdown())
|
||||
return 0
|
||||
if args.command == 'bootstrap-graph':
|
||||
print(build_bootstrap_graph().as_markdown())
|
||||
graph = build_bootstrap_graph()
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
envelope = {'stages': graph.as_markdown().split('\n'), 'note': 'bootstrap-graph is markdown-only in this version'}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command)))
|
||||
else:
|
||||
print(graph.as_markdown())
|
||||
return 0
|
||||
if args.command == 'subsystems':
|
||||
for subsystem in manifest.top_level_modules[: args.limit]:
|
||||
@@ -141,6 +370,25 @@ def main(argv: list[str] | None = None) -> int:
|
||||
return 0
|
||||
if args.command == 'route':
|
||||
matches = PortRuntime().route_prompt(args.prompt, limit=args.limit)
|
||||
# #168: JSON envelope for machine parsing
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
envelope = {
|
||||
'prompt': args.prompt,
|
||||
'limit': args.limit,
|
||||
'match_count': len(matches),
|
||||
'matches': [
|
||||
{
|
||||
'kind': m.kind,
|
||||
'name': m.name,
|
||||
'score': m.score,
|
||||
'source_hint': m.source_hint,
|
||||
}
|
||||
for m in matches
|
||||
],
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command)))
|
||||
return 0
|
||||
if not matches:
|
||||
print('No mirrored command/tool matches found.')
|
||||
return 0
|
||||
@@ -148,25 +396,220 @@ def main(argv: list[str] | None = None) -> int:
|
||||
print(f'{match.kind}\t{match.name}\t{match.score}\t{match.source_hint}')
|
||||
return 0
|
||||
if args.command == 'bootstrap':
|
||||
print(PortRuntime().bootstrap_session(args.prompt, limit=args.limit).as_markdown())
|
||||
session = PortRuntime().bootstrap_session(args.prompt, limit=args.limit)
|
||||
# #168: JSON envelope for machine parsing
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
envelope = {
|
||||
'prompt': session.prompt,
|
||||
'limit': args.limit,
|
||||
'setup': {
|
||||
'python_version': session.setup.python_version,
|
||||
'implementation': session.setup.implementation,
|
||||
'platform_name': session.setup.platform_name,
|
||||
'test_command': session.setup.test_command,
|
||||
},
|
||||
'routed_matches': [
|
||||
{
|
||||
'kind': m.kind,
|
||||
'name': m.name,
|
||||
'score': m.score,
|
||||
'source_hint': m.source_hint,
|
||||
}
|
||||
for m in session.routed_matches
|
||||
],
|
||||
'command_execution_messages': list(session.command_execution_messages),
|
||||
'tool_execution_messages': list(session.tool_execution_messages),
|
||||
'turn': {
|
||||
'prompt': session.turn_result.prompt,
|
||||
'output': session.turn_result.output,
|
||||
'stop_reason': session.turn_result.stop_reason,
|
||||
'cancel_observed': session.turn_result.cancel_observed,
|
||||
},
|
||||
'persisted_session_path': session.persisted_session_path,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command)))
|
||||
return 0
|
||||
print(session.as_markdown())
|
||||
return 0
|
||||
if args.command == 'turn-loop':
|
||||
results = PortRuntime().run_turn_loop(args.prompt, limit=args.limit, max_turns=args.max_turns, structured_output=args.structured_output)
|
||||
results = PortRuntime().run_turn_loop(
|
||||
args.prompt,
|
||||
limit=args.limit,
|
||||
max_turns=args.max_turns,
|
||||
structured_output=args.structured_output,
|
||||
timeout_seconds=args.timeout_seconds,
|
||||
continuation_prompt=args.continuation_prompt,
|
||||
)
|
||||
# Exit 2 when a timeout terminated the loop so claws can distinguish
|
||||
# 'ran to completion' from 'hit wall-clock budget'.
|
||||
loop_exit_code = 2 if results and results[-1].stop_reason == 'timeout' else 0
|
||||
if args.output_format == 'json':
|
||||
# #164 Stage B + #173: JSON envelope with per-turn cancel_observed
|
||||
# Promotes turn-loop from OPT_OUT to CLAWABLE surface.
|
||||
import json
|
||||
envelope = {
|
||||
'prompt': args.prompt,
|
||||
'max_turns': args.max_turns,
|
||||
'turns_completed': len(results),
|
||||
'timeout_seconds': args.timeout_seconds,
|
||||
'continuation_prompt': args.continuation_prompt,
|
||||
'turns': [
|
||||
{
|
||||
'prompt': r.prompt,
|
||||
'output': r.output,
|
||||
'stop_reason': r.stop_reason,
|
||||
'cancel_observed': r.cancel_observed,
|
||||
'matched_commands': list(r.matched_commands),
|
||||
'matched_tools': list(r.matched_tools),
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
'final_stop_reason': results[-1].stop_reason if results else None,
|
||||
'final_cancel_observed': results[-1].cancel_observed if results else False,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=loop_exit_code)))
|
||||
return loop_exit_code
|
||||
for idx, result in enumerate(results, start=1):
|
||||
print(f'## Turn {idx}')
|
||||
print(result.output)
|
||||
print(f'stop_reason={result.stop_reason}')
|
||||
return 0
|
||||
return loop_exit_code
|
||||
if args.command == 'flush-transcript':
|
||||
from pathlib import Path as _Path
|
||||
engine = QueryEnginePort.from_workspace()
|
||||
# #166: allow deterministic session IDs for claw checkpointing/replay.
|
||||
# When unset, the engine's auto-generated UUID is used (backward compat).
|
||||
if args.session_id:
|
||||
engine.session_id = args.session_id
|
||||
engine.submit_message(args.prompt)
|
||||
path = engine.persist_session()
|
||||
print(path)
|
||||
print(f'flushed={engine.transcript_store.flushed}')
|
||||
directory = _Path(args.directory) if args.directory else None
|
||||
path = engine.persist_session(directory)
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
_env = {
|
||||
'session_id': engine.session_id,
|
||||
'path': path,
|
||||
'flushed': engine.transcript_store.flushed,
|
||||
'messages_count': len(engine.mutable_messages),
|
||||
'input_tokens': engine.total_usage.input_tokens,
|
||||
'output_tokens': engine.total_usage.output_tokens,
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command)))
|
||||
else:
|
||||
# #166: legacy text output preserved byte-for-byte for backward compat.
|
||||
print(path)
|
||||
print(f'flushed={engine.transcript_store.flushed}')
|
||||
return 0
|
||||
if args.command == 'load-session':
|
||||
session = load_session(args.session_id)
|
||||
print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
|
||||
from pathlib import Path as _Path
|
||||
directory = _Path(args.directory) if args.directory else None
|
||||
# #165: catch typed SessionNotFoundError + surface a JSON error envelope
|
||||
# matching the delete-session contract shape. No more raw tracebacks.
|
||||
try:
|
||||
session = load_session(args.session_id, directory)
|
||||
except SessionNotFoundError as exc:
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
resolved_dir = str(directory) if directory else '.port_sessions'
|
||||
_env = {
|
||||
'session_id': args.session_id,
|
||||
'loaded': False,
|
||||
'error': {
|
||||
'kind': 'session_not_found',
|
||||
'message': str(exc),
|
||||
'directory': resolved_dir,
|
||||
'retryable': False,
|
||||
},
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
|
||||
else:
|
||||
print(f'error: {exc}')
|
||||
return 1
|
||||
except (OSError, ValueError) as exc:
|
||||
# Corrupted session file, IO error, JSON decode error — distinct
|
||||
# from 'not found'. Callers may retry here (fs glitch).
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
resolved_dir = str(directory) if directory else '.port_sessions'
|
||||
_env = {
|
||||
'session_id': args.session_id,
|
||||
'loaded': False,
|
||||
'error': {
|
||||
'kind': 'session_load_failed',
|
||||
'message': str(exc),
|
||||
'directory': resolved_dir,
|
||||
'retryable': True,
|
||||
},
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
|
||||
else:
|
||||
print(f'error: {exc}')
|
||||
return 1
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
_env = {
|
||||
'session_id': session.session_id,
|
||||
'loaded': True,
|
||||
'messages_count': len(session.messages),
|
||||
'input_tokens': session.input_tokens,
|
||||
'output_tokens': session.output_tokens,
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command)))
|
||||
else:
|
||||
print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
|
||||
return 0
|
||||
if args.command == 'list-sessions':
|
||||
from pathlib import Path as _Path
|
||||
directory = _Path(args.directory) if args.directory else None
|
||||
ids = list_sessions(directory)
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
_env = {'sessions': ids, 'count': len(ids)}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command)))
|
||||
else:
|
||||
if not ids:
|
||||
print('(no sessions)')
|
||||
else:
|
||||
for sid in ids:
|
||||
print(sid)
|
||||
return 0
|
||||
if args.command == 'delete-session':
|
||||
from pathlib import Path as _Path
|
||||
directory = _Path(args.directory) if args.directory else None
|
||||
try:
|
||||
deleted = delete_session(args.session_id, directory)
|
||||
except SessionDeleteError as exc:
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
_env = {
|
||||
'session_id': args.session_id,
|
||||
'deleted': False,
|
||||
'error': {
|
||||
'kind': 'session_delete_failed',
|
||||
'message': str(exc),
|
||||
'retryable': True,
|
||||
},
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
|
||||
else:
|
||||
print(f'error: {exc}')
|
||||
return 1
|
||||
if args.output_format == 'json':
|
||||
import json as _json
|
||||
_env = {
|
||||
'session_id': args.session_id,
|
||||
'deleted': deleted,
|
||||
'status': 'deleted' if deleted else 'not_found',
|
||||
}
|
||||
print(_json.dumps(wrap_json_envelope(_env, args.command)))
|
||||
else:
|
||||
if deleted:
|
||||
print(f'deleted: {args.session_id}')
|
||||
else:
|
||||
print(f'not found: {args.session_id}')
|
||||
# Exit 0 for both cases — delete_session is idempotent,
|
||||
# not-found is success from a cleanup perspective
|
||||
return 0
|
||||
if args.command == 'remote-mode':
|
||||
print(run_remote_mode(args.target).as_text())
|
||||
@@ -186,25 +629,123 @@ def main(argv: list[str] | None = None) -> int:
|
||||
if args.command == 'show-command':
|
||||
module = get_command(args.name)
|
||||
if module is None:
|
||||
print(f'Command not found: {args.name}')
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
error_envelope = {
|
||||
'name': args.name,
|
||||
'found': False,
|
||||
'error': {
|
||||
'kind': 'command_not_found',
|
||||
'message': f'Unknown command: {args.name}',
|
||||
'retryable': False,
|
||||
},
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
|
||||
else:
|
||||
print(f'Command not found: {args.name}')
|
||||
return 1
|
||||
print('\n'.join([module.name, module.source_hint, module.responsibility]))
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
output = {
|
||||
'name': module.name,
|
||||
'found': True,
|
||||
'source_hint': module.source_hint,
|
||||
'responsibility': module.responsibility,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(output, args.command)))
|
||||
else:
|
||||
print('\n'.join([module.name, module.source_hint, module.responsibility]))
|
||||
return 0
|
||||
if args.command == 'show-tool':
|
||||
module = get_tool(args.name)
|
||||
if module is None:
|
||||
print(f'Tool not found: {args.name}')
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
error_envelope = {
|
||||
'name': args.name,
|
||||
'found': False,
|
||||
'error': {
|
||||
'kind': 'tool_not_found',
|
||||
'message': f'Unknown tool: {args.name}',
|
||||
'retryable': False,
|
||||
},
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
|
||||
else:
|
||||
print(f'Tool not found: {args.name}')
|
||||
return 1
|
||||
print('\n'.join([module.name, module.source_hint, module.responsibility]))
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
output = {
|
||||
'name': module.name,
|
||||
'found': True,
|
||||
'source_hint': module.source_hint,
|
||||
'responsibility': module.responsibility,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(output, args.command)))
|
||||
else:
|
||||
print('\n'.join([module.name, module.source_hint, module.responsibility]))
|
||||
return 0
|
||||
if args.command == 'exec-command':
|
||||
result = execute_command(args.name, args.prompt)
|
||||
print(result.message)
|
||||
return 0 if result.handled else 1
|
||||
# #168: JSON envelope with typed not-found error
|
||||
# #181: envelope exit_code must match process exit code
|
||||
exit_code = 0 if result.handled else 1
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
if not result.handled:
|
||||
envelope = {
|
||||
'name': args.name,
|
||||
'prompt': args.prompt,
|
||||
'handled': False,
|
||||
'error': {
|
||||
'kind': 'command_not_found',
|
||||
'message': result.message,
|
||||
'retryable': False,
|
||||
},
|
||||
}
|
||||
else:
|
||||
envelope = {
|
||||
'name': result.name,
|
||||
'prompt': result.prompt,
|
||||
'source_hint': result.source_hint,
|
||||
'handled': True,
|
||||
'message': result.message,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
|
||||
else:
|
||||
print(result.message)
|
||||
return exit_code
|
||||
if args.command == 'exec-tool':
|
||||
result = execute_tool(args.name, args.payload)
|
||||
print(result.message)
|
||||
return 0 if result.handled else 1
|
||||
# #168: JSON envelope with typed not-found error
|
||||
# #181: envelope exit_code must match process exit code
|
||||
exit_code = 0 if result.handled else 1
|
||||
if args.output_format == 'json':
|
||||
import json
|
||||
if not result.handled:
|
||||
envelope = {
|
||||
'name': args.name,
|
||||
'payload': args.payload,
|
||||
'handled': False,
|
||||
'error': {
|
||||
'kind': 'tool_not_found',
|
||||
'message': result.message,
|
||||
'retryable': False,
|
||||
},
|
||||
}
|
||||
else:
|
||||
envelope = {
|
||||
'name': result.name,
|
||||
'payload': result.payload,
|
||||
'source_hint': result.source_hint,
|
||||
'handled': True,
|
||||
'message': result.message,
|
||||
}
|
||||
print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
|
||||
else:
|
||||
print(result.message)
|
||||
return exit_code
|
||||
parser.error(f'unknown command: {args.command}')
|
||||
return 2
|
||||
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import threading
|
||||
from dataclasses import dataclass, field
|
||||
from uuid import uuid4
|
||||
|
||||
@@ -30,6 +31,7 @@ class TurnResult:
|
||||
permission_denials: tuple[PermissionDenial, ...]
|
||||
usage: UsageSummary
|
||||
stop_reason: str
|
||||
cancel_observed: bool = False
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -64,7 +66,59 @@ class QueryEnginePort:
|
||||
matched_commands: tuple[str, ...] = (),
|
||||
matched_tools: tuple[str, ...] = (),
|
||||
denied_tools: tuple[PermissionDenial, ...] = (),
|
||||
cancel_event: threading.Event | None = None,
|
||||
) -> TurnResult:
|
||||
"""Submit a prompt and return a TurnResult.
|
||||
|
||||
#164 Stage A: cooperative cancellation via cancel_event.
|
||||
|
||||
The cancel_event argument (added for #164) lets a caller request early
|
||||
termination at a safe point. When set before the pre-mutation commit
|
||||
stage, submit_message returns early with ``stop_reason='cancelled'``
|
||||
and the engine's state (mutable_messages, transcript_store,
|
||||
permission_denials, total_usage) is left **exactly as it was on
|
||||
entry**. This closes the #161 follow-up gap: before this change, a
|
||||
wedged provider thread could finish executing and silently mutate
|
||||
state after the caller had already observed ``stop_reason='timeout'``,
|
||||
giving the session a ghost turn the caller never acknowledged.
|
||||
|
||||
Contract:
|
||||
- cancel_event is None (default) — legacy behaviour, no checks.
|
||||
- cancel_event set **before** budget check — returns 'cancelled'
|
||||
immediately; no output synthesis, no projection, no mutation.
|
||||
- cancel_event set **between** budget check and commit — returns
|
||||
'cancelled' with state intact.
|
||||
- cancel_event set **after** commit — not observable; the turn is
|
||||
already committed and the caller sees 'completed'. Cancellation
|
||||
is a *safe point* mechanism, not preemption. This is the honest
|
||||
limit of cooperative cancellation in Python threading land.
|
||||
|
||||
Stop reason taxonomy after #164 Stage A:
|
||||
- 'completed' — turn committed, state mutated exactly once
|
||||
- 'max_budget_reached' — overflow, state unchanged (#162)
|
||||
- 'max_turns_reached' — capacity exceeded, state unchanged
|
||||
- 'cancelled' — cancel_event observed, state unchanged
|
||||
- 'timeout' — synthesised by runtime, not engine (#161)
|
||||
|
||||
Callers that care about deadline-driven cancellation (run_turn_loop)
|
||||
can now request cleanup by setting the event on timeout — the next
|
||||
submit_message on the same engine will observe it at the start and
|
||||
return 'cancelled' without touching state, even if the previous call
|
||||
is still wedged in provider IO.
|
||||
"""
|
||||
# #164 Stage A: earliest safe cancellation point. No output synthesis,
|
||||
# no budget projection, no mutation — just an immediate clean return.
|
||||
if cancel_event is not None and cancel_event.is_set():
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='',
|
||||
matched_commands=matched_commands,
|
||||
matched_tools=matched_tools,
|
||||
permission_denials=denied_tools,
|
||||
usage=self.total_usage, # unchanged
|
||||
stop_reason='cancelled',
|
||||
)
|
||||
|
||||
if len(self.mutable_messages) >= self.config.max_turns:
|
||||
output = f'Max turns reached before processing prompt: {prompt}'
|
||||
return TurnResult(
|
||||
@@ -85,9 +139,40 @@ class QueryEnginePort:
|
||||
]
|
||||
output = self._format_output(summary_lines)
|
||||
projected_usage = self.total_usage.add_turn(prompt, output)
|
||||
stop_reason = 'completed'
|
||||
|
||||
# #162: budget check must precede mutation. Previously this block set
|
||||
# stop_reason='max_budget_reached' but still appended the overflow turn
|
||||
# to mutable_messages / transcript_store / permission_denials, corrupting
|
||||
# the session for any caller that persisted it afterwards. The overflow
|
||||
# prompt was effectively committed even though the TurnResult signalled
|
||||
# rejection. Now we early-return with pre-mutation state intact so
|
||||
# callers can safely retry with a smaller prompt or a fresh budget.
|
||||
if projected_usage.input_tokens + projected_usage.output_tokens > self.config.max_budget_tokens:
|
||||
stop_reason = 'max_budget_reached'
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output=output,
|
||||
matched_commands=matched_commands,
|
||||
matched_tools=matched_tools,
|
||||
permission_denials=denied_tools,
|
||||
usage=self.total_usage, # unchanged — overflow turn was rejected
|
||||
stop_reason='max_budget_reached',
|
||||
)
|
||||
|
||||
# #164 Stage A: second safe cancellation point. Projection is done
|
||||
# but nothing has been committed yet. If the caller cancelled while
|
||||
# we were building output / computing budget, honour it here — still
|
||||
# no mutation.
|
||||
if cancel_event is not None and cancel_event.is_set():
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output=output,
|
||||
matched_commands=matched_commands,
|
||||
matched_tools=matched_tools,
|
||||
permission_denials=denied_tools,
|
||||
usage=self.total_usage, # unchanged
|
||||
stop_reason='cancelled',
|
||||
)
|
||||
|
||||
self.mutable_messages.append(prompt)
|
||||
self.transcript_store.append(prompt)
|
||||
self.permission_denials.extend(denied_tools)
|
||||
@@ -100,7 +185,7 @@ class QueryEnginePort:
|
||||
matched_tools=matched_tools,
|
||||
permission_denials=denied_tools,
|
||||
usage=self.total_usage,
|
||||
stop_reason=stop_reason,
|
||||
stop_reason='completed',
|
||||
)
|
||||
|
||||
def stream_submit_message(
|
||||
@@ -137,7 +222,19 @@ class QueryEnginePort:
|
||||
def flush_transcript(self) -> None:
|
||||
self.transcript_store.flush()
|
||||
|
||||
def persist_session(self) -> str:
|
||||
def persist_session(self, directory: 'Path | None' = None) -> str:
|
||||
"""Flush the transcript and save the session to disk.
|
||||
|
||||
Args:
|
||||
directory: Optional override for the storage directory. When None
|
||||
(default, for backward compat), uses the default location
|
||||
(``.port_sessions`` in CWD). When set, passes through to
|
||||
``save_session`` which already supports directory overrides.
|
||||
|
||||
#166: added directory parameter to match the session-lifecycle CLI
|
||||
surface established by #160/#165. Claws running out-of-tree can now
|
||||
redirect session creation to a workspace-specific dir without chdir.
|
||||
"""
|
||||
self.flush_transcript()
|
||||
path = save_session(
|
||||
StoredSession(
|
||||
@@ -145,7 +242,8 @@ class QueryEnginePort:
|
||||
messages=tuple(self.mutable_messages),
|
||||
input_tokens=self.total_usage.input_tokens,
|
||||
output_tokens=self.total_usage.output_tokens,
|
||||
)
|
||||
),
|
||||
directory,
|
||||
)
|
||||
return str(path)
|
||||
|
||||
|
||||
159
src/runtime.py
159
src/runtime.py
@@ -1,11 +1,14 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import threading
|
||||
import time
|
||||
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
|
||||
from dataclasses import dataclass
|
||||
|
||||
from .commands import PORTED_COMMANDS
|
||||
from .context import PortContext, build_port_context, render_context
|
||||
from .history import HistoryLog
|
||||
from .models import PermissionDenial, PortingModule
|
||||
from .models import PermissionDenial, PortingModule, UsageSummary
|
||||
from .query_engine import QueryEngineConfig, QueryEnginePort, TurnResult
|
||||
from .setup import SetupReport, WorkspaceSetup, run_setup
|
||||
from .system_init import build_system_init_message
|
||||
@@ -151,21 +154,161 @@ class PortRuntime:
|
||||
persisted_session_path=persisted_session_path,
|
||||
)
|
||||
|
||||
def run_turn_loop(self, prompt: str, limit: int = 5, max_turns: int = 3, structured_output: bool = False) -> list[TurnResult]:
|
||||
def run_turn_loop(
|
||||
self,
|
||||
prompt: str,
|
||||
limit: int = 5,
|
||||
max_turns: int = 3,
|
||||
structured_output: bool = False,
|
||||
timeout_seconds: float | None = None,
|
||||
continuation_prompt: str | None = None,
|
||||
) -> list[TurnResult]:
|
||||
"""Run a multi-turn engine loop with optional wall-clock deadline.
|
||||
|
||||
Args:
|
||||
prompt: The initial prompt to submit.
|
||||
limit: Match routing limit.
|
||||
max_turns: Maximum number of turns before stopping.
|
||||
structured_output: Whether to request structured output.
|
||||
timeout_seconds: Total wall-clock budget across all turns. When the
|
||||
budget is exhausted mid-turn, a synthetic TurnResult with
|
||||
``stop_reason='timeout'`` is appended and the loop exits.
|
||||
``None`` (default) preserves legacy unbounded behaviour.
|
||||
continuation_prompt: What to send on turns after the first. When
|
||||
``None`` (default, #163), the loop stops after turn 0 and the
|
||||
caller decides how to continue. When set, the same text is
|
||||
submitted for every turn after the first, giving claws a clean
|
||||
hook for structured follow-ups (e.g. ``"Continue."``, a
|
||||
routing-planner instruction, or a tool-output cue). Previously
|
||||
the loop silently appended ``" [turn N]"`` to the original
|
||||
prompt, polluting the transcript with harness-generated
|
||||
annotation the model had no way to interpret.
|
||||
|
||||
Returns:
|
||||
A list of TurnResult objects. The final entry's ``stop_reason``
|
||||
distinguishes ``'completed'``, ``'max_turns_reached'``,
|
||||
``'max_budget_reached'``, or ``'timeout'``.
|
||||
|
||||
#161: prior to this change a hung ``engine.submit_message`` call would
|
||||
block the loop indefinitely with no cancellation path, forcing claws to
|
||||
rely on external watchdogs or OS-level kills. Callers can now enforce a
|
||||
deadline and receive a typed timeout signal instead.
|
||||
|
||||
#163: the old ``f'{prompt} [turn {turn + 1}]'`` suffix was never
|
||||
interpreted by the engine or any system prompt. It looked like a real
|
||||
user turn in ``mutable_messages`` and the transcript, making replay and
|
||||
analysis fragile. Removed entirely; callers supply ``continuation_prompt``
|
||||
for meaningful follow-ups or let the loop stop after turn 0.
|
||||
"""
|
||||
engine = QueryEnginePort.from_workspace()
|
||||
engine.config = QueryEngineConfig(max_turns=max_turns, structured_output=structured_output)
|
||||
matches = self.route_prompt(prompt, limit=limit)
|
||||
command_names = tuple(match.name for match in matches if match.kind == 'command')
|
||||
tool_names = tuple(match.name for match in matches if match.kind == 'tool')
|
||||
# #159: infer permission denials from the routed matches, not hardcoded empty tuple.
|
||||
# Multi-turn sessions must have the same security posture as bootstrap_session.
|
||||
denied_tools = tuple(self._infer_permission_denials(matches))
|
||||
results: list[TurnResult] = []
|
||||
for turn in range(max_turns):
|
||||
turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
|
||||
result = engine.submit_message(turn_prompt, command_names, tool_names, ())
|
||||
results.append(result)
|
||||
if result.stop_reason != 'completed':
|
||||
break
|
||||
deadline = time.monotonic() + timeout_seconds if timeout_seconds is not None else None
|
||||
# #164 Stage A: shared cancel_event signals cooperative cancellation
|
||||
# across turns. On timeout we set() it so any still-running
|
||||
# submit_message call (or the next one on the same engine) observes
|
||||
# the cancel at a safe checkpoint and returns stop_reason='cancelled'
|
||||
# without mutating state. This closes the window where a wedged
|
||||
# provider thread could commit a ghost turn after the caller saw
|
||||
# 'timeout'.
|
||||
cancel_event = threading.Event() if deadline is not None else None
|
||||
|
||||
# ThreadPoolExecutor is reused across turns so we cancel cleanly on exit.
|
||||
executor = ThreadPoolExecutor(max_workers=1) if deadline is not None else None
|
||||
try:
|
||||
for turn in range(max_turns):
|
||||
# #163: no more f'{prompt} [turn N]' suffix injection.
|
||||
# On turn 0 submit the original prompt.
|
||||
# On turn > 0, submit the caller-supplied continuation prompt;
|
||||
# if the caller did not supply one, stop the loop cleanly instead
|
||||
# of fabricating a fake user turn.
|
||||
if turn == 0:
|
||||
turn_prompt = prompt
|
||||
elif continuation_prompt is not None:
|
||||
turn_prompt = continuation_prompt
|
||||
else:
|
||||
break
|
||||
|
||||
if deadline is None:
|
||||
# Legacy path: unbounded call, preserves existing behaviour exactly.
|
||||
# #159: pass inferred denied_tools (no longer hardcoded empty tuple)
|
||||
# #164: cancel_event is None on this path; submit_message skips
|
||||
# cancellation checks entirely (legacy zero-overhead behaviour).
|
||||
result = engine.submit_message(turn_prompt, command_names, tool_names, denied_tools)
|
||||
else:
|
||||
remaining = deadline - time.monotonic()
|
||||
if remaining <= 0:
|
||||
# #164: signal cancel for any in-flight/future submit_message
|
||||
# calls that share this engine. Safe because nothing has been
|
||||
# submitted yet this turn.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
assert executor is not None
|
||||
future = executor.submit(
|
||||
engine.submit_message, turn_prompt, command_names, tool_names,
|
||||
denied_tools, cancel_event,
|
||||
)
|
||||
try:
|
||||
result = future.result(timeout=remaining)
|
||||
except FuturesTimeoutError:
|
||||
# #164 Stage A: explicitly signal cancel to the still-running
|
||||
# submit_message thread. The next time it hits a checkpoint
|
||||
# (entry or post-budget), it returns 'cancelled' without
|
||||
# mutating state instead of committing a ghost turn. This
|
||||
# upgrades #161's best-effort future.cancel() (which only
|
||||
# cancels pre-start futures) to cooperative mid-flight cancel.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
future.cancel()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
|
||||
results.append(result)
|
||||
if result.stop_reason != 'completed':
|
||||
break
|
||||
finally:
|
||||
if executor is not None:
|
||||
# wait=False: don't let a hung thread block loop exit indefinitely.
|
||||
# The thread will be reaped when the interpreter shuts down or when
|
||||
# the engine call eventually returns.
|
||||
executor.shutdown(wait=False)
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def _build_timeout_result(
|
||||
prompt: str,
|
||||
command_names: tuple[str, ...],
|
||||
tool_names: tuple[str, ...],
|
||||
cancel_observed: bool = False,
|
||||
) -> TurnResult:
|
||||
"""Synthesize a TurnResult representing a wall-clock timeout (#161).
|
||||
#164 Stage B: cancel_observed signals cancellation event was set.
|
||||
"""
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='Wall-clock timeout exceeded before turn completed.',
|
||||
matched_commands=command_names,
|
||||
matched_tools=tool_names,
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='timeout',
|
||||
cancel_observed=cancel_observed,
|
||||
)
|
||||
|
||||
def _infer_permission_denials(self, matches: list[RoutedMatch]) -> list[PermissionDenial]:
|
||||
denials: list[PermissionDenial] = []
|
||||
for match in matches:
|
||||
|
||||
@@ -26,10 +26,96 @@ def save_session(session: StoredSession, directory: Path | None = None) -> Path:
|
||||
|
||||
def load_session(session_id: str, directory: Path | None = None) -> StoredSession:
|
||||
target_dir = directory or DEFAULT_SESSION_DIR
|
||||
data = json.loads((target_dir / f'{session_id}.json').read_text())
|
||||
try:
|
||||
data = json.loads((target_dir / f'{session_id}.json').read_text())
|
||||
except FileNotFoundError:
|
||||
raise SessionNotFoundError(f'session {session_id!r} not found in {target_dir}') from None
|
||||
return StoredSession(
|
||||
session_id=data['session_id'],
|
||||
messages=tuple(data['messages']),
|
||||
input_tokens=data['input_tokens'],
|
||||
output_tokens=data['output_tokens'],
|
||||
)
|
||||
|
||||
|
||||
class SessionNotFoundError(KeyError):
|
||||
"""Raised when a session does not exist in the store."""
|
||||
pass
|
||||
|
||||
|
||||
def list_sessions(directory: Path | None = None) -> list[str]:
|
||||
"""List all stored session IDs in the target directory.
|
||||
|
||||
Args:
|
||||
directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
|
||||
|
||||
Returns:
|
||||
Sorted list of session IDs (JSON filenames without .json extension).
|
||||
"""
|
||||
target_dir = directory or DEFAULT_SESSION_DIR
|
||||
if not target_dir.exists():
|
||||
return []
|
||||
return sorted(p.stem for p in target_dir.glob('*.json'))
|
||||
|
||||
|
||||
def session_exists(session_id: str, directory: Path | None = None) -> bool:
|
||||
"""Check if a session exists without raising an error.
|
||||
|
||||
Args:
|
||||
session_id: The session ID to check.
|
||||
directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
|
||||
|
||||
Returns:
|
||||
True if the session file exists, False otherwise.
|
||||
"""
|
||||
target_dir = directory or DEFAULT_SESSION_DIR
|
||||
return (target_dir / f'{session_id}.json').exists()
|
||||
|
||||
|
||||
class SessionDeleteError(OSError):
|
||||
"""Raised when a session file exists but cannot be removed (permission, IO error).
|
||||
|
||||
Distinct from SessionNotFoundError: this means the session was present but
|
||||
deletion failed mid-operation. Callers can retry or escalate.
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
def delete_session(session_id: str, directory: Path | None = None) -> bool:
|
||||
"""Delete a session file from the store.
|
||||
|
||||
Contract:
|
||||
- **Idempotent**: `delete_session(x)` followed by `delete_session(x)` is safe.
|
||||
Second call returns False (not found), does not raise.
|
||||
- **Race-safe**: Uses `missing_ok=True` on unlink to avoid TOCTOU between
|
||||
exists-check and unlink. Concurrent deletion by another process is
|
||||
treated as a no-op success (returns False for the losing caller).
|
||||
- **Partial-failure surfaced**: If the file exists but cannot be removed
|
||||
(permission denied, filesystem error, directory instead of file), raises
|
||||
`SessionDeleteError` wrapping the underlying OSError. The session store
|
||||
may be in an inconsistent state; caller should retry or escalate.
|
||||
|
||||
Args:
|
||||
session_id: The session ID to delete.
|
||||
directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
|
||||
|
||||
Returns:
|
||||
True if this call deleted the session file.
|
||||
False if the session did not exist (either never existed or was already deleted).
|
||||
|
||||
Raises:
|
||||
SessionDeleteError: if the session existed but deletion failed.
|
||||
"""
|
||||
target_dir = directory or DEFAULT_SESSION_DIR
|
||||
path = target_dir / f'{session_id}.json'
|
||||
try:
|
||||
# Python 3.8+: missing_ok=True avoids TOCTOU race
|
||||
path.unlink(missing_ok=False)
|
||||
return True
|
||||
except FileNotFoundError:
|
||||
# Either never existed or was concurrently deleted — both are no-ops
|
||||
return False
|
||||
except (PermissionError, IsADirectoryError, OSError) as exc:
|
||||
raise SessionDeleteError(
|
||||
f'session {session_id!r} exists in {target_dir} but could not be deleted: {exc}'
|
||||
) from exc
|
||||
|
||||
199
tests/test_cancel_observed_field.py
Normal file
199
tests/test_cancel_observed_field.py
Normal file
@@ -0,0 +1,199 @@
|
||||
"""#164 Stage B — cancel_observed field coverage.
|
||||
|
||||
Validates that the TurnResult.cancel_observed field correctly signals
|
||||
whether cancellation was observed during turn execution.
|
||||
|
||||
Test coverage:
|
||||
1. Normal completion: cancel_observed=False (no timeout occurred)
|
||||
2. Timeout with cancel signaled: cancel_observed=True
|
||||
3. bootstrap JSON output exposes the field
|
||||
4. turn-loop JSON output exposes cancel_observed per turn
|
||||
5. Safe-to-reuse: after timeout with cancel_observed=True,
|
||||
engine can accept fresh messages without state corruption
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from src.query_engine import QueryEnginePort, TurnResult
|
||||
from src.runtime import PortRuntime
|
||||
|
||||
|
||||
CLI = [sys.executable, '-m', 'src.main']
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
class TestCancelObservedField:
|
||||
"""TurnResult.cancel_observed correctly signals cancellation observation."""
|
||||
|
||||
def test_default_value_is_false(self) -> None:
|
||||
"""New TurnResult defaults to cancel_observed=False (backward compat)."""
|
||||
from src.models import UsageSummary
|
||||
result = TurnResult(
|
||||
prompt='test',
|
||||
output='ok',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='completed',
|
||||
)
|
||||
assert result.cancel_observed is False
|
||||
|
||||
def test_explicit_true_preserved(self) -> None:
|
||||
"""cancel_observed=True is preserved through construction."""
|
||||
from src.models import UsageSummary
|
||||
result = TurnResult(
|
||||
prompt='test',
|
||||
output='timed out',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='timeout',
|
||||
cancel_observed=True,
|
||||
)
|
||||
assert result.cancel_observed is True
|
||||
|
||||
def test_normal_completion_cancel_observed_false(self) -> None:
|
||||
"""Normal turn completion → cancel_observed=False."""
|
||||
runtime = PortRuntime()
|
||||
results = runtime.run_turn_loop('hello', max_turns=1)
|
||||
assert len(results) >= 1
|
||||
assert results[0].cancel_observed is False
|
||||
|
||||
def test_bootstrap_json_includes_cancel_observed(self) -> None:
|
||||
"""bootstrap JSON envelope includes cancel_observed in turn result."""
|
||||
result = subprocess.run(
|
||||
CLI + ['bootstrap', 'hello', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'turn' in envelope
|
||||
assert 'cancel_observed' in envelope['turn'], (
|
||||
f"bootstrap turn must include cancel_observed (SCHEMAS.md contract). "
|
||||
f"Got keys: {list(envelope['turn'].keys())}"
|
||||
)
|
||||
# Normal completion → False
|
||||
assert envelope['turn']['cancel_observed'] is False
|
||||
|
||||
def test_turn_loop_json_per_turn_cancel_observed(self) -> None:
|
||||
"""turn-loop JSON envelope includes cancel_observed per turn (#164 Stage B closure)."""
|
||||
result = subprocess.run(
|
||||
CLI + ['turn-loop', 'hello', '--max-turns', '1', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0, f"stderr: {result.stderr}"
|
||||
envelope = json.loads(result.stdout)
|
||||
# Common fields from wrap_json_envelope
|
||||
assert envelope['command'] == 'turn-loop'
|
||||
assert envelope['schema_version'] == '1.0'
|
||||
# Turn-loop-specific fields
|
||||
assert 'turns' in envelope
|
||||
assert len(envelope['turns']) >= 1
|
||||
for idx, turn in enumerate(envelope['turns']):
|
||||
assert 'cancel_observed' in turn, (
|
||||
f"Turn {idx} missing cancel_observed: {list(turn.keys())}"
|
||||
)
|
||||
# final_cancel_observed convenience field
|
||||
assert 'final_cancel_observed' in envelope
|
||||
assert isinstance(envelope['final_cancel_observed'], bool)
|
||||
|
||||
|
||||
class TestCancelObservedSafeReuseSemantics:
|
||||
"""After timeout with cancel_observed=True, engine state is safe to reuse."""
|
||||
|
||||
def test_timeout_result_cancel_observed_true_when_signaled(self) -> None:
|
||||
"""#164 Stage B: timeout path passes cancel_event.is_set() to result."""
|
||||
# Force a timeout with max_turns=3 and timeout=0.0001 (instant)
|
||||
runtime = PortRuntime()
|
||||
results = runtime.run_turn_loop(
|
||||
'hello', max_turns=3, timeout_seconds=0.0001,
|
||||
continuation_prompt='keep going',
|
||||
)
|
||||
# Last result should be timeout (pre-start path since timeout is instant)
|
||||
assert results, 'timeout path should still produce a result'
|
||||
last = results[-1]
|
||||
assert last.stop_reason == 'timeout'
|
||||
# cancel_observed=True because the timeout path explicitly sets cancel_event
|
||||
assert last.cancel_observed is True, (
|
||||
f"timeout path must signal cancel_observed=True; got {last.cancel_observed}. "
|
||||
f"stop_reason={last.stop_reason}"
|
||||
)
|
||||
|
||||
def test_engine_messages_not_corrupted_by_timeout(self) -> None:
|
||||
"""After timeout with cancel_observed, engine.mutable_messages is consistent.
|
||||
|
||||
#164 Stage B contract: safe-to-reuse means after a timeout-with-cancel,
|
||||
the engine has not committed a ghost turn and can accept fresh input.
|
||||
"""
|
||||
engine = QueryEnginePort.from_workspace()
|
||||
# Track initial state
|
||||
initial_message_count = len(engine.mutable_messages)
|
||||
|
||||
# Simulate a direct submit_message call with cancellation
|
||||
import threading
|
||||
cancel_event = threading.Event()
|
||||
cancel_event.set() # Pre-set: first checkpoint fires
|
||||
result = engine.submit_message(
|
||||
'test', ('cmd1',), ('tool1',),
|
||||
denied_tools=(), cancel_event=cancel_event,
|
||||
)
|
||||
|
||||
# Cancelled turn should not commit mutation
|
||||
assert result.stop_reason == 'cancelled', (
|
||||
f"expected cancelled; got {result.stop_reason}"
|
||||
)
|
||||
# mutable_messages should not have grown
|
||||
assert len(engine.mutable_messages) == initial_message_count, (
|
||||
f"engine.mutable_messages grew after cancelled turn "
|
||||
f"(was {initial_message_count}, now {len(engine.mutable_messages)})"
|
||||
)
|
||||
|
||||
# Engine should accept a fresh message now
|
||||
fresh = engine.submit_message('fresh prompt', ('cmd1',), ('tool1',))
|
||||
assert fresh.stop_reason in ('completed', 'max_budget_reached'), (
|
||||
f"expected engine reusable; got {fresh.stop_reason}"
|
||||
)
|
||||
|
||||
|
||||
class TestCancelObservedSchemaCompliance:
|
||||
"""SCHEMAS.md contract for cancel_observed field."""
|
||||
|
||||
def test_cancel_observed_is_bool_not_nullable(self) -> None:
|
||||
"""cancel_observed is always bool (never null/missing) per SCHEMAS.md."""
|
||||
result = subprocess.run(
|
||||
CLI + ['bootstrap', 'test', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
cancel_observed = envelope['turn']['cancel_observed']
|
||||
assert isinstance(cancel_observed, bool), (
|
||||
f"cancel_observed must be bool; got {type(cancel_observed)}"
|
||||
)
|
||||
|
||||
def test_turn_loop_envelope_has_final_cancel_observed(self) -> None:
|
||||
"""turn-loop JSON exposes final_cancel_observed convenience field."""
|
||||
result = subprocess.run(
|
||||
CLI + ['turn-loop', 'test', '--max-turns', '1', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'final_cancel_observed' in envelope
|
||||
assert isinstance(envelope['final_cancel_observed'], bool)
|
||||
333
tests/test_cli_parity_audit.py
Normal file
333
tests/test_cli_parity_audit.py
Normal file
@@ -0,0 +1,333 @@
|
||||
"""Cross-surface CLI parity audit (ROADMAP #171).
|
||||
|
||||
Prevents future drift of the unified JSON envelope contract across
|
||||
claw-code's CLI surface. Instead of requiring humans to notice when
|
||||
a new command skips --output-format, this test introspects the parser
|
||||
at runtime and verifies every command in the declared clawable-surface
|
||||
list supports --output-format {text,json}.
|
||||
|
||||
When a new clawable-surface command is added:
|
||||
1. Implement --output-format on the subparser (normal feature work).
|
||||
2. Add the command name to CLAWABLE_SURFACES below.
|
||||
3. This test passes automatically.
|
||||
|
||||
When a developer adds a new clawable-surface command but forgets
|
||||
--output-format, the test fails with a concrete message pointing at
|
||||
the missing flag. Claws no longer need to eyeball parity; the contract
|
||||
is enforced at test time.
|
||||
|
||||
Three classes of commands:
|
||||
- CLAWABLE_SURFACES: MUST accept --output-format (inspect/lifecycle/exec/diagnostic)
|
||||
- OPT_OUT_SURFACES: explicitly exempt (simulation/mode commands, human-first diagnostic)
|
||||
- Any command in parser not listed in either: test FAILS with classification request
|
||||
|
||||
This is operationalised parity — a machine-first CLI enforced by a
|
||||
machine-first test.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.main import build_parser # noqa: E402
|
||||
|
||||
|
||||
# Commands that MUST accept --output-format {text,json}.
|
||||
# These are the machine-first surfaces — session lifecycle, execution,
|
||||
# inspect, diagnostic inventory.
|
||||
CLAWABLE_SURFACES = frozenset({
|
||||
# Session lifecycle (#160, #165, #166)
|
||||
'list-sessions',
|
||||
'delete-session',
|
||||
'load-session',
|
||||
'flush-transcript',
|
||||
# Inspect (#167)
|
||||
'show-command',
|
||||
'show-tool',
|
||||
# Execution/work-verb (#168)
|
||||
'exec-command',
|
||||
'exec-tool',
|
||||
'route',
|
||||
'bootstrap',
|
||||
# Diagnostic inventory (#169, #170)
|
||||
'command-graph',
|
||||
'tool-pool',
|
||||
'bootstrap-graph',
|
||||
# Turn-loop with JSON output (#164 Stage B, #174)
|
||||
'turn-loop',
|
||||
})
|
||||
|
||||
# Commands explicitly exempt from --output-format requirement.
|
||||
# Rationale must be explicit — either the command is human-first
|
||||
# (rich Markdown docs/reports), simulation-only, or has a dedicated
|
||||
# JSON mode flag under a different name.
|
||||
OPT_OUT_SURFACES = frozenset({
|
||||
# Rich-Markdown report commands (planned future: JSON schema)
|
||||
'summary', # full workspace summary (Markdown)
|
||||
'manifest', # workspace manifest (Markdown)
|
||||
'parity-audit', # TypeScript archive comparison (Markdown)
|
||||
'setup-report', # startup/prefetch report (Markdown)
|
||||
# List commands with their own query/filter surface (not JSON yet)
|
||||
'subsystems', # use --limit
|
||||
'commands', # use --query / --limit / --no-plugin-commands
|
||||
'tools', # use --query / --limit / --simple-mode
|
||||
# Simulation/debug surfaces (not claw-orchestrated)
|
||||
'remote-mode',
|
||||
'ssh-mode',
|
||||
'teleport-mode',
|
||||
'direct-connect-mode',
|
||||
'deep-link-mode',
|
||||
})
|
||||
|
||||
|
||||
def _discover_subcommands_and_flags() -> dict[str, frozenset[str]]:
|
||||
"""Introspect the argparse tree to discover every subcommand and its flags.
|
||||
|
||||
Returns:
|
||||
{subcommand_name: frozenset of option strings including --output-format
|
||||
if registered}
|
||||
"""
|
||||
parser = build_parser()
|
||||
subcommand_flags: dict[str, frozenset[str]] = {}
|
||||
for action in parser._actions:
|
||||
if not hasattr(action, 'choices') or not action.choices:
|
||||
continue
|
||||
if action.dest != 'command':
|
||||
continue
|
||||
for name, subp in action.choices.items():
|
||||
flags: set[str] = set()
|
||||
for a in subp._actions:
|
||||
if a.option_strings:
|
||||
flags.update(a.option_strings)
|
||||
subcommand_flags[name] = frozenset(flags)
|
||||
return subcommand_flags
|
||||
|
||||
|
||||
class TestClawableSurfaceParity:
|
||||
"""Every clawable-surface command MUST accept --output-format {text,json}.
|
||||
|
||||
This is the invariant that codifies 'claws can treat the CLI as a
|
||||
unified protocol without special-casing'.
|
||||
"""
|
||||
|
||||
def test_all_clawable_surfaces_accept_output_format(self) -> None:
|
||||
"""All commands in CLAWABLE_SURFACES must have --output-format registered."""
|
||||
subcommand_flags = _discover_subcommands_and_flags()
|
||||
missing = []
|
||||
for cmd in CLAWABLE_SURFACES:
|
||||
if cmd not in subcommand_flags:
|
||||
missing.append(f'{cmd}: not registered in parser')
|
||||
elif '--output-format' not in subcommand_flags[cmd]:
|
||||
missing.append(f'{cmd}: missing --output-format flag')
|
||||
assert not missing, (
|
||||
'Clawable-surface parity violation. Every command in '
|
||||
'CLAWABLE_SURFACES must accept --output-format. Failures:\n'
|
||||
+ '\n'.join(f' - {m}' for m in missing)
|
||||
)
|
||||
|
||||
@pytest.mark.parametrize('cmd_name', sorted(CLAWABLE_SURFACES))
|
||||
def test_clawable_surface_output_format_choices(self, cmd_name: str) -> None:
|
||||
"""Every clawable surface must accept exactly {text, json} choices."""
|
||||
parser = build_parser()
|
||||
for action in parser._actions:
|
||||
if not hasattr(action, 'choices') or not action.choices:
|
||||
continue
|
||||
if action.dest != 'command':
|
||||
continue
|
||||
if cmd_name not in action.choices:
|
||||
continue
|
||||
subp = action.choices[cmd_name]
|
||||
for a in subp._actions:
|
||||
if '--output-format' in a.option_strings:
|
||||
assert a.choices == ['text', 'json'], (
|
||||
f'{cmd_name}: --output-format choices are {a.choices}, '
|
||||
f'expected [text, json]'
|
||||
)
|
||||
assert a.default == 'text', (
|
||||
f'{cmd_name}: --output-format default is {a.default!r}, '
|
||||
f'expected \'text\' for backward compat'
|
||||
)
|
||||
return
|
||||
pytest.fail(f'{cmd_name}: no --output-format flag found')
|
||||
|
||||
|
||||
class TestCommandClassificationCoverage:
|
||||
"""Every registered subcommand must be classified as either CLAWABLE or OPT_OUT.
|
||||
|
||||
If a new command is added to the parser but forgotten in both sets, this
|
||||
test fails loudly — forcing an explicit classification decision.
|
||||
"""
|
||||
|
||||
def test_every_registered_command_is_classified(self) -> None:
|
||||
subcommand_flags = _discover_subcommands_and_flags()
|
||||
all_classified = CLAWABLE_SURFACES | OPT_OUT_SURFACES
|
||||
unclassified = set(subcommand_flags.keys()) - all_classified
|
||||
assert not unclassified, (
|
||||
'Unclassified subcommands detected. Every new command must be '
|
||||
'explicitly added to either CLAWABLE_SURFACES (must accept '
|
||||
'--output-format) or OPT_OUT_SURFACES (explicitly exempt with '
|
||||
'rationale). Unclassified:\n'
|
||||
+ '\n'.join(f' - {cmd}' for cmd in sorted(unclassified))
|
||||
)
|
||||
|
||||
def test_no_command_in_both_sets(self) -> None:
|
||||
"""Sanity: a command cannot be both clawable AND opt-out."""
|
||||
overlap = CLAWABLE_SURFACES & OPT_OUT_SURFACES
|
||||
assert not overlap, (
|
||||
f'Classification conflict: commands appear in both sets: {overlap}'
|
||||
)
|
||||
|
||||
def test_all_classified_commands_actually_exist(self) -> None:
|
||||
"""No typos — every command in our sets must actually be registered."""
|
||||
subcommand_flags = _discover_subcommands_and_flags()
|
||||
ghosts = (CLAWABLE_SURFACES | OPT_OUT_SURFACES) - set(subcommand_flags.keys())
|
||||
assert not ghosts, (
|
||||
f'Phantom commands in classification sets (not in parser): {ghosts}. '
|
||||
'Update CLAWABLE_SURFACES / OPT_OUT_SURFACES if commands were removed.'
|
||||
)
|
||||
|
||||
|
||||
class TestJsonOutputContractEndToEnd:
|
||||
"""Verify the contract AT RUNTIME — not just parser-level, but actual execution.
|
||||
|
||||
Each clawable command must, when invoked with --output-format json,
|
||||
produce parseable JSON on stdout (for success cases).
|
||||
"""
|
||||
|
||||
# Minimal invocation args for each clawable command (to hit success path)
|
||||
RUNTIME_INVOCATIONS = {
|
||||
'list-sessions': [],
|
||||
# delete-session/load-session: skip (need state setup, covered by dedicated tests)
|
||||
'show-command': ['add-dir'],
|
||||
'show-tool': ['BashTool'],
|
||||
'exec-command': ['add-dir', 'hi'],
|
||||
'exec-tool': ['BashTool', '{}'],
|
||||
'route': ['review'],
|
||||
'bootstrap': ['hello'],
|
||||
'command-graph': [],
|
||||
'tool-pool': [],
|
||||
'bootstrap-graph': [],
|
||||
# flush-transcript: skip (creates files, covered by dedicated tests)
|
||||
}
|
||||
|
||||
@pytest.mark.parametrize('cmd_name,cmd_args', sorted(RUNTIME_INVOCATIONS.items()))
|
||||
def test_command_emits_parseable_json(self, cmd_name: str, cmd_args: list[str]) -> None:
|
||||
"""End-to-end: invoking with --output-format json yields valid JSON."""
|
||||
import json
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# Accept exit 0 (success) or 1 (typed not-found) — both must still produce JSON
|
||||
assert result.returncode in (0, 1), (
|
||||
f'{cmd_name}: unexpected exit {result.returncode}\n'
|
||||
f'stderr: {result.stderr}\n'
|
||||
f'stdout: {result.stdout[:200]}'
|
||||
)
|
||||
try:
|
||||
json.loads(result.stdout)
|
||||
except json.JSONDecodeError as e:
|
||||
pytest.fail(
|
||||
f'{cmd_name} {cmd_args} --output-format json did not produce '
|
||||
f'parseable JSON: {e}\nOutput: {result.stdout[:200]}'
|
||||
)
|
||||
|
||||
|
||||
class TestOptOutSurfaceRejection:
|
||||
"""Cycle #30: OPT_OUT surfaces must REJECT --output-format, not silently accept.
|
||||
|
||||
OPT_OUT_AUDIT.md classifies 12 surfaces as intentionally exempt from the
|
||||
JSON envelope contract. This test LOCKS that rejection so accidental
|
||||
drift (e.g., a developer adds --output-format to summary without thinking)
|
||||
doesn't silently promote an OPT_OUT surface to CLAWABLE.
|
||||
|
||||
Relationship to existing tests:
|
||||
- test_clawable_surface_has_output_format: asserts CLAWABLE surfaces accept it
|
||||
- TestOptOutSurfaceRejection: asserts OPT_OUT surfaces REJECT it
|
||||
|
||||
Together, these two test classes form a complete parity check:
|
||||
every surface is either IN or OUT, and both cases are explicitly tested.
|
||||
|
||||
If an OPT_OUT surface is promoted to CLAWABLE intentionally:
|
||||
1. Move it from OPT_OUT_SURFACES to CLAWABLE_SURFACES
|
||||
2. Update OPT_OUT_AUDIT.md with promotion rationale
|
||||
3. Remove from this test's expected rejections
|
||||
4. Both sets of tests continue passing
|
||||
"""
|
||||
|
||||
@pytest.mark.parametrize('cmd_name', sorted(OPT_OUT_SURFACES))
|
||||
def test_opt_out_surface_rejects_output_format(self, cmd_name: str) -> None:
|
||||
"""OPT_OUT surfaces must NOT accept --output-format flag.
|
||||
|
||||
Passing --output-format to an OPT_OUT surface should produce an
|
||||
'unrecognized arguments' error from argparse.
|
||||
"""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', cmd_name, '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# Should fail — argparse exit 2 in text mode, exit 1 in JSON mode
|
||||
# (both modes normalize to "unrecognized arguments" message)
|
||||
assert result.returncode != 0, (
|
||||
f'{cmd_name} unexpectedly accepted --output-format json. '
|
||||
f'If this is intentional (promotion to CLAWABLE), move from '
|
||||
f'OPT_OUT_SURFACES to CLAWABLE_SURFACES and update OPT_OUT_AUDIT.md. '
|
||||
f'Output: {result.stdout[:200]}\nStderr: {result.stderr[:200]}'
|
||||
)
|
||||
# Verify the error is specifically about --output-format
|
||||
error_text = result.stdout + result.stderr
|
||||
assert '--output-format' in error_text or 'unrecognized' in error_text, (
|
||||
f'{cmd_name} failed but error not about --output-format. '
|
||||
f'Something else is broken:\n'
|
||||
f'stdout: {result.stdout[:300]}\nstderr: {result.stderr[:300]}'
|
||||
)
|
||||
|
||||
def test_opt_out_set_matches_audit_document(self) -> None:
|
||||
"""OPT_OUT_SURFACES constant must exactly match OPT_OUT_AUDIT.md listing.
|
||||
|
||||
This test reads OPT_OUT_AUDIT.md and verifies the constant doesn't
|
||||
drift from the documentation.
|
||||
"""
|
||||
audit_path = Path(__file__).resolve().parent.parent / 'OPT_OUT_AUDIT.md'
|
||||
audit_text = audit_path.read_text()
|
||||
|
||||
# Expected 12 surfaces per audit doc
|
||||
expected_surfaces = {
|
||||
# Group A: Rich-Markdown Reports (4)
|
||||
'summary', 'manifest', 'parity-audit', 'setup-report',
|
||||
# Group B: List Commands (3)
|
||||
'subsystems', 'commands', 'tools',
|
||||
# Group C: Simulation/Debug (5)
|
||||
'remote-mode', 'ssh-mode', 'teleport-mode',
|
||||
'direct-connect-mode', 'deep-link-mode',
|
||||
}
|
||||
|
||||
assert OPT_OUT_SURFACES == expected_surfaces, (
|
||||
f'OPT_OUT_SURFACES drift from expected 12 surfaces per audit:\n'
|
||||
f' Expected: {sorted(expected_surfaces)}\n'
|
||||
f' Actual: {sorted(OPT_OUT_SURFACES)}'
|
||||
)
|
||||
|
||||
# Each surface should be mentioned in audit doc
|
||||
missing_from_audit = [s for s in OPT_OUT_SURFACES if s not in audit_text]
|
||||
assert not missing_from_audit, (
|
||||
f'OPT_OUT surfaces not mentioned in OPT_OUT_AUDIT.md: {missing_from_audit}'
|
||||
)
|
||||
|
||||
def test_opt_out_count_matches_declared(self) -> None:
|
||||
"""OPT_OUT_AUDIT.md declares '12 surfaces'. Constant must match."""
|
||||
assert len(OPT_OUT_SURFACES) == 12, (
|
||||
f'OPT_OUT_SURFACES has {len(OPT_OUT_SURFACES)} items, '
|
||||
f'but OPT_OUT_AUDIT.md declares 12 total surfaces. '
|
||||
f'Update either the audit doc or the constant.'
|
||||
)
|
||||
70
tests/test_command_graph_tool_pool_output_format.py
Normal file
70
tests/test_command_graph_tool_pool_output_format.py
Normal file
@@ -0,0 +1,70 @@
|
||||
"""Tests for --output-format on command-graph and tool-pool (ROADMAP #169).
|
||||
|
||||
Diagnostic inventory surfaces now speak the CLI family's JSON contract.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
|
||||
def _run(args: list[str]) -> subprocess.CompletedProcess:
|
||||
return subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', *args],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
|
||||
class TestCommandGraphOutputFormat:
|
||||
def test_command_graph_json(self) -> None:
|
||||
result = _run(['command-graph', '--output-format', 'json'])
|
||||
assert result.returncode == 0, result.stderr
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'builtins_count' in envelope
|
||||
assert 'plugin_like_count' in envelope
|
||||
assert 'skill_like_count' in envelope
|
||||
assert 'total_count' in envelope
|
||||
assert envelope['total_count'] == (
|
||||
envelope['builtins_count'] + envelope['plugin_like_count'] + envelope['skill_like_count']
|
||||
)
|
||||
assert isinstance(envelope['builtins'], list)
|
||||
if envelope['builtins']:
|
||||
assert set(envelope['builtins'][0].keys()) == {'name', 'source_hint'}
|
||||
|
||||
def test_command_graph_text_backward_compat(self) -> None:
|
||||
result = _run(['command-graph'])
|
||||
assert result.returncode == 0
|
||||
assert '# Command Graph' in result.stdout
|
||||
assert 'Builtins:' in result.stdout
|
||||
# Not JSON
|
||||
assert not result.stdout.strip().startswith('{')
|
||||
|
||||
|
||||
class TestToolPoolOutputFormat:
|
||||
def test_tool_pool_json(self) -> None:
|
||||
result = _run(['tool-pool', '--output-format', 'json'])
|
||||
assert result.returncode == 0, result.stderr
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'simple_mode' in envelope
|
||||
assert 'include_mcp' in envelope
|
||||
assert 'tool_count' in envelope
|
||||
assert 'tools' in envelope
|
||||
assert envelope['tool_count'] == len(envelope['tools'])
|
||||
if envelope['tools']:
|
||||
assert set(envelope['tools'][0].keys()) == {'name', 'source_hint'}
|
||||
|
||||
def test_tool_pool_text_backward_compat(self) -> None:
|
||||
result = _run(['tool-pool'])
|
||||
assert result.returncode == 0
|
||||
assert '# Tool Pool' in result.stdout
|
||||
assert 'Simple mode:' in result.stdout
|
||||
assert not result.stdout.strip().startswith('{')
|
||||
242
tests/test_cross_channel_consistency.py
Normal file
242
tests/test_cross_channel_consistency.py
Normal file
@@ -0,0 +1,242 @@
|
||||
"""Cycle #27 cross-channel consistency audit (post-#181).
|
||||
|
||||
After #181 fix (envelope.exit_code must match process exit), this test
|
||||
class systematizes the three-layer protocol invariant framework:
|
||||
|
||||
1. Structural compliance: Does the envelope exist? (#178)
|
||||
2. Quality compliance: Is stderr silent + message truthful? (#179)
|
||||
3. Cross-channel consistency: Do multiple channels agree? (#181 + this)
|
||||
|
||||
This file captures cycle #27's proactive invariant audit proving that
|
||||
envelope fields match their corresponding reality channels:
|
||||
|
||||
- envelope.command ↔ argv dispatch
|
||||
- envelope.output_format ↔ --output-format flag
|
||||
- envelope.timestamp ↔ actual wall clock
|
||||
- envelope.found/handled/deleted ↔ operational truth (no error block mismatch)
|
||||
|
||||
All tests passing = no drift detected.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
import sys
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
|
||||
def _run(args: list[str]) -> subprocess.CompletedProcess:
|
||||
"""Run claw-code command and capture output."""
|
||||
return subprocess.run(
|
||||
['python3', '-m', 'src.main'] + args,
|
||||
cwd=Path(__file__).parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
|
||||
class TestCrossChannelConsistency:
|
||||
"""Cycle #27: envelope fields must match reality channels.
|
||||
|
||||
These are distinct from structural/quality tests. A command can
|
||||
emit structurally valid JSON with clean stderr but still lie about
|
||||
its own output_format or exit code (as #181 proved).
|
||||
"""
|
||||
|
||||
def test_envelope_command_matches_dispatch(self) -> None:
|
||||
"""Envelope.command must equal the dispatched subcommand."""
|
||||
commands_to_test = [
|
||||
'show-command',
|
||||
'show-tool',
|
||||
'list-sessions',
|
||||
'exec-command',
|
||||
'exec-tool',
|
||||
'delete-session',
|
||||
]
|
||||
failures = []
|
||||
for cmd in commands_to_test:
|
||||
# Dispatch varies by arity
|
||||
if cmd == 'show-command':
|
||||
args = [cmd, 'nonexistent', '--output-format', 'json']
|
||||
elif cmd == 'show-tool':
|
||||
args = [cmd, 'nonexistent', '--output-format', 'json']
|
||||
elif cmd == 'exec-command':
|
||||
args = [cmd, 'unknown', 'test', '--output-format', 'json']
|
||||
elif cmd == 'exec-tool':
|
||||
args = [cmd, 'unknown', '{}', '--output-format', 'json']
|
||||
else:
|
||||
args = [cmd, '--output-format', 'json']
|
||||
|
||||
result = _run(args)
|
||||
try:
|
||||
envelope = json.loads(result.stdout)
|
||||
except json.JSONDecodeError:
|
||||
failures.append(f'{cmd}: JSON parse error')
|
||||
continue
|
||||
|
||||
if envelope.get('command') != cmd:
|
||||
failures.append(
|
||||
f'{cmd}: envelope.command={envelope.get("command")}, '
|
||||
f'expected {cmd}'
|
||||
)
|
||||
assert not failures, (
|
||||
'Envelope.command must match dispatched subcommand:\n' +
|
||||
'\n'.join(failures)
|
||||
)
|
||||
|
||||
def test_envelope_output_format_matches_flag(self) -> None:
|
||||
"""Envelope.output_format must match --output-format flag."""
|
||||
result = _run(['list-sessions', '--output-format', 'json'])
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['output_format'] == 'json', (
|
||||
f'output_format mismatch: flag=json, envelope={envelope["output_format"]}'
|
||||
)
|
||||
|
||||
def test_envelope_timestamp_is_recent(self) -> None:
|
||||
"""Envelope.timestamp must be recent (generated at call time)."""
|
||||
result = _run(['list-sessions', '--output-format', 'json'])
|
||||
envelope = json.loads(result.stdout)
|
||||
ts_str = envelope.get('timestamp')
|
||||
assert ts_str, 'no timestamp field'
|
||||
|
||||
ts = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
|
||||
now = datetime.now(timezone.utc)
|
||||
delta = abs((now - ts).total_seconds())
|
||||
|
||||
assert delta < 5, f'timestamp off by {delta}s (should be <5s)'
|
||||
|
||||
def test_envelope_exit_code_matches_process_exit(self) -> None:
|
||||
"""Cycle #26/#181: envelope.exit_code == process exit code.
|
||||
|
||||
This is a critical invariant. Claws that trust the envelope
|
||||
field must get the truth, not a lie.
|
||||
"""
|
||||
cases = [
|
||||
(['show-command', 'nonexistent', '--output-format', 'json'], 1),
|
||||
(['show-tool', 'nonexistent', '--output-format', 'json'], 1),
|
||||
(['list-sessions', '--output-format', 'json'], 0),
|
||||
(['delete-session', 'any-id', '--output-format', 'json'], 0),
|
||||
]
|
||||
failures = []
|
||||
for args, expected_exit in cases:
|
||||
result = _run(args)
|
||||
if result.returncode != expected_exit:
|
||||
failures.append(
|
||||
f'{args[0]}: process exit {result.returncode}, '
|
||||
f'expected {expected_exit}'
|
||||
)
|
||||
continue
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
if envelope['exit_code'] != result.returncode:
|
||||
failures.append(
|
||||
f'{args[0]}: process exit {result.returncode}, '
|
||||
f'envelope.exit_code {envelope["exit_code"]}'
|
||||
)
|
||||
|
||||
assert not failures, (
|
||||
'Envelope.exit_code must match process exit:\n' +
|
||||
'\n'.join(failures)
|
||||
)
|
||||
|
||||
def test_envelope_boolean_fields_match_error_presence(self) -> None:
|
||||
"""found/handled/deleted fields must correlate with error block.
|
||||
|
||||
- If field is True, no error block should exist
|
||||
- If field is False + operational error, error block must exist
|
||||
- If field is False + idempotent (delete nonexistent), no error block
|
||||
"""
|
||||
cases = [
|
||||
# (args, bool_field, expected_value, expect_error_block)
|
||||
(['show-command', 'nonexistent', '--output-format', 'json'],
|
||||
'found', False, True),
|
||||
(['exec-command', 'unknown', 'test', '--output-format', 'json'],
|
||||
'handled', False, True),
|
||||
(['delete-session', 'any-id', '--output-format', 'json'],
|
||||
'deleted', False, False), # idempotent, no error
|
||||
]
|
||||
failures = []
|
||||
for args, field, expected_val, expect_error in cases:
|
||||
result = _run(args)
|
||||
envelope = json.loads(result.stdout)
|
||||
|
||||
actual_val = envelope.get(field)
|
||||
has_error = 'error' in envelope
|
||||
|
||||
if actual_val != expected_val:
|
||||
failures.append(
|
||||
f'{args[0]}: {field}={actual_val}, expected {expected_val}'
|
||||
)
|
||||
if expect_error and not has_error:
|
||||
failures.append(
|
||||
f'{args[0]}: expected error block, but none present'
|
||||
)
|
||||
elif not expect_error and has_error:
|
||||
failures.append(
|
||||
f'{args[0]}: unexpected error block present'
|
||||
)
|
||||
|
||||
assert not failures, (
|
||||
'Boolean fields must correlate with error block:\n' +
|
||||
'\n'.join(failures)
|
||||
)
|
||||
|
||||
|
||||
class TestTextVsJsonModeDivergence:
|
||||
"""Cycle #29: Document known text-mode vs JSON-mode exit code divergence.
|
||||
|
||||
ERROR_HANDLING.md specifies the exit code contract applies ONLY when
|
||||
--output-format json is set. Text mode follows argparse defaults (e.g.,
|
||||
exit 2 for parse errors) while JSON mode normalizes to the contract
|
||||
(exit 1 for parse errors).
|
||||
|
||||
This test class LOCKS the expected divergence so:
|
||||
1. Documentation stays aligned with implementation
|
||||
2. Future changes to text mode behavior are caught as intentional
|
||||
3. Claws consuming subprocess output can trust the docs
|
||||
"""
|
||||
|
||||
def test_unknown_command_text_mode_exits_2(self) -> None:
|
||||
"""Text mode: argparse default exit 2 for unknown subcommand."""
|
||||
result = _run(['nonexistent-cmd'])
|
||||
assert result.returncode == 2, (
|
||||
f'text mode should exit 2 (argparse default), got {result.returncode}'
|
||||
)
|
||||
|
||||
def test_unknown_command_json_mode_exits_1(self) -> None:
|
||||
"""JSON mode: normalized exit 1 for parse error (#178)."""
|
||||
result = _run(['nonexistent-cmd', '--output-format', 'json'])
|
||||
assert result.returncode == 1, (
|
||||
f'JSON mode should exit 1 (protocol contract), got {result.returncode}'
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['error']['kind'] == 'parse'
|
||||
|
||||
def test_missing_required_arg_text_mode_exits_2(self) -> None:
|
||||
"""Text mode: argparse default exit 2 for missing required arg."""
|
||||
result = _run(['exec-command']) # missing name + prompt
|
||||
assert result.returncode == 2, (
|
||||
f'text mode should exit 2, got {result.returncode}'
|
||||
)
|
||||
|
||||
def test_missing_required_arg_json_mode_exits_1(self) -> None:
|
||||
"""JSON mode: normalized exit 1 for parse error."""
|
||||
result = _run(['exec-command', '--output-format', 'json'])
|
||||
assert result.returncode == 1, (
|
||||
f'JSON mode should exit 1, got {result.returncode}'
|
||||
)
|
||||
|
||||
def test_success_path_identical_in_both_modes(self) -> None:
|
||||
"""Success exit codes are identical in both modes."""
|
||||
text_result = _run(['list-sessions'])
|
||||
json_result = _run(['list-sessions', '--output-format', 'json'])
|
||||
assert text_result.returncode == json_result.returncode == 0, (
|
||||
f'success exit should be 0 in both modes: '
|
||||
f'text={text_result.returncode}, json={json_result.returncode}'
|
||||
)
|
||||
306
tests/test_exec_route_bootstrap_output_format.py
Normal file
306
tests/test_exec_route_bootstrap_output_format.py
Normal file
@@ -0,0 +1,306 @@
|
||||
"""Tests for --output-format on exec-command/exec-tool/route/bootstrap (ROADMAP #168).
|
||||
|
||||
Closes the final JSON-parity gap across the CLI family. After #160/#165/
|
||||
#166/#167, the session-lifecycle and inspect CLI commands all spoke JSON;
|
||||
this batch extends that contract to the exec, route, and bootstrap
|
||||
surfaces — the commands claws actually invoke to DO work, not just inspect
|
||||
state.
|
||||
|
||||
Verifies:
|
||||
- exec-command / exec-tool: JSON envelope with handled + source_hint on
|
||||
success; {name, handled:false, error:{kind,message,retryable}} on
|
||||
not-found
|
||||
- route: JSON envelope with match_count + matches list
|
||||
- bootstrap: JSON envelope with setup, routed_matches, turn, messages,
|
||||
persisted_session_path
|
||||
- All 4 preserve legacy text mode byte-identically
|
||||
- Exit codes unchanged (0 success, 1 exec-not-found)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
|
||||
def _run(args: list[str]) -> subprocess.CompletedProcess:
|
||||
return subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', *args],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
|
||||
class TestExecCommandOutputFormat:
|
||||
def test_exec_command_found_json(self) -> None:
|
||||
result = _run(['exec-command', 'add-dir', 'hello', '--output-format', 'json'])
|
||||
assert result.returncode == 0, result.stderr
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['handled'] is True
|
||||
assert envelope['name'] == 'add-dir'
|
||||
assert envelope['prompt'] == 'hello'
|
||||
assert 'source_hint' in envelope
|
||||
assert 'message' in envelope
|
||||
assert 'error' not in envelope
|
||||
|
||||
def test_exec_command_not_found_json(self) -> None:
|
||||
result = _run(['exec-command', 'nonexistent-cmd', 'hi', '--output-format', 'json'])
|
||||
assert result.returncode == 1
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['handled'] is False
|
||||
assert envelope['name'] == 'nonexistent-cmd'
|
||||
assert envelope['prompt'] == 'hi'
|
||||
assert envelope['error']['kind'] == 'command_not_found'
|
||||
assert envelope['error']['retryable'] is False
|
||||
assert 'source_hint' not in envelope
|
||||
|
||||
def test_exec_command_text_backward_compat(self) -> None:
|
||||
result = _run(['exec-command', 'add-dir', 'hello'])
|
||||
assert result.returncode == 0
|
||||
# Single line prose (unchanged from pre-#168)
|
||||
assert result.stdout.count('\n') == 1
|
||||
assert 'add-dir' in result.stdout
|
||||
|
||||
|
||||
class TestExecToolOutputFormat:
|
||||
def test_exec_tool_found_json(self) -> None:
|
||||
result = _run(['exec-tool', 'BashTool', '{"cmd":"ls"}', '--output-format', 'json'])
|
||||
assert result.returncode == 0, result.stderr
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['handled'] is True
|
||||
assert envelope['name'] == 'BashTool'
|
||||
assert envelope['payload'] == '{"cmd":"ls"}'
|
||||
assert 'source_hint' in envelope
|
||||
assert 'error' not in envelope
|
||||
|
||||
def test_exec_tool_not_found_json(self) -> None:
|
||||
result = _run(['exec-tool', 'NotATool', '{}', '--output-format', 'json'])
|
||||
assert result.returncode == 1
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['handled'] is False
|
||||
assert envelope['name'] == 'NotATool'
|
||||
assert envelope['error']['kind'] == 'tool_not_found'
|
||||
assert envelope['error']['retryable'] is False
|
||||
|
||||
def test_exec_tool_text_backward_compat(self) -> None:
|
||||
result = _run(['exec-tool', 'BashTool', '{}'])
|
||||
assert result.returncode == 0
|
||||
assert result.stdout.count('\n') == 1
|
||||
|
||||
|
||||
class TestRouteOutputFormat:
|
||||
def test_route_json_envelope(self) -> None:
|
||||
result = _run(['route', 'review mcp', '--limit', '3', '--output-format', 'json'])
|
||||
assert result.returncode == 0
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['prompt'] == 'review mcp'
|
||||
assert envelope['limit'] == 3
|
||||
assert 'match_count' in envelope
|
||||
assert 'matches' in envelope
|
||||
assert envelope['match_count'] == len(envelope['matches'])
|
||||
# Every match has required keys
|
||||
for m in envelope['matches']:
|
||||
assert set(m.keys()) == {'kind', 'name', 'score', 'source_hint'}
|
||||
assert m['kind'] in ('command', 'tool')
|
||||
|
||||
def test_route_json_no_matches(self) -> None:
|
||||
# Very unusual string should yield zero matches
|
||||
result = _run(['route', 'zzzzzzzzzqqqqq', '--output-format', 'json'])
|
||||
assert result.returncode == 0
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['match_count'] == 0
|
||||
assert envelope['matches'] == []
|
||||
|
||||
def test_route_text_backward_compat(self) -> None:
|
||||
"""Text mode tab-separated output unchanged from pre-#168."""
|
||||
result = _run(['route', 'review mcp', '--limit', '2'])
|
||||
assert result.returncode == 0
|
||||
# Each non-empty line has exactly 3 tabs (kind\tname\tscore\tsource_hint)
|
||||
for line in result.stdout.strip().split('\n'):
|
||||
if line:
|
||||
assert line.count('\t') == 3
|
||||
|
||||
|
||||
class TestBootstrapOutputFormat:
|
||||
def test_bootstrap_json_envelope(self) -> None:
|
||||
result = _run(['bootstrap', 'review MCP', '--limit', '2', '--output-format', 'json'])
|
||||
assert result.returncode == 0, result.stderr
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
# Required top-level keys
|
||||
required = {
|
||||
'prompt', 'limit', 'setup', 'routed_matches',
|
||||
'command_execution_messages', 'tool_execution_messages',
|
||||
'turn', 'persisted_session_path',
|
||||
}
|
||||
assert required.issubset(envelope.keys())
|
||||
# Setup sub-envelope
|
||||
assert 'python_version' in envelope['setup']
|
||||
assert 'platform_name' in envelope['setup']
|
||||
# Turn sub-envelope
|
||||
assert 'stop_reason' in envelope['turn']
|
||||
assert 'prompt' in envelope['turn']
|
||||
|
||||
def test_bootstrap_text_is_markdown(self) -> None:
|
||||
"""Text mode produces Markdown (unchanged from pre-#168)."""
|
||||
result = _run(['bootstrap', 'hello', '--limit', '2'])
|
||||
assert result.returncode == 0
|
||||
# Markdown headers
|
||||
assert '# Runtime Session' in result.stdout
|
||||
assert '## Setup' in result.stdout
|
||||
assert '## Routed Matches' in result.stdout
|
||||
|
||||
|
||||
class TestFamilyWideJsonParity:
|
||||
"""After #167 and #168, ALL inspect/exec/route/lifecycle commands
|
||||
support --output-format. Verify the full family is now parity-complete."""
|
||||
|
||||
FAMILY_SURFACES = [
|
||||
# (cmd_args, expected_to_parse_json)
|
||||
(['show-command', 'add-dir'], True),
|
||||
(['show-tool', 'BashTool'], True),
|
||||
(['exec-command', 'add-dir', 'hi'], True),
|
||||
(['exec-tool', 'BashTool', '{}'], True),
|
||||
(['route', 'review'], True),
|
||||
(['bootstrap', 'hello'], True),
|
||||
]
|
||||
|
||||
def test_all_family_commands_accept_output_format_json(self) -> None:
|
||||
"""Every family command accepts --output-format json and emits parseable JSON."""
|
||||
failures = []
|
||||
for args_base, should_parse in self.FAMILY_SURFACES:
|
||||
result = _run([*args_base, '--output-format', 'json'])
|
||||
if result.returncode not in (0, 1):
|
||||
failures.append(f'{args_base}: exit {result.returncode} — {result.stderr}')
|
||||
continue
|
||||
try:
|
||||
json.loads(result.stdout)
|
||||
except json.JSONDecodeError as e:
|
||||
failures.append(f'{args_base}: not parseable JSON ({e}): {result.stdout[:100]}')
|
||||
assert not failures, (
|
||||
'CLI family JSON parity gap:\n' + '\n'.join(failures)
|
||||
)
|
||||
|
||||
def test_all_family_commands_text_mode_unchanged(self) -> None:
|
||||
"""Omitting --output-format defaults to text for every family command."""
|
||||
# Sanity: just verify each runs without error in text mode
|
||||
for args_base, _ in self.FAMILY_SURFACES:
|
||||
result = _run(args_base)
|
||||
assert result.returncode in (0, 1), (
|
||||
f'{args_base} failed in text mode: {result.stderr}'
|
||||
)
|
||||
# Output should not be JSON-shaped (no leading {)
|
||||
assert not result.stdout.strip().startswith('{')
|
||||
|
||||
|
||||
class TestEnvelopeExitCodeMatchesProcessExit:
|
||||
"""#181: Envelope exit_code field must match actual process exit code.
|
||||
|
||||
Regression test for the protocol violation where exec-command/exec-tool
|
||||
not-found cases returned exit code 1 from the process but emitted
|
||||
envelopes with exit_code: 0 (default wrap_json_envelope). Claws reading
|
||||
the envelope would misclassify failures as successes.
|
||||
|
||||
Contract (from ERROR_HANDLING.md):
|
||||
- Exit code 0 = success
|
||||
- Exit code 1 = error/not-found
|
||||
- Envelope MUST reflect process exit
|
||||
"""
|
||||
|
||||
def test_exec_command_not_found_envelope_exit_matches(self) -> None:
|
||||
"""exec-command 'unknown-name' must have exit_code=1 in envelope."""
|
||||
result = _run(['exec-command', 'nonexistent-cmd-name', 'test-prompt', '--output-format', 'json'])
|
||||
assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['exit_code'] == 1, (
|
||||
f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
|
||||
)
|
||||
assert envelope['handled'] is False
|
||||
assert envelope['error']['kind'] == 'command_not_found'
|
||||
|
||||
def test_exec_tool_not_found_envelope_exit_matches(self) -> None:
|
||||
"""exec-tool 'unknown-tool' must have exit_code=1 in envelope."""
|
||||
result = _run(['exec-tool', 'nonexistent-tool-name', '{}', '--output-format', 'json'])
|
||||
assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['exit_code'] == 1, (
|
||||
f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
|
||||
)
|
||||
assert envelope['handled'] is False
|
||||
assert envelope['error']['kind'] == 'tool_not_found'
|
||||
|
||||
def test_all_commands_exit_code_invariant(self) -> None:
|
||||
"""Audit: for every clawable command, envelope.exit_code == process exit.
|
||||
|
||||
This is a stronger invariant than 'emits JSON'. Claws dispatching on
|
||||
the envelope's exit_code field must get the truth, not a lie.
|
||||
"""
|
||||
# Sample cases known to return non-zero
|
||||
cases = [
|
||||
# command, expected_exit, justification
|
||||
(['show-command', 'nonexistent-abc'], 1, 'not-found inventory lookup'),
|
||||
(['show-tool', 'nonexistent-xyz'], 1, 'not-found inventory lookup'),
|
||||
(['exec-command', 'nonexistent-1', 'test'], 1, 'not-found execution'),
|
||||
(['exec-tool', 'nonexistent-2', '{}'], 1, 'not-found execution'),
|
||||
]
|
||||
mismatches = []
|
||||
for args, expected_exit, reason in cases:
|
||||
result = _run([*args, '--output-format', 'json'])
|
||||
if result.returncode != expected_exit:
|
||||
mismatches.append(
|
||||
f'{args}: expected process exit {expected_exit} ({reason}), '
|
||||
f'got {result.returncode}'
|
||||
)
|
||||
continue
|
||||
try:
|
||||
envelope = json.loads(result.stdout)
|
||||
except json.JSONDecodeError as e:
|
||||
mismatches.append(f'{args}: JSON parse failed: {e}')
|
||||
continue
|
||||
if envelope.get('exit_code') != result.returncode:
|
||||
mismatches.append(
|
||||
f'{args}: envelope.exit_code={envelope.get("exit_code")} '
|
||||
f'!= process exit={result.returncode} ({reason})'
|
||||
)
|
||||
assert not mismatches, (
|
||||
'Envelope exit_code must match process exit code:\n' +
|
||||
'\n'.join(mismatches)
|
||||
)
|
||||
|
||||
|
||||
class TestMetadataFlags:
|
||||
"""Cycle #28: --version flag implementation (#180 gap closure)."""
|
||||
|
||||
def test_version_flag_returns_version_text(self) -> None:
|
||||
"""--version returns version string and exits successfully."""
|
||||
result = _run(['--version'])
|
||||
assert result.returncode == 0
|
||||
assert 'claw-code' in result.stdout
|
||||
assert '1.0.0' in result.stdout
|
||||
|
||||
def test_help_flag_returns_help_text(self) -> None:
|
||||
"""--help returns help text and exits successfully."""
|
||||
result = _run(['--help'])
|
||||
assert result.returncode == 0
|
||||
assert 'usage:' in result.stdout
|
||||
assert 'Python porting workspace' in result.stdout
|
||||
|
||||
def test_help_still_works_after_version_added(self) -> None:
|
||||
"""Verify -h and --help both work (no regression)."""
|
||||
result_short = _run(['-h'])
|
||||
result_long = _run(['--help'])
|
||||
assert result_short.returncode == 0
|
||||
assert result_long.returncode == 0
|
||||
assert 'usage:' in result_short.stdout
|
||||
assert 'usage:' in result_long.stdout
|
||||
206
tests/test_flush_transcript_cli.py
Normal file
206
tests/test_flush_transcript_cli.py
Normal file
@@ -0,0 +1,206 @@
|
||||
"""Tests for flush-transcript CLI parity with the #160/#165 lifecycle triplet (ROADMAP #166).
|
||||
|
||||
Verifies that session *creation* now accepts the same flag family as session
|
||||
management (list/delete/load):
|
||||
- --directory DIR (alternate storage location)
|
||||
- --output-format {text,json} (structured output)
|
||||
- --session-id ID (deterministic IDs for claw checkpointing)
|
||||
|
||||
Also verifies backward compat: default text output unchanged byte-for-byte.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def _run_cli(*args: str) -> subprocess.CompletedProcess[str]:
|
||||
return subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', *args],
|
||||
capture_output=True, text=True, cwd=str(_REPO_ROOT),
|
||||
)
|
||||
|
||||
|
||||
class TestDirectoryFlag:
|
||||
def test_flush_transcript_writes_to_custom_directory(self, tmp_path: Path) -> None:
|
||||
result = _run_cli(
|
||||
'flush-transcript', 'hello world',
|
||||
'--directory', str(tmp_path),
|
||||
)
|
||||
assert result.returncode == 0, result.stderr
|
||||
# Exactly one session file should exist in the directory
|
||||
files = list(tmp_path.glob('*.json'))
|
||||
assert len(files) == 1
|
||||
# And the legacy text output points to that file
|
||||
assert str(files[0]) in result.stdout
|
||||
|
||||
|
||||
class TestSessionIdFlag:
|
||||
def test_explicit_session_id_is_respected(self, tmp_path: Path) -> None:
|
||||
result = _run_cli(
|
||||
'flush-transcript', 'hello',
|
||||
'--directory', str(tmp_path),
|
||||
'--session-id', 'deterministic-id-42',
|
||||
)
|
||||
assert result.returncode == 0, result.stderr
|
||||
expected_path = tmp_path / 'deterministic-id-42.json'
|
||||
assert expected_path.exists(), (
|
||||
f'session file not created at deterministic path: {expected_path}'
|
||||
)
|
||||
# And it should contain the ID we asked for
|
||||
data = json.loads(expected_path.read_text())
|
||||
assert data['session_id'] == 'deterministic-id-42'
|
||||
|
||||
def test_auto_session_id_when_flag_omitted(self, tmp_path: Path) -> None:
|
||||
"""Without --session-id, engine still auto-generates a UUID (backward compat)."""
|
||||
result = _run_cli(
|
||||
'flush-transcript', 'hello',
|
||||
'--directory', str(tmp_path),
|
||||
)
|
||||
assert result.returncode == 0
|
||||
files = list(tmp_path.glob('*.json'))
|
||||
assert len(files) == 1
|
||||
# The filename (minus .json) should be a 32-char hex UUID
|
||||
stem = files[0].stem
|
||||
assert len(stem) == 32
|
||||
assert all(c in '0123456789abcdef' for c in stem)
|
||||
|
||||
|
||||
class TestOutputFormatFlag:
|
||||
def test_json_mode_emits_structured_envelope(self, tmp_path: Path) -> None:
|
||||
result = _run_cli(
|
||||
'flush-transcript', 'hello',
|
||||
'--directory', str(tmp_path),
|
||||
'--session-id', 'beta',
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert result.returncode == 0
|
||||
data = json.loads(result.stdout)
|
||||
assert data['session_id'] == 'beta'
|
||||
assert data['flushed'] is True
|
||||
assert data['path'].endswith('beta.json')
|
||||
# messages_count and token counts should be present and typed
|
||||
assert isinstance(data['messages_count'], int)
|
||||
assert isinstance(data['input_tokens'], int)
|
||||
assert isinstance(data['output_tokens'], int)
|
||||
|
||||
def test_text_mode_byte_identical_to_pre_166_output(self, tmp_path: Path) -> None:
|
||||
"""Legacy text output must not change — claws may be parsing it."""
|
||||
result = _run_cli(
|
||||
'flush-transcript', 'hello',
|
||||
'--directory', str(tmp_path),
|
||||
)
|
||||
assert result.returncode == 0
|
||||
lines = result.stdout.strip().split('\n')
|
||||
# Line 1: path ending in .json
|
||||
assert lines[0].endswith('.json')
|
||||
# Line 2: exact legacy format
|
||||
assert lines[1] == 'flushed=True'
|
||||
|
||||
|
||||
class TestBackwardCompat:
|
||||
def test_no_flags_default_behaviour(self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
|
||||
"""Running with no flags still works (default dir, text mode, auto UUID)."""
|
||||
import os
|
||||
env = os.environ.copy()
|
||||
env['PYTHONPATH'] = str(_REPO_ROOT)
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'flush-transcript', 'hello'],
|
||||
capture_output=True, text=True, cwd=str(tmp_path), env=env,
|
||||
)
|
||||
assert result.returncode == 0, result.stderr
|
||||
# Default dir is `.port_sessions` in CWD
|
||||
sessions_dir = tmp_path / '.port_sessions'
|
||||
assert sessions_dir.exists()
|
||||
assert len(list(sessions_dir.glob('*.json'))) == 1
|
||||
|
||||
|
||||
class TestLifecycleIntegration:
|
||||
"""#166's real value: the triplet + creation command are now a coherent family."""
|
||||
|
||||
def test_create_then_list_then_load_then_delete_roundtrip(
|
||||
self, tmp_path: Path,
|
||||
) -> None:
|
||||
"""End-to-end: flush → list → load → delete, all via the same --directory."""
|
||||
# 1. Create
|
||||
create_result = _run_cli(
|
||||
'flush-transcript', 'roundtrip test',
|
||||
'--directory', str(tmp_path),
|
||||
'--session-id', 'rt-session',
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert create_result.returncode == 0
|
||||
assert json.loads(create_result.stdout)['session_id'] == 'rt-session'
|
||||
|
||||
# 2. List
|
||||
list_result = _run_cli(
|
||||
'list-sessions',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert list_result.returncode == 0
|
||||
list_data = json.loads(list_result.stdout)
|
||||
assert 'rt-session' in list_data['sessions']
|
||||
|
||||
# 3. Load
|
||||
load_result = _run_cli(
|
||||
'load-session', 'rt-session',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert load_result.returncode == 0
|
||||
assert json.loads(load_result.stdout)['loaded'] is True
|
||||
|
||||
# 4. Delete
|
||||
delete_result = _run_cli(
|
||||
'delete-session', 'rt-session',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert delete_result.returncode == 0
|
||||
|
||||
# 5. Verify gone
|
||||
verify_result = _run_cli(
|
||||
'load-session', 'rt-session',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert verify_result.returncode == 1
|
||||
assert json.loads(verify_result.stdout)['error']['kind'] == 'session_not_found'
|
||||
|
||||
|
||||
class TestFullFamilyParity:
|
||||
"""All four session-lifecycle CLI commands accept the same core flag pair.
|
||||
|
||||
This is the #166 acceptance test: flush-transcript joins the family.
|
||||
"""
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'command',
|
||||
['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
|
||||
)
|
||||
def test_all_four_accept_directory_flag(self, command: str) -> None:
|
||||
help_text = _run_cli(command, '--help').stdout
|
||||
assert '--directory' in help_text, (
|
||||
f'{command} missing --directory flag (#166 parity gap)'
|
||||
)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'command',
|
||||
['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
|
||||
)
|
||||
def test_all_four_accept_output_format_flag(self, command: str) -> None:
|
||||
help_text = _run_cli(command, '--help').stdout
|
||||
assert '--output-format' in help_text, (
|
||||
f'{command} missing --output-format flag (#166 parity gap)'
|
||||
)
|
||||
213
tests/test_json_envelope_field_consistency.py
Normal file
213
tests/test_json_envelope_field_consistency.py
Normal file
@@ -0,0 +1,213 @@
|
||||
"""JSON envelope field consistency validation (ROADMAP #173 prep).
|
||||
|
||||
This test suite validates that clawable-surface commands' JSON output
|
||||
follows the contract defined in SCHEMAS.md. Currently, commands emit
|
||||
command-specific envelopes without the canonical common fields
|
||||
(timestamp, command, exit_code, output_format, schema_version).
|
||||
|
||||
This test documents the current gap and validates the consistency
|
||||
of what IS there, providing a baseline for #173 (common field wrapping).
|
||||
|
||||
Phase 1 (this test): Validate consistency within each command's envelope.
|
||||
Phase 2 (future #173): Wrap all 13 commands with canonical common fields.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.main import build_parser # noqa: E402
|
||||
|
||||
|
||||
# Expected fields for each clawable command's JSON envelope.
|
||||
# These are the command-specific fields (not including common fields yet).
|
||||
# Entries are (command_name, required_fields, optional_fields).
|
||||
ENVELOPE_CONTRACTS = {
|
||||
'list-sessions': (
|
||||
{'count', 'sessions'},
|
||||
set(),
|
||||
),
|
||||
'delete-session': (
|
||||
{'session_id', 'deleted', 'directory'},
|
||||
set(),
|
||||
),
|
||||
'load-session': (
|
||||
{'session_id', 'loaded', 'directory', 'path'},
|
||||
set(),
|
||||
),
|
||||
'flush-transcript': (
|
||||
{'session_id', 'path', 'flushed', 'messages_count', 'input_tokens', 'output_tokens'},
|
||||
set(),
|
||||
),
|
||||
'show-command': (
|
||||
{'name', 'found', 'source_hint', 'responsibility'},
|
||||
set(),
|
||||
),
|
||||
'show-tool': (
|
||||
{'name', 'found', 'source_hint'},
|
||||
set(),
|
||||
),
|
||||
'exec-command': (
|
||||
{'name', 'prompt', 'handled', 'message', 'source_hint'},
|
||||
set(),
|
||||
),
|
||||
'exec-tool': (
|
||||
{'name', 'payload', 'handled', 'message', 'source_hint'},
|
||||
set(),
|
||||
),
|
||||
'route': (
|
||||
{'prompt', 'limit', 'match_count', 'matches'},
|
||||
set(),
|
||||
),
|
||||
'bootstrap': (
|
||||
{'prompt', 'setup', 'routed_matches', 'turn', 'persisted_session_path'},
|
||||
set(),
|
||||
),
|
||||
'command-graph': (
|
||||
{'builtins_count', 'plugin_like_count', 'skill_like_count', 'total_count', 'builtins', 'plugin_like', 'skill_like'},
|
||||
set(),
|
||||
),
|
||||
'tool-pool': (
|
||||
{'simple_mode', 'include_mcp', 'tool_count', 'tools'},
|
||||
set(),
|
||||
),
|
||||
'bootstrap-graph': (
|
||||
{'stages', 'note'},
|
||||
set(),
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
class TestJsonEnvelopeConsistency:
|
||||
"""Validate current command envelopes match their declared contracts.
|
||||
|
||||
This is a consistency check, not a conformance check. Once #173 adds
|
||||
common fields to all commands, these tests will auto-pass the common
|
||||
field assertions and verify command-specific fields stay consistent.
|
||||
"""
|
||||
|
||||
@pytest.mark.parametrize('cmd_name,contract', sorted(ENVELOPE_CONTRACTS.items()))
|
||||
def test_command_json_fields_present(self, cmd_name: str, contract: tuple[set[str], set[str]]) -> None:
|
||||
required, optional = contract
|
||||
"""Command's JSON envelope must include all required fields."""
|
||||
# Get minimal invocation args for this command
|
||||
test_invocations = {
|
||||
'list-sessions': [],
|
||||
'show-command': ['add-dir'],
|
||||
'show-tool': ['BashTool'],
|
||||
'exec-command': ['add-dir', 'hi'],
|
||||
'exec-tool': ['BashTool', '{}'],
|
||||
'route': ['review'],
|
||||
'bootstrap': ['hello'],
|
||||
'command-graph': [],
|
||||
'tool-pool': [],
|
||||
'bootstrap-graph': [],
|
||||
}
|
||||
|
||||
if cmd_name not in test_invocations:
|
||||
pytest.skip(f'{cmd_name} requires session setup; skipped')
|
||||
|
||||
cmd_args = test_invocations[cmd_name]
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
if result.returncode not in (0, 1):
|
||||
pytest.fail(f'{cmd_name}: unexpected exit {result.returncode}\nstderr: {result.stderr}')
|
||||
|
||||
try:
|
||||
envelope = json.loads(result.stdout)
|
||||
except json.JSONDecodeError as e:
|
||||
pytest.fail(f'{cmd_name}: invalid JSON: {e}\nOutput: {result.stdout[:200]}')
|
||||
|
||||
# Check required fields (command-specific)
|
||||
missing = required - set(envelope.keys())
|
||||
if missing:
|
||||
pytest.fail(
|
||||
f'{cmd_name} envelope missing required fields: {missing}\n'
|
||||
f'Expected: {required}\nGot: {set(envelope.keys())}'
|
||||
)
|
||||
|
||||
# Check that extra fields are accounted for (warn if unknown)
|
||||
known = required | optional
|
||||
extra = set(envelope.keys()) - known
|
||||
if extra:
|
||||
# Warn but don't fail — there may be new fields added
|
||||
pytest.warns(UserWarning, match=f'extra fields in {cmd_name}: {extra}')
|
||||
|
||||
def test_envelope_field_value_types(self) -> None:
|
||||
"""Smoke test: envelope fields have expected types (bool, int, str, list, dict, null)."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'list-sessions', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
|
||||
# Spot check a few fields
|
||||
assert isinstance(envelope.get('count'), int), 'count should be int'
|
||||
assert isinstance(envelope.get('sessions'), list), 'sessions should be list'
|
||||
|
||||
|
||||
class TestJsonEnvelopeCommonFieldPrep:
|
||||
"""Validation stubs for common fields (part of #173 implementation).
|
||||
|
||||
These tests will activate once wrap_json_envelope() is applied to all
|
||||
13 clawable commands. Currently they document the expected contract.
|
||||
"""
|
||||
|
||||
def test_all_envelopes_include_timestamp(self) -> None:
|
||||
"""Every clawable envelope must include ISO 8601 UTC timestamp."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'command-graph', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'timestamp' in envelope, 'Missing timestamp field'
|
||||
# Verify ISO 8601 format (ends with Z for UTC)
|
||||
assert envelope['timestamp'].endswith('Z'), f'Timestamp not UTC: {envelope["timestamp"]}'
|
||||
|
||||
def test_all_envelopes_include_command(self) -> None:
|
||||
"""Every envelope must echo the command name."""
|
||||
test_cases = [
|
||||
('list-sessions', []),
|
||||
('command-graph', []),
|
||||
('bootstrap', ['hello']),
|
||||
]
|
||||
for cmd_name, cmd_args in test_cases:
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope.get('command') == cmd_name, f'{cmd_name} envelope.command mismatch'
|
||||
|
||||
def test_all_envelopes_include_exit_code_and_schema_version(self) -> None:
|
||||
"""Every envelope must include exit_code and schema_version."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'tool-pool', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert 'exit_code' in envelope, 'Missing exit_code'
|
||||
assert 'schema_version' in envelope, 'Missing schema_version'
|
||||
assert envelope['schema_version'] == '1.0', 'Wrong schema_version'
|
||||
183
tests/test_load_session_cli.py
Normal file
183
tests/test_load_session_cli.py
Normal file
@@ -0,0 +1,183 @@
|
||||
"""Tests for load-session CLI parity with list-sessions/delete-session (ROADMAP #165).
|
||||
|
||||
Verifies the session-lifecycle CLI triplet is now symmetric:
|
||||
- --directory DIR accepted (alternate storage locations reachable)
|
||||
- --output-format {text,json} accepted
|
||||
- Not-found emits typed JSON error envelope, never a Python traceback
|
||||
- Corrupted session file distinguished from not-found via 'kind'
|
||||
- Legacy text-mode output unchanged (backward compat)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.session_store import StoredSession, save_session # noqa: E402
|
||||
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def _run_cli(
|
||||
*args: str, cwd: Path | None = None,
|
||||
) -> subprocess.CompletedProcess[str]:
|
||||
"""Always invoke the CLI with cwd=repo-root so ``python -m src.main``
|
||||
can resolve the ``src`` package, regardless of where the test's
|
||||
tmp_path is.
|
||||
"""
|
||||
return subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', *args],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
cwd=str(cwd) if cwd else str(_REPO_ROOT),
|
||||
)
|
||||
|
||||
|
||||
def _make_session(session_id: str) -> StoredSession:
|
||||
return StoredSession(
|
||||
session_id=session_id, messages=('hi',), input_tokens=1, output_tokens=2,
|
||||
)
|
||||
|
||||
|
||||
class TestDirectoryFlagParity:
|
||||
def test_load_session_accepts_directory_flag(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('alpha'), tmp_path)
|
||||
result = _run_cli('load-session', 'alpha', '--directory', str(tmp_path))
|
||||
assert result.returncode == 0, result.stderr
|
||||
assert 'alpha' in result.stdout
|
||||
|
||||
def test_load_session_without_directory_uses_cwd_default(
|
||||
self, tmp_path: Path,
|
||||
) -> None:
|
||||
"""When --directory is omitted, fall back to .port_sessions in CWD.
|
||||
|
||||
Subprocess CWD must still be able to import ``src.main``, so we use
|
||||
``cwd=tmp_path`` which means ``python -m src.main`` needs ``src/`` on
|
||||
sys.path. We set PYTHONPATH to the repo root via env.
|
||||
"""
|
||||
sessions_dir = tmp_path / '.port_sessions'
|
||||
sessions_dir.mkdir()
|
||||
save_session(_make_session('beta'), sessions_dir)
|
||||
import os
|
||||
env = os.environ.copy()
|
||||
env['PYTHONPATH'] = str(_REPO_ROOT)
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'load-session', 'beta'],
|
||||
capture_output=True, text=True, cwd=str(tmp_path), env=env,
|
||||
)
|
||||
assert result.returncode == 0, result.stderr
|
||||
assert 'beta' in result.stdout
|
||||
|
||||
|
||||
class TestOutputFormatFlagParity:
|
||||
def test_json_mode_on_success(self, tmp_path: Path) -> None:
|
||||
save_session(
|
||||
StoredSession(
|
||||
session_id='gamma', messages=('x', 'y'),
|
||||
input_tokens=5, output_tokens=7,
|
||||
),
|
||||
tmp_path,
|
||||
)
|
||||
result = _run_cli(
|
||||
'load-session', 'gamma',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert result.returncode == 0
|
||||
data = json.loads(result.stdout)
|
||||
# Verify common envelope fields (SCHEMAS.md contract)
|
||||
assert 'timestamp' in data
|
||||
assert data['command'] == 'load-session'
|
||||
assert data['exit_code'] == 0
|
||||
assert data['schema_version'] == '1.0'
|
||||
# Verify command-specific fields
|
||||
assert data['session_id'] == 'gamma'
|
||||
assert data['loaded'] is True
|
||||
assert data['messages_count'] == 2
|
||||
assert data['input_tokens'] == 5
|
||||
assert data['output_tokens'] == 7
|
||||
|
||||
def test_text_mode_unchanged_on_success(self, tmp_path: Path) -> None:
|
||||
"""Legacy text output must be byte-identical for backward compat."""
|
||||
save_session(_make_session('delta'), tmp_path)
|
||||
result = _run_cli('load-session', 'delta', '--directory', str(tmp_path))
|
||||
assert result.returncode == 0
|
||||
lines = result.stdout.strip().split('\n')
|
||||
assert lines == ['delta', '1 messages', 'in=1 out=2']
|
||||
|
||||
|
||||
class TestNotFoundTypedError:
|
||||
def test_not_found_json_envelope(self, tmp_path: Path) -> None:
|
||||
"""Not-found emits structured JSON, never a Python traceback."""
|
||||
result = _run_cli(
|
||||
'load-session', 'missing',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert result.returncode == 1
|
||||
assert 'Traceback' not in result.stderr, (
|
||||
'regression #165: raw traceback leaked to stderr'
|
||||
)
|
||||
assert 'SessionNotFoundError' not in result.stdout, (
|
||||
'regression #165: internal class name leaked into CLI output'
|
||||
)
|
||||
data = json.loads(result.stdout)
|
||||
assert data['session_id'] == 'missing'
|
||||
assert data['loaded'] is False
|
||||
assert data['error']['kind'] == 'session_not_found'
|
||||
assert data['error']['retryable'] is False
|
||||
# directory field is populated so claws know where we looked
|
||||
assert 'directory' in data['error']
|
||||
|
||||
def test_not_found_text_mode_no_traceback(self, tmp_path: Path) -> None:
|
||||
"""Text mode on not-found must not dump a Python stack either."""
|
||||
result = _run_cli(
|
||||
'load-session', 'missing', '--directory', str(tmp_path),
|
||||
)
|
||||
assert result.returncode == 1
|
||||
assert 'Traceback' not in result.stderr
|
||||
assert result.stdout.startswith('error:')
|
||||
|
||||
|
||||
class TestLoadFailedDistinctFromNotFound:
|
||||
def test_corrupted_session_file_surfaces_distinct_kind(
|
||||
self, tmp_path: Path,
|
||||
) -> None:
|
||||
"""A corrupted JSON file must emit kind='session_load_failed', not 'session_not_found'."""
|
||||
(tmp_path / 'broken.json').write_text('{ not valid json')
|
||||
result = _run_cli(
|
||||
'load-session', 'broken',
|
||||
'--directory', str(tmp_path),
|
||||
'--output-format', 'json',
|
||||
)
|
||||
assert result.returncode == 1
|
||||
data = json.loads(result.stdout)
|
||||
assert data['error']['kind'] == 'session_load_failed'
|
||||
assert data['error']['retryable'] is True, (
|
||||
'corrupted file is potentially retryable (fs glitch) unlike not-found'
|
||||
)
|
||||
|
||||
|
||||
class TestTripletParityConsistency:
|
||||
"""All three #160 CLI commands should accept the same flag pair."""
|
||||
|
||||
@pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
|
||||
def test_all_three_accept_directory_flag(self, command: str) -> None:
|
||||
help_text = _run_cli(command, '--help').stdout
|
||||
assert '--directory' in help_text, (
|
||||
f'{command} missing --directory flag (#165 parity gap)'
|
||||
)
|
||||
|
||||
@pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
|
||||
def test_all_three_accept_output_format_flag(self, command: str) -> None:
|
||||
help_text = _run_cli(command, '--help').stdout
|
||||
assert '--output-format' in help_text, (
|
||||
f'{command} missing --output-format flag (#165 parity gap)'
|
||||
)
|
||||
239
tests/test_parse_error_envelope.py
Normal file
239
tests/test_parse_error_envelope.py
Normal file
@@ -0,0 +1,239 @@
|
||||
"""#178 — argparse-level errors emit JSON envelope when --output-format json is requested.
|
||||
|
||||
Before #178:
|
||||
$ claw nonexistent --output-format json
|
||||
usage: main.py [-h] {summary,manifest,...} ...
|
||||
main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...)
|
||||
[exit 2, argparse dumps help to stderr, no JSON envelope]
|
||||
|
||||
After #178:
|
||||
$ claw nonexistent --output-format json
|
||||
{"timestamp": "...", "command": "nonexistent", "exit_code": 1, ...,
|
||||
"error": {"kind": "parse", "operation": "argparse", ...}}
|
||||
[exit 1, JSON envelope on stdout, matches SCHEMAS.md contract]
|
||||
|
||||
Contract:
|
||||
- text mode: unchanged (argparse still dumps help to stderr, exit code 2)
|
||||
- JSON mode: envelope matches SCHEMAS.md 'error' shape, exit code 1
|
||||
- Parse errors use error.kind='parse' (distinct from runtime/session/etc.)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
CLI = [sys.executable, '-m', 'src.main']
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
class TestParseErrorJsonEnvelope:
|
||||
"""Argparse errors emit JSON envelope when --output-format json is requested."""
|
||||
|
||||
def test_unknown_command_json_mode_emits_envelope(self) -> None:
|
||||
"""Unknown command + --output-format json → parse-error envelope."""
|
||||
result = subprocess.run(
|
||||
CLI + ['nonexistent-command', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 1, f"expected exit 1; got {result.returncode}"
|
||||
envelope = json.loads(result.stdout)
|
||||
# Common fields
|
||||
assert envelope['schema_version'] == '1.0'
|
||||
assert envelope['output_format'] == 'json'
|
||||
assert envelope['exit_code'] == 1
|
||||
# Error envelope shape
|
||||
assert envelope['error']['kind'] == 'parse'
|
||||
assert envelope['error']['operation'] == 'argparse'
|
||||
assert envelope['error']['retryable'] is False
|
||||
assert envelope['error']['target'] == 'nonexistent-command'
|
||||
assert 'hint' in envelope['error']
|
||||
|
||||
def test_unknown_command_json_equals_syntax(self) -> None:
|
||||
"""--output-format=json syntax also works."""
|
||||
result = subprocess.run(
|
||||
CLI + ['nonexistent-command', '--output-format=json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 1
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['error']['kind'] == 'parse'
|
||||
|
||||
def test_unknown_command_text_mode_unchanged(self) -> None:
|
||||
"""Text mode (default) preserves argparse behavior: help to stderr, exit 2."""
|
||||
result = subprocess.run(
|
||||
CLI + ['nonexistent-command'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 2, f"text mode must preserve argparse exit 2; got {result.returncode}"
|
||||
# stderr should have argparse error (help + error message)
|
||||
assert 'invalid choice' in result.stderr
|
||||
# stdout should be empty (no JSON leaked)
|
||||
assert result.stdout == ''
|
||||
|
||||
def test_invalid_flag_json_mode_emits_envelope(self) -> None:
|
||||
"""Invalid flag at top level + --output-format json → envelope."""
|
||||
result = subprocess.run(
|
||||
CLI + ['--invalid-top-level-flag', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# argparse might reject before --output-format is parsed; still emit envelope
|
||||
assert result.returncode == 1, f"got {result.returncode}: {result.stderr}"
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['error']['kind'] == 'parse'
|
||||
|
||||
def test_missing_command_no_json_flag_behaves_normally(self) -> None:
|
||||
"""No --output-format flag + missing command → normal argparse behavior."""
|
||||
result = subprocess.run(
|
||||
CLI,
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# argparse exits 2 when required subcommand is missing
|
||||
assert result.returncode == 2
|
||||
assert 'required' in result.stderr.lower() or 'the following arguments are required' in result.stderr.lower()
|
||||
|
||||
def test_valid_command_unaffected(self) -> None:
|
||||
"""Valid commands still work normally (no regression)."""
|
||||
result = subprocess.run(
|
||||
CLI + ['list-sessions', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['command'] == 'list-sessions'
|
||||
assert 'sessions' in envelope
|
||||
|
||||
def test_parse_error_envelope_contains_common_fields(self) -> None:
|
||||
"""Parse-error envelope must include all common fields per SCHEMAS.md."""
|
||||
result = subprocess.run(
|
||||
CLI + ['bogus', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
# All common fields required by SCHEMAS.md
|
||||
for field in ('timestamp', 'command', 'exit_code', 'output_format', 'schema_version'):
|
||||
assert field in envelope, f"common field '{field}' missing from parse-error envelope"
|
||||
|
||||
|
||||
class TestParseErrorSchemaCompliance:
|
||||
"""Parse-error envelope matches SCHEMAS.md error shape."""
|
||||
|
||||
def test_error_kind_is_parse(self) -> None:
|
||||
"""error.kind='parse' distinguishes argparse errors from runtime errors."""
|
||||
result = subprocess.run(
|
||||
CLI + ['unknown', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['error']['kind'] == 'parse'
|
||||
|
||||
def test_error_retryable_false(self) -> None:
|
||||
"""Parse errors are never retryable (typo won't magically fix itself)."""
|
||||
result = subprocess.run(
|
||||
CLI + ['unknown', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['error']['retryable'] is False
|
||||
|
||||
|
||||
class TestParseErrorStderrHygiene:
|
||||
"""#179: JSON mode must fully suppress argparse stderr output.
|
||||
|
||||
Before #179: stderr leaked argparse usage + error text even when --output-format json.
|
||||
After #179: stderr is silent; envelope carries the real error message verbatim.
|
||||
"""
|
||||
|
||||
def test_json_mode_stderr_is_silent_on_unknown_command(self) -> None:
|
||||
"""Unknown command in JSON mode: stderr empty."""
|
||||
result = subprocess.run(
|
||||
CLI + ['nonexistent-cmd', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.stderr == '', (
|
||||
f"JSON mode stderr must be empty; got:\n{result.stderr!r}"
|
||||
)
|
||||
|
||||
def test_json_mode_stderr_is_silent_on_missing_arg(self) -> None:
|
||||
"""Missing required arg in JSON mode: stderr empty (no argparse usage leak)."""
|
||||
result = subprocess.run(
|
||||
CLI + ['load-session', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.stderr == '', (
|
||||
f"JSON mode stderr must be empty on missing arg; got:\n{result.stderr!r}"
|
||||
)
|
||||
|
||||
def test_json_mode_envelope_carries_real_argparse_message(self) -> None:
|
||||
"""#179: envelope.error.message contains argparse's actual text, not generic rejection."""
|
||||
result = subprocess.run(
|
||||
CLI + ['load-session', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
# Real argparse message: 'the following arguments are required: session_id'
|
||||
msg = envelope['error']['message']
|
||||
assert 'session_id' in msg, (
|
||||
f"envelope.error.message must carry real argparse text mentioning missing arg; got: {msg!r}"
|
||||
)
|
||||
assert 'required' in msg.lower(), (
|
||||
f"envelope.error.message must indicate what is required; got: {msg!r}"
|
||||
)
|
||||
|
||||
def test_json_mode_envelope_carries_invalid_choice_details(self) -> None:
|
||||
"""#179: unknown command envelope includes valid-choice list from argparse."""
|
||||
result = subprocess.run(
|
||||
CLI + ['typo-command', '--output-format', 'json'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
envelope = json.loads(result.stdout)
|
||||
msg = envelope['error']['message']
|
||||
assert 'invalid choice' in msg.lower(), (
|
||||
f"envelope must mention 'invalid choice'; got: {msg!r}"
|
||||
)
|
||||
# Should include at least one valid command name for discoverability
|
||||
assert 'bootstrap' in msg or 'summary' in msg, (
|
||||
f"envelope must include valid choices for discoverability; got: {msg!r}"
|
||||
)
|
||||
|
||||
def test_text_mode_stderr_preserved_on_unknown_command(self) -> None:
|
||||
"""Text mode: argparse stderr behavior unchanged (backward compat)."""
|
||||
result = subprocess.run(
|
||||
CLI + ['nonexistent-cmd'],
|
||||
cwd=REPO_ROOT,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# Text mode still dumps argparse help to stderr
|
||||
assert 'invalid choice' in result.stderr
|
||||
assert result.returncode == 2
|
||||
@@ -173,6 +173,105 @@ class PortingWorkspaceTests(unittest.TestCase):
|
||||
self.assertIn(session_id, result.stdout)
|
||||
self.assertIn('messages', result.stdout)
|
||||
|
||||
def test_list_sessions_cli_runs(self) -> None:
|
||||
"""#160: list-sessions CLI enumerates stored sessions in text + json."""
|
||||
import json
|
||||
import tempfile
|
||||
from src.session_store import StoredSession, save_session
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
for sid in ['alpha', 'bravo']:
|
||||
save_session(
|
||||
StoredSession(session_id=sid, messages=('hi',), input_tokens=1, output_tokens=2),
|
||||
tmp_path,
|
||||
)
|
||||
# text mode
|
||||
text_result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'list-sessions', '--directory', str(tmp_path)],
|
||||
check=True, capture_output=True, text=True,
|
||||
)
|
||||
self.assertIn('alpha', text_result.stdout)
|
||||
self.assertIn('bravo', text_result.stdout)
|
||||
# json mode
|
||||
json_result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'list-sessions',
|
||||
'--directory', str(tmp_path), '--output-format', 'json'],
|
||||
check=True, capture_output=True, text=True,
|
||||
)
|
||||
data = json.loads(json_result.stdout)
|
||||
# Verify common envelope fields (SCHEMAS.md contract)
|
||||
self.assertIn('timestamp', data)
|
||||
self.assertEqual(data['command'], 'list-sessions')
|
||||
self.assertEqual(data['schema_version'], '1.0')
|
||||
# Verify command-specific fields
|
||||
self.assertEqual(data['sessions'], ['alpha', 'bravo'])
|
||||
self.assertEqual(data['count'], 2)
|
||||
|
||||
def test_delete_session_cli_idempotent(self) -> None:
|
||||
"""#160: delete-session CLI is idempotent (not-found is exit 0, status=not_found)."""
|
||||
import json
|
||||
import tempfile
|
||||
from src.session_store import StoredSession, save_session
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
save_session(
|
||||
StoredSession(session_id='once', messages=('hi',), input_tokens=1, output_tokens=2),
|
||||
tmp_path,
|
||||
)
|
||||
# first delete: success
|
||||
first = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'delete-session', 'once',
|
||||
'--directory', str(tmp_path), '--output-format', 'json'],
|
||||
capture_output=True, text=True,
|
||||
)
|
||||
self.assertEqual(first.returncode, 0)
|
||||
envelope_first = json.loads(first.stdout)
|
||||
# Verify common envelope fields (SCHEMAS.md contract)
|
||||
self.assertIn('timestamp', envelope_first)
|
||||
self.assertEqual(envelope_first['command'], 'delete-session')
|
||||
self.assertEqual(envelope_first['exit_code'], 0)
|
||||
self.assertEqual(envelope_first['schema_version'], '1.0')
|
||||
# Verify command-specific fields
|
||||
self.assertEqual(envelope_first['session_id'], 'once')
|
||||
self.assertEqual(envelope_first['deleted'], True)
|
||||
self.assertEqual(envelope_first['status'], 'deleted')
|
||||
# second delete: idempotent, still exit 0
|
||||
second = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'delete-session', 'once',
|
||||
'--directory', str(tmp_path), '--output-format', 'json'],
|
||||
capture_output=True, text=True,
|
||||
)
|
||||
self.assertEqual(second.returncode, 0)
|
||||
envelope_second = json.loads(second.stdout)
|
||||
self.assertEqual(envelope_second['session_id'], 'once')
|
||||
self.assertEqual(envelope_second['deleted'], False)
|
||||
self.assertEqual(envelope_second['status'], 'not_found')
|
||||
|
||||
def test_delete_session_cli_partial_failure_exit_1(self) -> None:
|
||||
"""#160: partial-failure (permission error) surfaces as exit 1 + typed JSON error."""
|
||||
import json
|
||||
import tempfile
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
bad = tmp_path / 'locked.json'
|
||||
bad.mkdir()
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'delete-session', 'locked',
|
||||
'--directory', str(tmp_path), '--output-format', 'json'],
|
||||
capture_output=True, text=True,
|
||||
)
|
||||
self.assertEqual(result.returncode, 1)
|
||||
data = json.loads(result.stdout)
|
||||
self.assertFalse(data['deleted'])
|
||||
self.assertEqual(data['error']['kind'], 'session_delete_failed')
|
||||
self.assertTrue(data['error']['retryable'])
|
||||
finally:
|
||||
bad.rmdir()
|
||||
|
||||
def test_tool_permission_filtering_cli_runs(self) -> None:
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'tools', '--limit', '10', '--deny-prefix', 'mcp'],
|
||||
|
||||
156
tests/test_run_turn_loop_cancellation.py
Normal file
156
tests/test_run_turn_loop_cancellation.py
Normal file
@@ -0,0 +1,156 @@
|
||||
"""Tests for run_turn_loop timeout triggering cooperative cancel (ROADMAP #164 Stage A).
|
||||
|
||||
End-to-end integration: when the wall-clock timeout fires in run_turn_loop,
|
||||
the runtime must signal the cancel_event so any in-flight submit_message
|
||||
thread sees it at its next safe checkpoint and returns without mutating
|
||||
state.
|
||||
|
||||
This closes the gap filed in #164: #161's timeout bounded caller wait but
|
||||
did not prevent ghost turns.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.models import UsageSummary # noqa: E402
|
||||
from src.query_engine import TurnResult # noqa: E402
|
||||
from src.runtime import PortRuntime # noqa: E402
|
||||
|
||||
|
||||
def _completed(prompt: str) -> TurnResult:
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='ok',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='completed',
|
||||
)
|
||||
|
||||
|
||||
class TestTimeoutPropagatesCancelEvent:
|
||||
def test_runtime_passes_cancel_event_to_submit_message(self) -> None:
|
||||
"""submit_message receives a cancel_event when a deadline is in play."""
|
||||
runtime = PortRuntime()
|
||||
captured_event: list[threading.Event | None] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials, cancel_event=None):
|
||||
captured_event.append(cancel_event)
|
||||
return _completed(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop(
|
||||
'hello', max_turns=1, timeout_seconds=5.0,
|
||||
)
|
||||
|
||||
# Runtime passed a real Event object, not None
|
||||
assert len(captured_event) == 1
|
||||
assert isinstance(captured_event[0], threading.Event)
|
||||
|
||||
def test_legacy_no_timeout_does_not_pass_cancel_event(self) -> None:
|
||||
"""Without timeout_seconds, the cancel_event is None (legacy behaviour)."""
|
||||
runtime = PortRuntime()
|
||||
captured_kwargs: list[dict] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials):
|
||||
# Legacy call signature: no cancel_event kwarg
|
||||
captured_kwargs.append({'prompt': prompt})
|
||||
return _completed(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop('hello', max_turns=1)
|
||||
|
||||
# Legacy path didn't pass cancel_event at all
|
||||
assert len(captured_kwargs) == 1
|
||||
|
||||
def test_timeout_sets_cancel_event_before_returning(self) -> None:
|
||||
"""When timeout fires mid-call, the event is set and the still-running
|
||||
thread would see 'cancelled' if it checks before returning."""
|
||||
runtime = PortRuntime()
|
||||
observed_events_at_checkpoint: list[bool] = []
|
||||
release = threading.Event() # test-side release so the thread doesn't leak forever
|
||||
|
||||
def _slow_submit(prompt, commands, tools, denials, cancel_event=None):
|
||||
# Simulate provider work: block until either cancel or a test-side release.
|
||||
# If cancel fires, check if the event is observably set.
|
||||
start = time.monotonic()
|
||||
while time.monotonic() - start < 2.0:
|
||||
if cancel_event is not None and cancel_event.is_set():
|
||||
observed_events_at_checkpoint.append(True)
|
||||
return TurnResult(
|
||||
prompt=prompt, output='',
|
||||
matched_commands=(), matched_tools=(),
|
||||
permission_denials=(), usage=UsageSummary(),
|
||||
stop_reason='cancelled',
|
||||
)
|
||||
if release.is_set():
|
||||
break
|
||||
time.sleep(0.05)
|
||||
return _completed(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _slow_submit
|
||||
|
||||
# Tight deadline: 0.2s, submit will be mid-loop when timeout fires
|
||||
start = time.monotonic()
|
||||
results = runtime.run_turn_loop(
|
||||
'hello', max_turns=1, timeout_seconds=0.2,
|
||||
)
|
||||
elapsed = time.monotonic() - start
|
||||
release.set() # let the background thread exit cleanly
|
||||
|
||||
# Runtime returned a timeout TurnResult to the caller
|
||||
assert results[-1].stop_reason == 'timeout'
|
||||
# And it happened within a reasonable window of the deadline
|
||||
assert elapsed < 1.5, f'runtime did not honour deadline: {elapsed:.2f}s'
|
||||
|
||||
# Give the background thread a moment to observe the cancel.
|
||||
# We don't assert on it directly (thread-level observability is
|
||||
# timing-dependent), but the contract is: the event IS set, so any
|
||||
# cooperative checkpoint will see it.
|
||||
time.sleep(0.3)
|
||||
|
||||
|
||||
class TestCancelEventSharedAcrossTurns:
|
||||
"""Event is created once per run_turn_loop invocation and shared across turns."""
|
||||
|
||||
def test_same_event_threaded_to_every_submit_message(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
captured_events: list[threading.Event] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials, cancel_event=None):
|
||||
if cancel_event is not None:
|
||||
captured_events.append(cancel_event)
|
||||
return _completed(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop(
|
||||
'hello', max_turns=3, timeout_seconds=5.0,
|
||||
continuation_prompt='continue',
|
||||
)
|
||||
|
||||
# All 3 turns received the same event object (same identity)
|
||||
assert len(captured_events) == 3
|
||||
assert all(e is captured_events[0] for e in captured_events), (
|
||||
'runtime must share one cancel_event across turns, not create '
|
||||
'a new one per turn \u2014 otherwise a late-arriving cancel on turn '
|
||||
'N-1 cannot affect turn N'
|
||||
)
|
||||
161
tests/test_run_turn_loop_continuation.py
Normal file
161
tests/test_run_turn_loop_continuation.py
Normal file
@@ -0,0 +1,161 @@
|
||||
"""Tests for run_turn_loop continuation contract (ROADMAP #163).
|
||||
|
||||
The deprecated ``f'{prompt} [turn N]'`` suffix injection is gone. Verifies:
|
||||
- No ``[turn N]`` string ever lands in a submitted prompt
|
||||
- Default (``continuation_prompt=None``) stops the loop after turn 0
|
||||
- Explicit ``continuation_prompt`` is submitted verbatim on subsequent turns
|
||||
- The first turn always gets the original prompt, not the continuation
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.models import UsageSummary # noqa: E402
|
||||
from src.query_engine import TurnResult # noqa: E402
|
||||
from src.runtime import PortRuntime # noqa: E402
|
||||
|
||||
|
||||
def _completed_result(prompt: str) -> TurnResult:
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='ok',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='completed',
|
||||
)
|
||||
|
||||
|
||||
class TestNoTurnSuffixInjection:
|
||||
"""Core acceptance: no prompt submitted to the engine ever contains '[turn N]'."""
|
||||
|
||||
def test_default_path_submits_original_prompt_only(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
submitted: list[str] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials):
|
||||
submitted.append(prompt)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop('investigate this bug', max_turns=3)
|
||||
|
||||
# Without continuation_prompt, only turn 0 should run
|
||||
assert submitted == ['investigate this bug']
|
||||
# And no '[turn N]' suffix anywhere
|
||||
for p in submitted:
|
||||
assert '[turn' not in p, f'found [turn suffix in submitted prompt: {p!r}'
|
||||
|
||||
def test_with_continuation_prompt_no_turn_suffix(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
submitted: list[str] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials):
|
||||
submitted.append(prompt)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop(
|
||||
'investigate this bug',
|
||||
max_turns=3,
|
||||
continuation_prompt='Continue.',
|
||||
)
|
||||
|
||||
# Turn 0 = original, turns 1-2 = continuation, verbatim
|
||||
assert submitted == ['investigate this bug', 'Continue.', 'Continue.']
|
||||
# No harness-injected suffix anywhere
|
||||
for p in submitted:
|
||||
assert '[turn' not in p
|
||||
assert not p.endswith(']')
|
||||
|
||||
|
||||
class TestContinuationDefaultStopsAfterTurnZero:
|
||||
def test_default_continuation_returns_one_result(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
|
||||
|
||||
results = runtime.run_turn_loop('x', max_turns=5)
|
||||
assert len(results) == 1
|
||||
assert results[0].prompt == 'x'
|
||||
|
||||
def test_default_continuation_does_not_call_engine_twice(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
|
||||
|
||||
runtime.run_turn_loop('x', max_turns=10)
|
||||
# Exactly one submit_message call despite max_turns=10
|
||||
assert engine.submit_message.call_count == 1
|
||||
|
||||
|
||||
class TestExplicitContinuationBehaviour:
|
||||
def test_first_turn_always_uses_original_prompt(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
captured: list[str] = []
|
||||
|
||||
def _capture(prompt, *_):
|
||||
captured.append(prompt)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
runtime.run_turn_loop(
|
||||
'original task', max_turns=2, continuation_prompt='keep going'
|
||||
)
|
||||
|
||||
assert captured[0] == 'original task'
|
||||
assert captured[1] == 'keep going'
|
||||
|
||||
def test_continuation_respects_max_turns(self) -> None:
|
||||
runtime = PortRuntime()
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
|
||||
|
||||
runtime.run_turn_loop('x', max_turns=3, continuation_prompt='go')
|
||||
assert engine.submit_message.call_count == 3
|
||||
|
||||
|
||||
class TestCLIContinuationFlag:
|
||||
def test_cli_default_runs_one_turn(self) -> None:
|
||||
"""Without --continuation-prompt, CLI should emit exactly '## Turn 1'."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
|
||||
'--max-turns', '3', '--structured-output'],
|
||||
check=True, capture_output=True, text=True,
|
||||
)
|
||||
assert '## Turn 1' in result.stdout
|
||||
assert '## Turn 2' not in result.stdout
|
||||
assert '[turn' not in result.stdout
|
||||
|
||||
def test_cli_with_continuation_runs_multiple_turns(self) -> None:
|
||||
"""With --continuation-prompt, CLI should run up to max_turns."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
|
||||
'--max-turns', '2', '--structured-output',
|
||||
'--continuation-prompt', 'continue'],
|
||||
check=True, capture_output=True, text=True,
|
||||
)
|
||||
assert '## Turn 1' in result.stdout
|
||||
assert '## Turn 2' in result.stdout
|
||||
# The continuation text is visible (it's submitted as the turn prompt)
|
||||
# but no harness-injected [turn N] suffix
|
||||
assert '[turn' not in result.stdout
|
||||
95
tests/test_run_turn_loop_permissions.py
Normal file
95
tests/test_run_turn_loop_permissions.py
Normal file
@@ -0,0 +1,95 @@
|
||||
"""Tests for run_turn_loop permission denials parity (ROADMAP #159).
|
||||
|
||||
Verifies that multi-turn sessions have the same security posture as
|
||||
single-turn bootstrap_session: denied_tools are inferred from matches
|
||||
and threaded through every turn, not hardcoded empty.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.runtime import PortRuntime # noqa: E402
|
||||
|
||||
|
||||
class TestPermissionDenialsInTurnLoop:
|
||||
"""#159: permission denials must be non-empty in run_turn_loop,
|
||||
matching what bootstrap_session produces for the same prompt.
|
||||
"""
|
||||
|
||||
def test_turn_loop_surfaces_permission_denials_like_bootstrap(self) -> None:
|
||||
"""Symmetry check: turn_loop and bootstrap_session infer the same denials."""
|
||||
runtime = PortRuntime()
|
||||
prompt = 'run bash ls'
|
||||
|
||||
# Single-turn via bootstrap
|
||||
bootstrap_result = runtime.bootstrap_session(prompt)
|
||||
bootstrap_denials = bootstrap_result.turn_result.permission_denials
|
||||
|
||||
# Multi-turn via run_turn_loop (single turn, no continuation)
|
||||
loop_results = runtime.run_turn_loop(prompt, max_turns=1)
|
||||
loop_denials = loop_results[0].permission_denials
|
||||
|
||||
# Both should infer denials for bash-family tools
|
||||
assert len(bootstrap_denials) > 0, (
|
||||
'bootstrap_session should deny bash-family tools'
|
||||
)
|
||||
assert len(loop_denials) > 0, (
|
||||
f'#159 regression: run_turn_loop returned empty denials; '
|
||||
f'expected {len(bootstrap_denials)} like bootstrap_session'
|
||||
)
|
||||
|
||||
# The denial kinds should match (both deny the same tools)
|
||||
bootstrap_denied_names = {d.tool_name for d in bootstrap_denials}
|
||||
loop_denied_names = {d.tool_name for d in loop_denials}
|
||||
assert bootstrap_denied_names == loop_denied_names, (
|
||||
f'asymmetric denials: bootstrap denied {bootstrap_denied_names}, '
|
||||
f'loop denied {loop_denied_names}'
|
||||
)
|
||||
|
||||
def test_turn_loop_with_continuation_preserves_denials(self) -> None:
|
||||
"""Denials are inferred once at loop start, then passed to every turn."""
|
||||
runtime = PortRuntime()
|
||||
from unittest.mock import patch
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
from src.models import UsageSummary
|
||||
from src.query_engine import TurnResult
|
||||
|
||||
engine = mock_factory.return_value
|
||||
submitted_denials: list[tuple] = []
|
||||
|
||||
def _capture(prompt, commands, tools, denials):
|
||||
submitted_denials.append(denials)
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='ok',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=denials, # echo back the denials
|
||||
usage=UsageSummary(),
|
||||
stop_reason='completed',
|
||||
)
|
||||
|
||||
engine.submit_message.side_effect = _capture
|
||||
|
||||
loop_results = runtime.run_turn_loop(
|
||||
'run bash rm', max_turns=2, continuation_prompt='continue'
|
||||
)
|
||||
|
||||
# Both turn 0 and turn 1 should have received the same denials
|
||||
assert len(submitted_denials) == 2
|
||||
assert submitted_denials[0] == submitted_denials[1], (
|
||||
'denials should be consistent across all turns'
|
||||
)
|
||||
# And they should be non-empty (bash is destructive)
|
||||
assert len(submitted_denials[0]) > 0, (
|
||||
'turn-loop denials were empty — #159 regression'
|
||||
)
|
||||
|
||||
# Turn results should reflect the denials that were passed
|
||||
for result in loop_results:
|
||||
assert len(result.permission_denials) > 0
|
||||
179
tests/test_run_turn_loop_timeout.py
Normal file
179
tests/test_run_turn_loop_timeout.py
Normal file
@@ -0,0 +1,179 @@
|
||||
"""Tests for run_turn_loop wall-clock timeout (ROADMAP #161).
|
||||
|
||||
Covers:
|
||||
- timeout_seconds=None preserves legacy unbounded behaviour
|
||||
- timeout_seconds=X aborts a hung turn and emits stop_reason='timeout'
|
||||
- Timeout budget is total wall-clock across all turns, not per-turn
|
||||
- Already-exhausted budget short-circuits before the first turn runs
|
||||
- Legacy path still runs without a ThreadPoolExecutor in the way
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.models import UsageSummary # noqa: E402
|
||||
from src.query_engine import TurnResult # noqa: E402
|
||||
from src.runtime import PortRuntime # noqa: E402
|
||||
|
||||
|
||||
def _completed_result(prompt: str) -> TurnResult:
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='ok',
|
||||
matched_commands=(),
|
||||
matched_tools=(),
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='completed',
|
||||
)
|
||||
|
||||
|
||||
class TestLegacyUnboundedBehaviour:
|
||||
def test_no_timeout_preserves_existing_behaviour(self) -> None:
|
||||
"""timeout_seconds=None must not change legacy path at all."""
|
||||
results = PortRuntime().run_turn_loop('review MCP tool', max_turns=2)
|
||||
assert len(results) >= 1
|
||||
for r in results:
|
||||
assert r.stop_reason in {'completed', 'max_turns_reached', 'max_budget_reached'}
|
||||
assert r.stop_reason != 'timeout'
|
||||
|
||||
|
||||
class TestTimeoutAbortsHungTurn:
|
||||
def test_hung_submit_message_times_out(self) -> None:
|
||||
"""A stalled submit_message must be aborted and emit stop_reason='timeout'."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
# #164 Stage A: runtime now passes cancel_event as a 5th positional
|
||||
# arg on the timeout path, so mocks must accept it (even if they ignore it).
|
||||
def _hang(prompt, commands, tools, denials, cancel_event=None):
|
||||
time.sleep(5.0) # would block the loop
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.config = None # attribute-assigned in run_turn_loop
|
||||
engine.submit_message.side_effect = _hang
|
||||
|
||||
start = time.monotonic()
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=0.3
|
||||
)
|
||||
elapsed = time.monotonic() - start
|
||||
|
||||
# Must exit well under the 5s hang
|
||||
assert elapsed < 1.5, f'run_turn_loop did not honor timeout: {elapsed:.2f}s'
|
||||
assert len(results) == 1
|
||||
assert results[-1].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestTimeoutBudgetIsTotal:
|
||||
def test_budget_is_cumulative_across_turns(self) -> None:
|
||||
"""timeout_seconds is total wall-clock across all turns, not per-turn.
|
||||
|
||||
#163 interaction: multi-turn behaviour now requires an explicit
|
||||
``continuation_prompt``; otherwise the loop stops after turn 0 and
|
||||
the cumulative-budget contract is trivially satisfied. We supply one
|
||||
here so the test actually exercises the cross-turn deadline.
|
||||
"""
|
||||
runtime = PortRuntime()
|
||||
call_count = {'n': 0}
|
||||
|
||||
def _slow(prompt, commands, tools, denials, cancel_event=None):
|
||||
call_count['n'] += 1
|
||||
time.sleep(0.4) # each turn burns 0.4s
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _slow
|
||||
|
||||
start = time.monotonic()
|
||||
# 0.6s budget, 0.4s per turn. First turn completes (~0.4s),
|
||||
# second turn times out before finishing.
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool',
|
||||
max_turns=5,
|
||||
timeout_seconds=0.6,
|
||||
continuation_prompt='continue',
|
||||
)
|
||||
elapsed = time.monotonic() - start
|
||||
|
||||
# Should exit at around 0.6s, not 2.0s (5 turns * 0.4s)
|
||||
assert elapsed < 1.5, f'cumulative budget not honored: {elapsed:.2f}s'
|
||||
# Last result should be the timeout
|
||||
assert results[-1].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestExhaustedBudget:
|
||||
def test_zero_timeout_short_circuits_first_turn(self) -> None:
|
||||
"""timeout_seconds=0 emits timeout before the first submit_message call."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
# submit_message should never be called when budget is already 0
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is exhausted'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=0.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestTimeoutResultShape:
|
||||
def test_timeout_result_has_correct_prompt_and_matches(self) -> None:
|
||||
"""Synthetic TurnResult on timeout must carry the turn's prompt + routed matches."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
def _hang(prompt, commands, tools, denials, cancel_event=None):
|
||||
time.sleep(5.0)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _hang
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=2, timeout_seconds=0.2
|
||||
)
|
||||
|
||||
timeout_result = results[-1]
|
||||
assert timeout_result.stop_reason == 'timeout'
|
||||
assert timeout_result.prompt == 'review MCP tool'
|
||||
# matched_commands / matched_tools should still be populated from routing,
|
||||
# so downstream transcripts don't lose the routing context.
|
||||
# These may be empty tuples depending on routing; they must be tuples.
|
||||
assert isinstance(timeout_result.matched_commands, tuple)
|
||||
assert isinstance(timeout_result.matched_tools, tuple)
|
||||
assert isinstance(timeout_result.usage, UsageSummary)
|
||||
|
||||
|
||||
class TestNegativeTimeoutTreatedAsExhausted:
|
||||
def test_negative_timeout_short_circuits(self) -> None:
|
||||
"""A negative budget should behave identically to exhausted."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is negative'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=-1.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
173
tests/test_session_store.py
Normal file
173
tests/test_session_store.py
Normal file
@@ -0,0 +1,173 @@
|
||||
"""Tests for session_store CRUD surface (ROADMAP #160).
|
||||
|
||||
Covers:
|
||||
- list_sessions enumeration
|
||||
- session_exists boolean check
|
||||
- delete_session idempotency + race-safety + partial-failure contract
|
||||
- SessionNotFoundError typing (KeyError subclass)
|
||||
- SessionDeleteError typing (OSError subclass)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent / 'src'))
|
||||
|
||||
from session_store import ( # noqa: E402
|
||||
StoredSession,
|
||||
SessionDeleteError,
|
||||
SessionNotFoundError,
|
||||
delete_session,
|
||||
list_sessions,
|
||||
load_session,
|
||||
save_session,
|
||||
session_exists,
|
||||
)
|
||||
|
||||
|
||||
def _make_session(session_id: str) -> StoredSession:
|
||||
return StoredSession(
|
||||
session_id=session_id,
|
||||
messages=('hello',),
|
||||
input_tokens=1,
|
||||
output_tokens=2,
|
||||
)
|
||||
|
||||
|
||||
class TestListSessions:
|
||||
def test_empty_directory_returns_empty_list(self, tmp_path: Path) -> None:
|
||||
assert list_sessions(tmp_path) == []
|
||||
|
||||
def test_nonexistent_directory_returns_empty_list(self, tmp_path: Path) -> None:
|
||||
missing = tmp_path / 'never-created'
|
||||
assert list_sessions(missing) == []
|
||||
|
||||
def test_lists_saved_sessions_sorted(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('charlie'), tmp_path)
|
||||
save_session(_make_session('alpha'), tmp_path)
|
||||
save_session(_make_session('bravo'), tmp_path)
|
||||
assert list_sessions(tmp_path) == ['alpha', 'bravo', 'charlie']
|
||||
|
||||
def test_ignores_non_json_files(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('real'), tmp_path)
|
||||
(tmp_path / 'notes.txt').write_text('ignore me')
|
||||
(tmp_path / 'data.yaml').write_text('ignore me too')
|
||||
assert list_sessions(tmp_path) == ['real']
|
||||
|
||||
|
||||
class TestSessionExists:
|
||||
def test_returns_true_for_saved_session(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('present'), tmp_path)
|
||||
assert session_exists('present', tmp_path) is True
|
||||
|
||||
def test_returns_false_for_missing_session(self, tmp_path: Path) -> None:
|
||||
assert session_exists('absent', tmp_path) is False
|
||||
|
||||
def test_returns_false_for_nonexistent_directory(self, tmp_path: Path) -> None:
|
||||
missing = tmp_path / 'never-created'
|
||||
assert session_exists('anything', missing) is False
|
||||
|
||||
|
||||
class TestLoadSession:
|
||||
def test_raises_typed_error_on_missing(self, tmp_path: Path) -> None:
|
||||
with pytest.raises(SessionNotFoundError) as exc_info:
|
||||
load_session('nonexistent', tmp_path)
|
||||
assert 'nonexistent' in str(exc_info.value)
|
||||
|
||||
def test_not_found_error_is_keyerror_subclass(self, tmp_path: Path) -> None:
|
||||
"""Orchestrators catching KeyError should still work."""
|
||||
with pytest.raises(KeyError):
|
||||
load_session('nonexistent', tmp_path)
|
||||
|
||||
def test_not_found_error_is_not_filenotfounderror(self, tmp_path: Path) -> None:
|
||||
"""Callers can distinguish 'not found' from IO errors."""
|
||||
with pytest.raises(SessionNotFoundError):
|
||||
load_session('nonexistent', tmp_path)
|
||||
# Specifically, it should NOT match bare FileNotFoundError alone
|
||||
# (SessionNotFoundError inherits from KeyError, not FileNotFoundError)
|
||||
assert not issubclass(SessionNotFoundError, FileNotFoundError)
|
||||
|
||||
|
||||
class TestDeleteSessionIdempotency:
|
||||
"""Contract: delete_session(x) followed by delete_session(x) must be safe."""
|
||||
|
||||
def test_first_delete_returns_true(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('to-delete'), tmp_path)
|
||||
assert delete_session('to-delete', tmp_path) is True
|
||||
|
||||
def test_second_delete_returns_false_no_raise(self, tmp_path: Path) -> None:
|
||||
"""Idempotency: deleting an already-deleted session is a no-op."""
|
||||
save_session(_make_session('once'), tmp_path)
|
||||
delete_session('once', tmp_path)
|
||||
# Second call must not raise
|
||||
assert delete_session('once', tmp_path) is False
|
||||
|
||||
def test_delete_nonexistent_returns_false_no_raise(self, tmp_path: Path) -> None:
|
||||
"""Never-existed session is treated identically to already-deleted."""
|
||||
assert delete_session('never-existed', tmp_path) is False
|
||||
|
||||
def test_delete_removes_only_target(self, tmp_path: Path) -> None:
|
||||
save_session(_make_session('keep'), tmp_path)
|
||||
save_session(_make_session('remove'), tmp_path)
|
||||
delete_session('remove', tmp_path)
|
||||
assert list_sessions(tmp_path) == ['keep']
|
||||
|
||||
|
||||
class TestDeleteSessionPartialFailure:
|
||||
"""Contract: file exists but cannot be removed -> SessionDeleteError."""
|
||||
|
||||
def test_partial_failure_raises_session_delete_error(self, tmp_path: Path) -> None:
|
||||
"""If a directory exists where a session file should be, unlink fails."""
|
||||
bad_path = tmp_path / 'locked.json'
|
||||
bad_path.mkdir()
|
||||
try:
|
||||
with pytest.raises(SessionDeleteError) as exc_info:
|
||||
delete_session('locked', tmp_path)
|
||||
# Underlying cause should be wrapped
|
||||
assert exc_info.value.__cause__ is not None
|
||||
assert isinstance(exc_info.value.__cause__, OSError)
|
||||
finally:
|
||||
bad_path.rmdir()
|
||||
|
||||
def test_delete_error_is_oserror_subclass(self, tmp_path: Path) -> None:
|
||||
"""Callers catching OSError should still work for retries."""
|
||||
bad_path = tmp_path / 'locked.json'
|
||||
bad_path.mkdir()
|
||||
try:
|
||||
with pytest.raises(OSError):
|
||||
delete_session('locked', tmp_path)
|
||||
finally:
|
||||
bad_path.rmdir()
|
||||
|
||||
|
||||
class TestRaceSafety:
|
||||
"""Contract: delete_session must be race-safe between exists-check and unlink."""
|
||||
|
||||
def test_concurrent_deletion_returns_false_not_raises(
|
||||
self, tmp_path: Path, monkeypatch
|
||||
) -> None:
|
||||
"""If another process deletes between exists-check and unlink, return False."""
|
||||
save_session(_make_session('racy'), tmp_path)
|
||||
# Simulate: file disappears right before unlink (concurrent deletion)
|
||||
path = tmp_path / 'racy.json'
|
||||
path.unlink()
|
||||
# Now delete_session should return False, not raise
|
||||
assert delete_session('racy', tmp_path) is False
|
||||
|
||||
|
||||
class TestRoundtrip:
|
||||
def test_save_list_load_delete_cycle(self, tmp_path: Path) -> None:
|
||||
session = _make_session('lifecycle')
|
||||
save_session(session, tmp_path)
|
||||
assert 'lifecycle' in list_sessions(tmp_path)
|
||||
assert session_exists('lifecycle', tmp_path)
|
||||
loaded = load_session('lifecycle', tmp_path)
|
||||
assert loaded.session_id == 'lifecycle'
|
||||
assert loaded.messages == ('hello',)
|
||||
assert delete_session('lifecycle', tmp_path) is True
|
||||
assert not session_exists('lifecycle', tmp_path)
|
||||
assert list_sessions(tmp_path) == []
|
||||
203
tests/test_show_command_tool_output_format.py
Normal file
203
tests/test_show_command_tool_output_format.py
Normal file
@@ -0,0 +1,203 @@
|
||||
"""Tests for --output-format flag on show-command and show-tool (ROADMAP #167).
|
||||
|
||||
Verifies parity with session-lifecycle CLI family (#160/#165/#166):
|
||||
- show-command and show-tool now accept --output-format {text,json}
|
||||
- Found case returns success with JSON envelope: {name, found: true, source_hint, responsibility}
|
||||
- Not-found case returns typed error envelope: {name, found: false, error: {kind, message, retryable}}
|
||||
- Legacy text output (default) unchanged for backward compat
|
||||
- Exit code 0 on success, 1 on not-found (matching load-session contract)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
|
||||
class TestShowCommandOutputFormat:
|
||||
"""show-command --output-format {text,json} parity with session-lifecycle family."""
|
||||
|
||||
def test_show_command_found_json(self) -> None:
|
||||
"""show-command with found entry returns JSON envelope."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['found'] is True
|
||||
assert envelope['name'] == 'add-dir'
|
||||
assert 'source_hint' in envelope
|
||||
assert 'responsibility' in envelope
|
||||
# No error field when found
|
||||
assert 'error' not in envelope
|
||||
|
||||
def test_show_command_not_found_json(self) -> None:
|
||||
"""show-command with missing entry returns typed error envelope."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'nonexistent-cmd', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['found'] is False
|
||||
assert envelope['name'] == 'nonexistent-cmd'
|
||||
assert envelope['error']['kind'] == 'command_not_found'
|
||||
assert envelope['error']['retryable'] is False
|
||||
# No source_hint/responsibility when not found
|
||||
assert 'source_hint' not in envelope or envelope.get('source_hint') is None
|
||||
assert 'responsibility' not in envelope or envelope.get('responsibility') is None
|
||||
|
||||
def test_show_command_text_mode_backward_compat(self) -> None:
|
||||
"""show-command text mode (default) is unchanged from pre-#167."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0
|
||||
|
||||
# Text output is newline-separated (name, source_hint, responsibility)
|
||||
lines = result.stdout.strip().split('\n')
|
||||
assert len(lines) == 3
|
||||
assert lines[0] == 'add-dir'
|
||||
assert 'commands/add-dir/add-dir.tsx' in lines[1]
|
||||
|
||||
def test_show_command_text_mode_not_found(self) -> None:
|
||||
"""show-command text mode on not-found returns prose error."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'missing'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 1
|
||||
assert 'not found' in result.stdout.lower()
|
||||
assert 'missing' in result.stdout
|
||||
|
||||
def test_show_command_default_is_text(self) -> None:
|
||||
"""Omitting --output-format defaults to text."""
|
||||
result_implicit = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
result_explicit = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'text'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result_implicit.stdout == result_explicit.stdout
|
||||
|
||||
|
||||
class TestShowToolOutputFormat:
|
||||
"""show-tool --output-format {text,json} parity with session-lifecycle family."""
|
||||
|
||||
def test_show_tool_found_json(self) -> None:
|
||||
"""show-tool with found entry returns JSON envelope."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['found'] is True
|
||||
assert envelope['name'] == 'BashTool'
|
||||
assert 'source_hint' in envelope
|
||||
assert 'responsibility' in envelope
|
||||
assert 'error' not in envelope
|
||||
|
||||
def test_show_tool_not_found_json(self) -> None:
|
||||
"""show-tool with missing entry returns typed error envelope."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-tool', 'NotARealTool', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'
|
||||
|
||||
envelope = json.loads(result.stdout)
|
||||
assert envelope['found'] is False
|
||||
assert envelope['name'] == 'NotARealTool'
|
||||
assert envelope['error']['kind'] == 'tool_not_found'
|
||||
assert envelope['error']['retryable'] is False
|
||||
|
||||
def test_show_tool_text_mode_backward_compat(self) -> None:
|
||||
"""show-tool text mode (default) is unchanged from pre-#167."""
|
||||
result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-tool', 'BashTool'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
assert result.returncode == 0
|
||||
|
||||
lines = result.stdout.strip().split('\n')
|
||||
assert len(lines) == 3
|
||||
assert lines[0] == 'BashTool'
|
||||
assert 'tools/BashTool/BashTool.tsx' in lines[1]
|
||||
|
||||
|
||||
class TestShowCommandToolFormatParity:
|
||||
"""Verify symmetry between show-command and show-tool formats."""
|
||||
|
||||
def test_both_accept_output_format_flag(self) -> None:
|
||||
"""Both commands accept the same --output-format choices."""
|
||||
# Just ensure both fail with invalid choice (they accept text/json)
|
||||
result_cmd = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'invalid'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
result_tool = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'invalid'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
# Both should fail with argument parser error
|
||||
assert result_cmd.returncode != 0
|
||||
assert result_tool.returncode != 0
|
||||
assert 'invalid choice' in result_cmd.stderr
|
||||
assert 'invalid choice' in result_tool.stderr
|
||||
|
||||
def test_json_envelope_shape_consistency(self) -> None:
|
||||
"""Both commands return consistent JSON envelope shape."""
|
||||
cmd_result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
tool_result = subprocess.run(
|
||||
[sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
|
||||
cwd=Path(__file__).resolve().parent.parent,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
cmd_envelope = json.loads(cmd_result.stdout)
|
||||
tool_envelope = json.loads(tool_result.stdout)
|
||||
|
||||
# Same top-level keys for found=true case
|
||||
assert set(cmd_envelope.keys()) == set(tool_envelope.keys())
|
||||
assert cmd_envelope['found'] is True
|
||||
assert tool_envelope['found'] is True
|
||||
167
tests/test_submit_message_budget.py
Normal file
167
tests/test_submit_message_budget.py
Normal file
@@ -0,0 +1,167 @@
|
||||
"""Tests for submit_message budget-overflow atomicity (ROADMAP #162).
|
||||
|
||||
Covers:
|
||||
- Budget overflow returns stop_reason='max_budget_reached' without mutating session
|
||||
- mutable_messages, transcript_store, permission_denials, total_usage all unchanged
|
||||
- Session persisted after overflow does not contain the overflow turn
|
||||
- Engine remains usable after overflow: subsequent in-budget call succeeds
|
||||
- Normal (non-overflow) path still commits state as before
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.models import PermissionDenial, UsageSummary # noqa: E402
|
||||
from src.port_manifest import build_port_manifest # noqa: E402
|
||||
from src.query_engine import QueryEngineConfig, QueryEnginePort # noqa: E402
|
||||
from src.session_store import StoredSession, load_session, save_session # noqa: E402
|
||||
|
||||
|
||||
def _make_engine(max_budget_tokens: int = 10) -> QueryEnginePort:
|
||||
engine = QueryEnginePort(manifest=build_port_manifest())
|
||||
engine.config = QueryEngineConfig(max_budget_tokens=max_budget_tokens)
|
||||
return engine
|
||||
|
||||
|
||||
class TestBudgetOverflowDoesNotMutate:
|
||||
"""The core #162 contract: overflow must leave session state untouched."""
|
||||
|
||||
def test_mutable_messages_unchanged_on_overflow(self) -> None:
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
pre_count = len(engine.mutable_messages)
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert len(engine.mutable_messages) == pre_count
|
||||
|
||||
def test_transcript_unchanged_on_overflow(self) -> None:
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
pre_count = len(engine.transcript_store.entries)
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert len(engine.transcript_store.entries) == pre_count
|
||||
|
||||
def test_permission_denials_unchanged_on_overflow(self) -> None:
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
pre_count = len(engine.permission_denials)
|
||||
denials = (PermissionDenial(tool_name='bash', reason='gated in test'),)
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt, denied_tools=denials)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert len(engine.permission_denials) == pre_count
|
||||
|
||||
def test_total_usage_unchanged_on_overflow(self) -> None:
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
pre_usage = engine.total_usage
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert engine.total_usage == pre_usage
|
||||
|
||||
def test_turn_result_reports_pre_mutation_usage(self) -> None:
|
||||
"""The TurnResult.usage must reflect session state as-if overflow never happened."""
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
pre_usage = engine.total_usage
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert result.usage == pre_usage
|
||||
|
||||
|
||||
class TestOverflowPersistence:
|
||||
"""Session persisted after overflow must not contain the overflow turn."""
|
||||
|
||||
def test_persisted_session_empty_when_first_turn_overflows(
|
||||
self, tmp_path: Path, monkeypatch
|
||||
) -> None:
|
||||
"""When the very first call overflows, persisted session has zero messages."""
|
||||
monkeypatch.chdir(tmp_path)
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
|
||||
path_str = engine.persist_session()
|
||||
path = Path(path_str)
|
||||
assert path.exists()
|
||||
loaded = load_session(path.stem, path.parent)
|
||||
assert loaded.messages == (), (
|
||||
f'overflow turn poisoned session: {loaded.messages!r}'
|
||||
)
|
||||
|
||||
def test_persisted_session_retains_only_successful_turns(
|
||||
self, tmp_path: Path, monkeypatch
|
||||
) -> None:
|
||||
"""A successful turn followed by an overflow persists only the successful turn."""
|
||||
monkeypatch.chdir(tmp_path)
|
||||
# Budget large enough for one short turn but not a second big one.
|
||||
# Token counting is whitespace-split (see UsageSummary.add_turn),
|
||||
# so overflow prompts must contain many whitespace-separated words.
|
||||
engine = QueryEnginePort(manifest=build_port_manifest())
|
||||
engine.config = QueryEngineConfig(max_budget_tokens=50)
|
||||
|
||||
ok = engine.submit_message('short')
|
||||
assert ok.stop_reason == 'completed'
|
||||
assert 'short' in engine.mutable_messages
|
||||
|
||||
# 500 whitespace-separated tokens — definitely over a 50-token budget
|
||||
overflow_prompt = ' '.join(['word'] * 500)
|
||||
overflow = engine.submit_message(overflow_prompt)
|
||||
assert overflow.stop_reason == 'max_budget_reached'
|
||||
|
||||
path = Path(engine.persist_session())
|
||||
loaded = load_session(path.stem, path.parent)
|
||||
assert loaded.messages == ('short',), (
|
||||
f'expected only the successful turn, got {loaded.messages!r}'
|
||||
)
|
||||
|
||||
|
||||
class TestEngineUsableAfterOverflow:
|
||||
"""After overflow, engine must still be usable — overflow is rejection, not corruption."""
|
||||
|
||||
def test_subsequent_in_budget_call_succeeds(self) -> None:
|
||||
"""After an overflow rejection, raising the budget and retrying works."""
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
overflow_prompt = ' '.join(['word'] * 100)
|
||||
overflow = engine.submit_message(overflow_prompt)
|
||||
assert overflow.stop_reason == 'max_budget_reached'
|
||||
|
||||
# Raise the budget and retry — the engine should be in a clean state
|
||||
engine.config = QueryEngineConfig(max_budget_tokens=10_000)
|
||||
ok = engine.submit_message('short retry')
|
||||
assert ok.stop_reason == 'completed'
|
||||
assert 'short retry' in engine.mutable_messages
|
||||
# The overflow prompt should never have been recorded
|
||||
assert overflow_prompt not in engine.mutable_messages
|
||||
|
||||
def test_multiple_overflow_calls_remain_idempotent(self) -> None:
|
||||
"""Repeated overflow calls must not accumulate hidden state."""
|
||||
engine = _make_engine(max_budget_tokens=10)
|
||||
overflow_prompt = ' '.join(['word'] * 50)
|
||||
for _ in range(5):
|
||||
result = engine.submit_message(overflow_prompt)
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert len(engine.mutable_messages) == 0
|
||||
assert len(engine.transcript_store.entries) == 0
|
||||
assert engine.total_usage == UsageSummary()
|
||||
|
||||
|
||||
class TestNormalPathStillCommits:
|
||||
"""Regression guard: non-overflow path must still mutate state as before."""
|
||||
|
||||
def test_in_budget_turn_commits_all_state(self) -> None:
|
||||
engine = QueryEnginePort(manifest=build_port_manifest())
|
||||
engine.config = QueryEngineConfig(max_budget_tokens=10_000)
|
||||
result = engine.submit_message('review MCP tool')
|
||||
assert result.stop_reason == 'completed'
|
||||
assert len(engine.mutable_messages) == 1
|
||||
assert len(engine.transcript_store.entries) == 1
|
||||
assert engine.total_usage.input_tokens > 0
|
||||
assert engine.total_usage.output_tokens > 0
|
||||
220
tests/test_submit_message_cancellation.py
Normal file
220
tests/test_submit_message_cancellation.py
Normal file
@@ -0,0 +1,220 @@
|
||||
"""Tests for cooperative cancellation in submit_message (ROADMAP #164 Stage A).
|
||||
|
||||
Verifies that cancel_event enables safe early termination:
|
||||
- Event set before call => immediate return with stop_reason='cancelled'
|
||||
- Event set between budget check and commit => still 'cancelled', no mutation
|
||||
- Event set after commit => not observable (honest cooperative limit)
|
||||
- Legacy callers (cancel_event=None) see zero behaviour change
|
||||
- State is untouched on cancellation: mutable_messages, transcript_store,
|
||||
permission_denials, total_usage all preserved
|
||||
|
||||
This closes the #161 follow-up gap filed as #164: wedged provider threads
|
||||
can no longer silently commit ghost turns after the caller observed a
|
||||
timeout.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
import threading
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
|
||||
|
||||
from src.models import PermissionDenial # noqa: E402
|
||||
from src.port_manifest import build_port_manifest # noqa: E402
|
||||
from src.query_engine import QueryEngineConfig, QueryEnginePort, TurnResult # noqa: E402
|
||||
|
||||
|
||||
def _fresh_engine(**config_overrides) -> QueryEnginePort:
|
||||
config = QueryEngineConfig(**config_overrides) if config_overrides else QueryEngineConfig()
|
||||
return QueryEnginePort(manifest=build_port_manifest(), config=config)
|
||||
|
||||
|
||||
class TestCancellationBeforeCall:
|
||||
"""Event set before submit_message is invoked => immediate 'cancelled'."""
|
||||
|
||||
def test_pre_set_event_returns_cancelled_immediately(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
result = engine.submit_message('hello', cancel_event=event)
|
||||
|
||||
assert result.stop_reason == 'cancelled'
|
||||
assert result.prompt == 'hello'
|
||||
# Output is empty on pre-budget cancel (no synthesis)
|
||||
assert result.output == ''
|
||||
|
||||
def test_pre_set_event_preserves_mutable_messages(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
engine.submit_message('ghost turn', cancel_event=event)
|
||||
|
||||
assert engine.mutable_messages == [], (
|
||||
'cancelled turn must not appear in mutable_messages'
|
||||
)
|
||||
|
||||
def test_pre_set_event_preserves_transcript_store(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
engine.submit_message('ghost turn', cancel_event=event)
|
||||
|
||||
assert engine.transcript_store.entries == [], (
|
||||
'cancelled turn must not appear in transcript_store'
|
||||
)
|
||||
|
||||
def test_pre_set_event_preserves_usage_counters(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
initial_usage = engine.total_usage
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
engine.submit_message('expensive prompt ' * 100, cancel_event=event)
|
||||
|
||||
assert engine.total_usage == initial_usage, (
|
||||
'cancelled turn must not increment token counters'
|
||||
)
|
||||
|
||||
def test_pre_set_event_preserves_permission_denials(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
denials = (PermissionDenial(tool_name='BashTool', reason='destructive'),)
|
||||
engine.submit_message('run bash ls', denied_tools=denials, cancel_event=event)
|
||||
|
||||
assert engine.permission_denials == [], (
|
||||
'cancelled turn must not extend permission_denials'
|
||||
)
|
||||
|
||||
|
||||
class TestCancellationAfterBudgetCheck:
|
||||
"""Event set between budget projection and commit => 'cancelled', state intact.
|
||||
|
||||
This simulates the realistic racy case: engine starts computing output,
|
||||
caller hits deadline, sets event. Engine observes at post-budget checkpoint
|
||||
and returns cleanly.
|
||||
"""
|
||||
|
||||
def test_post_budget_cancel_returns_cancelled(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
|
||||
# Patch: set the event after projection but before mutation. We do this
|
||||
# by wrapping _format_output (called mid-submit) to set the event.
|
||||
original_format = engine._format_output
|
||||
|
||||
def _set_then_format(*args, **kwargs):
|
||||
result = original_format(*args, **kwargs)
|
||||
event.set() # trigger cancel right after output is built
|
||||
return result
|
||||
|
||||
engine._format_output = _set_then_format # type: ignore[method-assign]
|
||||
|
||||
result = engine.submit_message('hello', cancel_event=event)
|
||||
|
||||
assert result.stop_reason == 'cancelled'
|
||||
# Output IS built here (we're past the pre-budget checkpoint), so it's
|
||||
# not empty. The contract is about *state*, not output synthesis.
|
||||
assert result.output != ''
|
||||
# Critical: state still unchanged
|
||||
assert engine.mutable_messages == []
|
||||
assert engine.transcript_store.entries == []
|
||||
|
||||
|
||||
class TestCancellationAfterCommit:
|
||||
"""Event set after commit is not observable \u2014 honest cooperative limit."""
|
||||
|
||||
def test_post_commit_cancel_is_not_observable(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
|
||||
# Event only set *after* submit_message returns. The first call has
|
||||
# already committed before the event is set.
|
||||
result = engine.submit_message('hello', cancel_event=event)
|
||||
event.set() # too late
|
||||
|
||||
assert result.stop_reason == 'completed', (
|
||||
'cancel set after commit must not retroactively invalidate the turn'
|
||||
)
|
||||
assert engine.mutable_messages == ['hello']
|
||||
|
||||
def test_next_call_observes_cancel(self) -> None:
|
||||
"""The cancel_event persists \u2014 the next call on the same engine sees it."""
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
|
||||
engine.submit_message('first', cancel_event=event)
|
||||
assert engine.mutable_messages == ['first']
|
||||
|
||||
event.set()
|
||||
# Next call observes the cancel at entry
|
||||
result = engine.submit_message('second', cancel_event=event)
|
||||
|
||||
assert result.stop_reason == 'cancelled'
|
||||
# 'second' must NOT have been committed
|
||||
assert engine.mutable_messages == ['first']
|
||||
|
||||
|
||||
class TestLegacyCallersUnchanged:
|
||||
"""cancel_event=None (default) => zero behaviour change from pre-#164."""
|
||||
|
||||
def test_no_event_submits_normally(self) -> None:
|
||||
engine = _fresh_engine()
|
||||
result = engine.submit_message('hello')
|
||||
|
||||
assert result.stop_reason == 'completed'
|
||||
assert engine.mutable_messages == ['hello']
|
||||
|
||||
def test_no_event_with_budget_overflow_still_rejects_atomically(self) -> None:
|
||||
"""#162 atomicity contract survives when cancel_event is absent."""
|
||||
engine = _fresh_engine(max_budget_tokens=1)
|
||||
words = ' '.join(['word'] * 100)
|
||||
|
||||
result = engine.submit_message(words) # no cancel_event
|
||||
|
||||
assert result.stop_reason == 'max_budget_reached'
|
||||
assert engine.mutable_messages == []
|
||||
|
||||
def test_no_event_respects_max_turns(self) -> None:
|
||||
"""max_turns_reached contract survives when cancel_event is absent."""
|
||||
engine = _fresh_engine(max_turns=1)
|
||||
engine.submit_message('first')
|
||||
result = engine.submit_message('second') # no cancel_event
|
||||
|
||||
assert result.stop_reason == 'max_turns_reached'
|
||||
assert engine.mutable_messages == ['first']
|
||||
|
||||
|
||||
class TestCancellationVsOtherStopReasons:
|
||||
"""cancel_event has a defined precedence relative to budget/turns."""
|
||||
|
||||
def test_cancel_precedes_max_turns_check(self) -> None:
|
||||
"""If cancel is set when capacity is also full, cancel wins (clearer signal)."""
|
||||
engine = _fresh_engine(max_turns=0) # immediately full
|
||||
event = threading.Event()
|
||||
event.set()
|
||||
|
||||
result = engine.submit_message('hello', cancel_event=event)
|
||||
|
||||
# cancel_event check is the very first thing in submit_message,
|
||||
# so it fires before the max_turns check even sees capacity
|
||||
assert result.stop_reason == 'cancelled'
|
||||
|
||||
def test_cancel_does_not_override_commit(self) -> None:
|
||||
"""Completed turn with late cancel still reports 'completed' \u2014 the
|
||||
turn already succeeded; we don't lie about it."""
|
||||
engine = _fresh_engine()
|
||||
event = threading.Event()
|
||||
|
||||
# Event gets set after the mutation is done \u2014 submit_message doesn't
|
||||
# re-check after commit
|
||||
result = engine.submit_message('hello', cancel_event=event)
|
||||
event.set()
|
||||
|
||||
assert result.stop_reason == 'completed'
|
||||
Reference in New Issue
Block a user