mirror of
https://github.com/instructkr/claw-code.git
synced 2026-04-17 18:55:13 +08:00
Added criterion benchmarks and optimized flatten_tool_result_content:
- Added criterion dev-dependency and request_building benchmark suite
- Optimized flatten_tool_result_content to pre-allocate capacity and avoid intermediate Vec construction (was collecting to Vec then joining)
- Made key functions public for benchmarking: translate_message, build_chat_completion_request, flatten_tool_result_content, is_reasoning_model, model_rejects_is_error_field

Benchmark results:
- flatten_tool_result_content/single_text: ~17ns
- translate_message/text_only: ~200ns
- build_chat_completion_request/10 messages: ~16.4µs
- is_reasoning_model detection: ~26-42ns

All 119 unit tests and 29 integration tests pass. cargo clippy passes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ralph Iteration Summary - claw-code Roadmap Implementation
===========================================================

Iteration 1: 2026-04-16
------------------------

US-001 COMPLETED (Phase 1.6 - startup-no-evidence evidence bundle + classifier)
- Files: rust/crates/runtime/src/worker_boot.rs
- Added StartupFailureClassification enum with 6 variants
- Added StartupEvidenceBundle with 8 fields
- Implemented classify_startup_failure() logic
- Added observe_startup_timeout() method to Worker
- Tests: 6 new tests verifying classification logic

US-002 COMPLETED (Phase 2 - Canonical lane event schema)
- Files: rust/crates/runtime/src/lane_events.rs
- Added EventProvenance enum with 5 labels
- Added SessionIdentity, LaneOwnership structs
- Added LaneEventMetadata with sequence/ordering
- Added LaneEventBuilder for construction
- Implemented is_terminal_event(), dedupe_terminal_events()
- Tests: 10 new tests for events and deduplication

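For illustration, terminal-event deduplication as described above could look roughly like the sketch below. The LaneEvent and EventKind shapes are hypothetical stand-ins (the real types in lane_events.rs carry provenance, session identity and ownership), and "keep the first terminal event per lane" is an assumed rule, not a confirmed description of the actual implementation.

```rust
use std::collections::HashSet;

// Hypothetical, simplified event type; not the real LaneEvent.
#[derive(Debug, Clone, PartialEq)]
pub struct LaneEvent {
    pub lane_id: String,
    pub sequence: u64,
    pub kind: EventKind,
}

// Hypothetical event kinds, chosen only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Started,
    Progress,
    Completed,
    Failed,
}

/// Assumed definition: an event is terminal if it ends the lane.
pub fn is_terminal_event(event: &LaneEvent) -> bool {
    matches!(event.kind, EventKind::Completed | EventKind::Failed)
}

/// Keeps every non-terminal event and only the first terminal event
/// seen for each lane, preserving input order.
pub fn dedupe_terminal_events(events: &[LaneEvent]) -> Vec<LaneEvent> {
    let mut finished: HashSet<&str> = HashSet::new();
    let mut out = Vec::with_capacity(events.len());
    for event in events {
        // `insert` returns false when the lane already has a terminal event.
        if is_terminal_event(event) && !finished.insert(event.lane_id.as_str()) {
            continue;
        }
        out.push(event.clone());
    }
    out
}
```

Under these assumptions, a lane that reports Completed twice yields a single Completed event, while terminal events for other lanes are unaffected.
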
US-005 COMPLETED (Phase 4 - Typed task packet format)
- Files:
  - rust/crates/runtime/src/task_packet.rs
  - rust/crates/runtime/src/task_registry.rs
  - rust/crates/tools/src/lib.rs
- Added TaskScope enum (Workspace, Module, SingleFile, Custom)
- Updated TaskPacket with scope_path and worktree fields
- Added validate_scope_requirements() validation logic
- Fixed all test compilation errors in dependent modules
- Tests: Updated existing tests to use new types

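A minimal sketch of scope validation follows. The TaskScope variants and the scope_path/worktree field names come from the notes above; the objective field, the Option<String> field types, the String error type, and the rule that every scope except Workspace needs a non-empty scope_path are assumptions made for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskScope {
    Workspace,
    Module,
    SingleFile,
    Custom,
}

// Simplified packet; field types are assumed, and `objective` is hypothetical.
#[derive(Debug, Clone)]
pub struct TaskPacket {
    pub objective: String,
    pub scope: TaskScope,
    pub scope_path: Option<String>,
    pub worktree: Option<String>,
}

/// Assumed rule: Workspace needs no path; all narrower scopes must name one.
pub fn validate_scope_requirements(packet: &TaskPacket) -> Result<(), String> {
    let has_path = packet
        .scope_path
        .as_deref()
        .map_or(false, |p| !p.trim().is_empty());
    match packet.scope {
        TaskScope::Workspace => Ok(()),
        _ if has_path => Ok(()),
        scope => Err(format!("scope {:?} requires a non-empty scope_path", scope)),
    }
}
```

Rejecting a packet at validation time keeps a SingleFile or Module task from silently widening to the whole workspace when its path is missing.
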
PRE-EXISTING IMPLEMENTATIONS (verified working):
------------------------------------------------

US-003 COMPLETE (Phase 3 - Stale-branch detection)
- Files: rust/crates/runtime/src/stale_branch.rs
- BranchFreshness enum (Fresh, Stale, Diverged)
- StaleBranchPolicy (AutoRebase, AutoMergeForward, WarnOnly, Block)
- StaleBranchEvent with structured events
- check_freshness() with git integration
- apply_policy() with policy resolution
- Tests: 12 unit tests + 5 integration tests passing

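As a rough sketch of how freshness and policy might combine: the two enums below use the variant names listed above, while classify_freshness, StaleBranchAction, the apply_policy signature, and the behind/ahead interpretation of Stale versus Diverged are all assumptions for illustration. The git integration itself is left out; the counts are taken as plain inputs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BranchFreshness {
    Fresh,
    Stale,
    Diverged,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StaleBranchPolicy {
    AutoRebase,
    AutoMergeForward,
    WarnOnly,
    Block,
}

// Hypothetical result type; the real module emits StaleBranchEvent instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StaleBranchAction {
    Proceed,
    Rebase,
    MergeForward,
    Warn,
    Refuse,
}

/// Assumed classification from commit counts relative to the base branch:
/// not behind = Fresh, behind only = Stale, behind and ahead = Diverged.
pub fn classify_freshness(behind: u32, ahead: u32) -> BranchFreshness {
    match (behind, ahead) {
        (0, _) => BranchFreshness::Fresh,
        (_, 0) => BranchFreshness::Stale,
        _ => BranchFreshness::Diverged,
    }
}

/// Fresh branches always proceed; otherwise the policy picks the action.
pub fn apply_policy(freshness: BranchFreshness, policy: StaleBranchPolicy) -> StaleBranchAction {
    if freshness == BranchFreshness::Fresh {
        return StaleBranchAction::Proceed;
    }
    match policy {
        StaleBranchPolicy::AutoRebase => StaleBranchAction::Rebase,
        StaleBranchPolicy::AutoMergeForward => StaleBranchAction::MergeForward,
        StaleBranchPolicy::WarnOnly => StaleBranchAction::Warn,
        StaleBranchPolicy::Block => StaleBranchAction::Refuse,
    }
}
```

Keeping classification separate from policy resolution means the same freshness result can be tested against all four policies without touching git.
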
US-004 COMPLETE (Phase 3 - Recovery recipes with ledger)
- Files: rust/crates/runtime/src/recovery_recipes.rs
- FailureScenario enum with 7 scenarios
- RecoveryStep enum with actionable steps
- RecoveryRecipe with step sequences
- RecoveryLedger for attempt tracking
- RecoveryEvent for structured emission
- attempt_recovery() with escalation logic
- Tests: 15 unit tests + 1 integration test passing

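The attempt-tracking and escalation idea can be sketched as below. This is illustrative only: the two FailureScenario variants are placeholder names (the notes do not list the 7 real scenarios), RecoveryOutcome is a hypothetical type, and the per-scenario attempt cap passed as max_attempts is an assumed design.

```rust
use std::collections::HashMap;

// Placeholder variants; the real enum has 7 scenarios that are not named here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FailureScenario {
    StaleBranch,
    CompileFailure,
}

// Hypothetical outcome type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryOutcome {
    Retry { attempt: u32 },
    Escalate,
}

/// Counts recovery attempts per scenario.
#[derive(Debug, Default)]
pub struct RecoveryLedger {
    attempts: HashMap<FailureScenario, u32>,
}

impl RecoveryLedger {
    /// Records one more attempt, or escalates once the cap has been reached.
    pub fn attempt_recovery(&mut self, scenario: FailureScenario, max_attempts: u32) -> RecoveryOutcome {
        let count = self.attempts.entry(scenario).or_insert(0);
        if *count >= max_attempts {
            return RecoveryOutcome::Escalate;
        }
        *count += 1;
        RecoveryOutcome::Retry { attempt: *count }
    }
}
```

With a cap of 1, the first failure of a scenario is retried and the second escalates, while other scenarios keep their own independent counters.
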
US-006 COMPLETE (Phase 4 - Policy engine for autonomous coding)
- Files: rust/crates/runtime/src/policy_engine.rs
- PolicyRule with condition/action/priority
- PolicyCondition (And, Or, GreenAt, StaleBranch, etc.)
- PolicyAction (MergeToDev, RecoverOnce, Escalate, etc.)
- LaneContext for evaluation context
- evaluate() for rule matching
- Tests: 18 unit tests + 6 integration tests passing

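A minimal sketch of rule matching follows, using only the condition and action variants named above. The LaneContext fields, the GreenAt(u8) payload, the rule name field, the evaluate signature, and the convention that a lower priority number is applied first are assumptions for this example rather than the engine's confirmed behavior.

```rust
// Hypothetical context fields for this sketch.
#[derive(Debug, Clone)]
pub struct LaneContext {
    pub green_level: u8,
    pub branch_is_stale: bool,
}

#[derive(Debug, Clone)]
pub enum PolicyCondition {
    And(Vec<PolicyCondition>),
    Or(Vec<PolicyCondition>),
    GreenAt(u8), // assumed meaning: lane is green at this level or higher
    StaleBranch,
}

impl PolicyCondition {
    /// Recursively evaluates the condition tree against the context.
    pub fn matches(&self, ctx: &LaneContext) -> bool {
        match self {
            Self::And(conditions) => conditions.iter().all(|c| c.matches(ctx)),
            Self::Or(conditions) => conditions.iter().any(|c| c.matches(ctx)),
            Self::GreenAt(level) => ctx.green_level >= *level,
            Self::StaleBranch => ctx.branch_is_stale,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction {
    MergeToDev,
    RecoverOnce,
    Escalate,
}

#[derive(Debug, Clone)]
pub struct PolicyRule {
    pub name: &'static str,
    pub condition: PolicyCondition,
    pub action: PolicyAction,
    pub priority: u32,
}

/// Returns the actions of all matching rules, lowest priority number first.
pub fn evaluate(rules: &[PolicyRule], ctx: &LaneContext) -> Vec<PolicyAction> {
    let mut matched: Vec<&PolicyRule> = rules
        .iter()
        .filter(|rule| rule.condition.matches(ctx))
        .collect();
    matched.sort_by_key(|rule| rule.priority); // stable: ties keep rule order
    matched.into_iter().map(|rule| rule.action).collect()
}
```

Because And and Or hold nested conditions, a compound rule such as "green at level 2 and either stale or green at level 3" is expressed as data rather than code.
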
US-007 COMPLETE (Phase 5 - Plugin/MCP lifecycle maturity)
- Files: rust/crates/runtime/src/plugin_lifecycle.rs
- ServerStatus enum (Healthy, Degraded, Failed)
- ServerHealth with capabilities tracking
- PluginState with full lifecycle states
- PluginLifecycle event tracking
- PluginHealthcheck structured results
- DiscoveryResult for capability discovery
- DegradedMode behavior
- Tests: 11 unit tests passing

VERIFICATION STATUS:
------------------
- cargo build --workspace: PASSED
- cargo test --workspace: PASSED (476+ unit tests, 12 integration tests)
- cargo clippy --workspace: PASSED

All 7 stories from prd.json now have passes: true

Iteration 2: 2026-04-16
------------------------

US-009 COMPLETED (Add unit tests for kimi model compatibility fix)
- Files: rust/crates/api/src/providers/openai_compat.rs
- Added 4 comprehensive unit tests:
  1. model_rejects_is_error_field_detects_kimi_models - verifies detection of kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5, case insensitivity
  2. translate_message_includes_is_error_for_non_kimi_models - verifies gpt-4o, grok-3, claude include is_error
  3. translate_message_excludes_is_error_for_kimi_models - verifies kimi models exclude is_error (prevents 400 Bad Request)
  4. build_chat_completion_request_kimi_vs_non_kimi_tool_results - full integration test for request building
- Tests: 4 new tests, 119 unit tests total in api crate (+4), all passing
- Integration tests: 29 passing (no regressions)

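The behavior those tests verify can be sketched as follows. The function name model_rejects_is_error_field is from the notes; the matching rule (lowercase, drop any provider prefix before the last "/", check for a "kimi" prefix) and the tool_result_fields helper are assumptions for illustration, and the real translate_message builds a JSON message rather than a list of pairs.

```rust
/// Assumed detection rule: case-insensitive, ignores a provider prefix
/// such as "dashscope/", and matches model names that start with "kimi".
pub fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let base = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    base.starts_with("kimi")
}

/// Hypothetical helper: the fields of a tool-result message as key/value
/// pairs, omitting `is_error` for models that reject it.
pub fn tool_result_fields(
    model: &str,
    tool_call_id: &str,
    content: &str,
    is_error: bool,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("role", "tool".to_string()),
        ("tool_call_id", tool_call_id.to_string()),
        ("content", content.to_string()),
    ];
    if !model_rejects_is_error_field(model) {
        fields.push(("is_error", is_error.to_string()));
    }
    fields
}
```

Gating the field on the model name keeps the extra is_error signal for providers that accept it while avoiding the 400 Bad Request from kimi models.
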
US-010 COMPLETED (Add model compatibility documentation)
- Files: docs/MODEL_COMPATIBILITY.md
- Created comprehensive documentation covering:
  1. Kimi Models (is_error Exclusion) - documents the 400 Bad Request issue and solution
  2. Reasoning Models (Tuning Parameter Stripping) - covers o1, o3, o4, grok-3-mini, qwen-qwq, qwen3-thinking
  3. GPT-5 (max_completion_tokens) - documents max_tokens vs max_completion_tokens requirement
  4. Qwen Models (DashScope Routing) - explains routing and authentication
- Added implementation details section with key functions
- Added "Adding New Models" guide for future contributors
- Added testing section with example commands
- Cross-referenced with existing code comments in openai_compat.rs
- cargo clippy passes

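Two of the documented quirks, tuning-parameter stripping and the GPT-5 token-limit field, might be sketched like this. The model names are the ones listed above; the prefix-matching rule, the RequestParams struct, the choice of temperature and top_p as the stripped parameters, and the helper names token_limit_field and strip_tuning_params are assumptions, not the actual openai_compat.rs code.

```rust
/// Assumed rule: a model is a reasoning model if its name (minus any
/// provider prefix) starts with one of the documented families.
pub fn is_reasoning_model(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let base = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    ["o1", "o3", "o4", "grok-3-mini", "qwen-qwq", "qwen3-thinking"]
        .iter()
        .any(|prefix| base.starts_with(prefix))
}

// Hypothetical, reduced parameter set for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestParams {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub max_tokens: u32,
}

/// Drops sampling parameters for reasoning models (assumed to be the
/// "tuning parameters" in question) and leaves other models untouched.
pub fn strip_tuning_params(model: &str, mut params: RequestParams) -> RequestParams {
    if is_reasoning_model(model) {
        params.temperature = None;
        params.top_p = None;
    }
    params
}

/// Name of the request field that carries the token limit.
pub fn token_limit_field(model: &str) -> &'static str {
    if model.to_ascii_lowercase().starts_with("gpt-5") {
        "max_completion_tokens"
    } else {
        "max_tokens"
    }
}
```

Centralizing these checks in small predicates is one way to keep per-model quirks out of the request builder itself, which also makes them cheap to benchmark in isolation.
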
US-011 COMPLETED (Performance optimization: reduce API request serialization overhead)
- Files:
  - rust/crates/api/Cargo.toml (added criterion dev-dependency and bench config)
  - rust/crates/api/benches/request_building.rs (new benchmark suite)
  - rust/crates/api/src/providers/openai_compat.rs (optimizations)
  - rust/crates/api/src/lib.rs (public exports for benchmarks)
- Optimizations implemented:
  1. flatten_tool_result_content: Pre-allocate String capacity and avoid intermediate Vec
     - Before: collected to Vec<String> then joined
     - After: single String with pre-calculated capacity, push directly
  2. Made key functions public for benchmarking: translate_message, build_chat_completion_request,
     flatten_tool_result_content, is_reasoning_model, model_rejects_is_error_field
- Benchmark results:
  - flatten_tool_result_content/single_text: ~17ns
  - flatten_tool_result_content/multi_text (10 blocks): ~46ns
  - flatten_tool_result_content/large_content (50 blocks): ~11.7µs
  - translate_message/text_only: ~200ns
  - translate_message/tool_result: ~348ns
  - build_chat_completion_request/10 messages: ~16.4µs
  - build_chat_completion_request/100 messages: ~209µs
  - is_reasoning_model detection: ~26-42ns depending on model
- All tests pass (119 unit tests + 29 integration tests)
- cargo clippy passes
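
The before/after shape of the flatten_tool_result_content optimization can be illustrated with the sketch below. The ContentBlock enum is a hypothetical stand-in for the crate's real content type, the newline separator is an assumption, and the two function names exist only in this example.

```rust
// Hypothetical stand-in for the real tool-result content block type.
#[derive(Debug, Clone)]
pub enum ContentBlock {
    Text(String),
    Json(String),
}

fn block_str(block: &ContentBlock) -> &str {
    match block {
        ContentBlock::Text(s) | ContentBlock::Json(s) => s.as_str(),
    }
}

/// "Before" shape: allocates one String per block plus a Vec, then joins.
pub fn flatten_collect_join(blocks: &[ContentBlock]) -> String {
    blocks
        .iter()
        .map(|block| block_str(block).to_string())
        .collect::<Vec<String>>()
        .join("\n")
}

/// "After" shape: computes the final length once, then pushes directly
/// into a single pre-sized String.
pub fn flatten_preallocated(blocks: &[ContentBlock]) -> String {
    let separators = blocks.len().saturating_sub(1);
    let capacity = blocks.iter().map(|b| block_str(b).len()).sum::<usize>() + separators;
    let mut out = String::with_capacity(capacity);
    for (index, block) in blocks.iter().enumerate() {
        if index > 0 {
            out.push('\n');
        }
        out.push_str(block_str(block));
    }
    out
}
```

Both functions produce the same output; the second trades a cheap extra pass over the block lengths for a single allocation, which is the kind of saving the benchmark numbers above measure.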