From f65d15fb2f36787720b40877c1bdc7bf416506ee Mon Sep 17 00:00:00 2001
From: Yeachan-Heo
Date: Thu, 16 Apr 2026 10:55:58 +0000
Subject: [PATCH] US-010: Add model compatibility documentation

Created comprehensive MODEL_COMPATIBILITY.md documenting:
- Kimi models is_error exclusion (prevents 400 Bad Request)
- Reasoning models tuning parameter stripping (o1, o3, o4, grok-3-mini, qwen-qwq)
- GPT-5 max_completion_tokens requirement
- Qwen model routing through DashScope

Includes implementation details, key functions table, guide for adding new models, and testing commands. Cross-referenced with existing code comments.

Co-Authored-By: Claude Opus 4.6
---
 docs/MODEL_COMPATIBILITY.md | 236 ++++++++++++++++++++++++++++++++++++
 prd.json                    |   2 +-
 progress.txt                |  13 ++
 3 files changed, 250 insertions(+), 1 deletion(-)
 create mode 100644 docs/MODEL_COMPATIBILITY.md

diff --git a/docs/MODEL_COMPATIBILITY.md b/docs/MODEL_COMPATIBILITY.md
new file mode 100644
index 0000000..fef8b98
--- /dev/null
+++ b/docs/MODEL_COMPATIBILITY.md
@@ -0,0 +1,236 @@
+# Model Compatibility Guide
+
+This document describes model-specific handling in the OpenAI-compatible provider. When adding new models or providers, review this guide to ensure proper compatibility.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Model-Specific Handling](#model-specific-handling)
+  - [Kimi Models (is_error Exclusion)](#kimi-models-is_error-exclusion)
+  - [Reasoning Models (Tuning Parameter Stripping)](#reasoning-models-tuning-parameter-stripping)
+  - [GPT-5 (max_completion_tokens)](#gpt-5-max_completion_tokens)
+  - [Qwen Models (DashScope Routing)](#qwen-models-dashscope-routing)
+- [Implementation Details](#implementation-details)
+- [Adding New Models](#adding-new-models)
+- [Testing](#testing)
+
+## Overview
+
+The `openai_compat.rs` provider translates Claude Code's internal message format to OpenAI-compatible chat completion requests.
+Different models have varying requirements for:
+
+- Tool result message fields (`is_error`)
+- Sampling parameters (temperature, top_p, etc.)
+- Token limit fields (`max_tokens` vs `max_completion_tokens`)
+- Base URL routing
+
+## Model-Specific Handling
+
+### Kimi Models (is_error Exclusion)
+
+**Affected models:** `kimi-k2.5`, `kimi-k1.5`, `kimi-moonshot`, and any model whose canonical name (after stripping a provider prefix) starts with `kimi-` (case-insensitive)
+
+**Behavior:** The `is_error` field is **excluded** from tool result messages.
+
+**Rationale:** Kimi models (via Moonshot AI and DashScope) reject the `is_error` field with a 400 Bad Request error:
+```json
+{
+  "error": {
+    "type": "invalid_request_error",
+    "message": "Unknown field: is_error"
+  }
+}
+```
+
+**Detection:**
+```rust
+fn model_rejects_is_error_field(model: &str) -> bool {
+    let lowered = model.to_ascii_lowercase();
+    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
+    canonical.starts_with("kimi-")
+}
+```
+
+**Testing:** See `model_rejects_is_error_field_detects_kimi_models` and related tests in `openai_compat.rs`.
+
+---
+
+### Reasoning Models (Tuning Parameter Stripping)
+
+**Affected models:**
+- OpenAI: `o1`, `o1-*`, `o3`, `o3-*`, `o4`, `o4-*`
+- xAI: `grok-3-mini`
+- Alibaba DashScope: `qwen-qwq-*`, `qwq-*`, `qwen3-*-thinking`
+
+**Behavior:** The following tuning parameters are **stripped** from requests:
+- `temperature`
+- `top_p`
+- `frequency_penalty`
+- `presence_penalty`
+
+**Rationale:** Reasoning/chain-of-thought models use fixed sampling strategies and reject these parameters with 400 errors.
+
+**Exception:** `reasoning_effort` is included for compatible models when explicitly set.
+
+**Detection:**
+```rust
+fn is_reasoning_model(model: &str) -> bool {
+    let lowered = model.to_ascii_lowercase();
+    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
+    canonical.starts_with("o1")
+        || canonical.starts_with("o3")
+        || canonical.starts_with("o4")
+        || canonical == "grok-3-mini"
+        || canonical.starts_with("qwen-qwq")
+        || canonical.starts_with("qwq")
+        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
+}
+```
+
+**Testing:** See `reasoning_model_strips_tuning_params`, `grok_3_mini_is_reasoning_model`, and `qwen_reasoning_variants_are_detected` tests.
+
+---
+
+### GPT-5 (max_completion_tokens)
+
+**Affected models:** All models starting with `gpt-5`
+
+**Behavior:** Uses `max_completion_tokens` instead of `max_tokens` in the request payload.
+
+**Rationale:** GPT-5 models require the `max_completion_tokens` field. Sending the legacy `max_tokens` field causes request validation failures:
+```json
+{
+  "error": {
+    "message": "Unknown field: max_tokens"
+  }
+}
+```
+
+**Implementation:**
+```rust
+let max_tokens_key = if wire_model.starts_with("gpt-5") {
+    "max_completion_tokens"
+} else {
+    "max_tokens"
+};
+```
+
+**Testing:** See `gpt5_uses_max_completion_tokens_not_max_tokens` and `non_gpt5_uses_max_tokens` tests.
+
+---
+
+### Qwen Models (DashScope Routing)
+
+**Affected models:** All models with the `qwen` prefix
+
+**Behavior:** Routed to DashScope (`https://dashscope.aliyuncs.com/compatible-mode/v1`) rather than default providers.
+
+**Rationale:** Qwen models are hosted by Alibaba Cloud's DashScope service, not OpenAI or Anthropic.
+
+**Configuration:**
+```rust
+pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";
+```
+
+**Authentication:** Uses the `DASHSCOPE_API_KEY` environment variable.
+
+**Note:** Some Qwen models are also reasoning models (see [Reasoning Models](#reasoning-models-tuning-parameter-stripping) above) and receive both treatments.
+
+## Implementation Details
+
+### File Location
+
+All model-specific logic is in:
+```
+rust/crates/api/src/providers/openai_compat.rs
+```
+
+### Key Functions
+
+| Function | Purpose |
+|----------|---------|
+| `model_rejects_is_error_field()` | Detects models that don't support `is_error` in tool results |
+| `is_reasoning_model()` | Detects reasoning models that need tuning param stripping |
+| `translate_message()` | Converts internal messages to OpenAI format (applies `is_error` logic) |
+| `build_chat_completion_request()` | Constructs the full request payload (applies all model-specific logic) |
+
+### Provider Prefix Handling
+
+All model detection functions strip provider prefixes (e.g., `dashscope/kimi-k2.5` → `kimi-k2.5`) before matching:
+
+```rust
+let lowered = model.to_ascii_lowercase();
+let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
+```
+
+This ensures consistent detection regardless of whether models are referenced with or without provider prefixes.
+
+## Adding New Models
+
+When adding support for new models:
+
+1. **Check whether the model is a reasoning model**
+   - Does it reject temperature/top_p parameters?
+   - If so, add it to the `is_reasoning_model()` detection
+
+2. **Check tool result compatibility**
+   - Does it reject the `is_error` field?
+   - If so, add it to the `model_rejects_is_error_field()` detection
+
+3. **Check the token limit field**
+   - Does it require `max_completion_tokens` instead of `max_tokens`?
+   - If so, update the `max_tokens_key` logic
+
+4. **Add tests**
+   - A unit test for the detection function
+   - An integration test in `build_chat_completion_request`
+
+5. **Update this documentation**
+   - Add the model to the affected lists
+   - Document any special behavior
+
+## Testing
+
+### Running Model-Specific Tests
+
+```bash
+# All OpenAI compatibility tests
+cargo test --package api providers::openai_compat
+
+# Specific test categories
+cargo test --package api model_rejects_is_error_field
+cargo test --package api reasoning_model
+cargo test --package api gpt5
+cargo test --package api qwen
+```
+
+### Test Files
+
+- Unit tests: `rust/crates/api/src/providers/openai_compat.rs` (in `mod tests`)
+- Integration tests: `rust/crates/api/tests/openai_compat_integration.rs`
+
+### Verifying Model Detection
+
+To verify a model is detected correctly without making API calls:
+
+```rust
+#[test]
+fn my_new_model_is_detected() {
+    // is_error handling
+    assert!(model_rejects_is_error_field("my-model"));
+
+    // Reasoning model detection
+    assert!(is_reasoning_model("my-model"));
+
+    // Provider prefix handling
+    assert!(model_rejects_is_error_field("provider/my-model"));
+}
+```
+
+---
+
+*Last updated: 2026-04-16*
+
+For questions or updates, see the implementation in `rust/crates/api/src/providers/openai_compat.rs`.
diff --git a/prd.json b/prd.json
index d814358..f79935a 100644
--- a/prd.json
+++ b/prd.json
@@ -144,7 +144,7 @@
         "Document qwen model routing through dashscope",
         "Cross-reference with existing code comments"
       ],
-      "passes": false,
+      "passes": true,
       "priority": "P2"
     },
     {
diff --git a/progress.txt b/progress.txt
index f19f06e..18ee831 100644
--- a/progress.txt
+++ b/progress.txt
@@ -94,3 +94,16 @@ US-009 COMPLETED (Add unit tests for kimi model compatibility fix)
 4. build_chat_completion_request_kimi_vs_non_kimi_tool_results - full integration test for request building
 - Tests: 4 new tests, 119 unit tests total in api crate (+4), all passing
 - Integration tests: 29 passing (no regressions)
+
+US-010 COMPLETED (Add model compatibility documentation)
+- Files: docs/MODEL_COMPATIBILITY.md
+- Created comprehensive documentation covering:
+  1. Kimi Models (is_error Exclusion) - documents the 400 Bad Request issue and solution
+  2. Reasoning Models (Tuning Parameter Stripping) - covers o1, o3, o4, grok-3-mini, qwen-qwq, qwen3-thinking
+  3. GPT-5 (max_completion_tokens) - documents max_tokens vs max_completion_tokens requirement
+  4. Qwen Models (DashScope Routing) - explains routing and authentication
+- Added implementation details section with key functions
+- Added "Adding New Models" guide for future contributors
+- Added testing section with example commands
+- Cross-referenced with existing code comments in openai_compat.rs
+- cargo clippy passes