US-010: Add model compatibility documentation

Created comprehensive MODEL_COMPATIBILITY.md documenting: - Kimi models is_error exclusion (prevents 400 Bad Request) - Reasoning models tuning parameter stripping (o1, o3, o4, grok-3-mini, qwen-qwq) - GPT-5 max_completion_tokens requirement - Qwen model routing through DashScope Includes implementation details, key functions table, guide for adding new models, and testing commands. Cross-referenced with existing code comments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 21:55:11 +08:00 · 2026-04-16 10:55:58 +00:00
parent 3e4e1585b5
commit f65d15fb2f
3 changed files with 250 additions and 1 deletions
--- a/docs/MODEL_COMPATIBILITY.md
+++ b/docs/MODEL_COMPATIBILITY.md
@@ -0,0 +1,236 @@
+# Model Compatibility Guide
+
+This document describes model-specific handling in the OpenAI-compatible provider. When adding new models or providers, review this guide to ensure proper compatibility.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Model-Specific Handling](#model-specific-handling)
+  - [Kimi Models (is_error Exclusion)](#kimi-models-is_error-exclusion)
+  - [Reasoning Models (Tuning Parameter Stripping)](#reasoning-models-tuning-parameter-stripping)
+  - [GPT-5 (max_completion_tokens)](#gpt-5-max_completion_tokens)
+  - [Qwen Models (DashScope Routing)](#qwen-models-dashscope-routing)
+- [Implementation Details](#implementation-details)
+- [Adding New Models](#adding-new-models)
+- [Testing](#testing)
+
+## Overview
+
+The `openai_compat.rs` provider translates Claude Code's internal message format to OpenAI-compatible chat completion requests. Different models have varying requirements for:
+
+- Tool result message fields (`is_error`)
+- Sampling parameters (temperature, top_p, etc.)
+- Token limit fields (`max_tokens` vs `max_completion_tokens`)
+- Base URL routing
+
+## Model-Specific Handling
+
+### Kimi Models (is_error Exclusion)
+
+**Affected models:** `kimi-k2.5`, `kimi-k1.5`, `kimi-moonshot`, and any model with `kimi` in the name (case-insensitive)
+
+**Behavior:** The `is_error` field is **excluded** from tool result messages.
+
+**Rationale:** Kimi models (via Moonshot AI and DashScope) reject the `is_error` field with a 400 Bad Request error:
+```json
+{
+  "error": {
+    "type": "invalid_request_error",
+    "message": "Unknown field: is_error"
+  }
+}
+```
+
+**Detection:**
+```rust
+fn model_rejects_is_error_field(model: &str) -> bool {
+    let lowered = model.to_ascii_lowercase();
+    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
+    canonical.starts_with("kimi-")
+}
+```
+
+**Testing:** See `model_rejects_is_error_field_detects_kimi_models` and related tests in `openai_compat.rs`.
+
+---
+
+### Reasoning Models (Tuning Parameter Stripping)
+
+**Affected models:**
+- OpenAI: `o1`, `o1-*`, `o3`, `o3-*`, `o4`, `o4-*`
+- xAI: `grok-3-mini`
+- Alibaba DashScope: `qwen-qwq-*`, `qwq-*`, `qwen3-*-thinking`
+
+**Behavior:** The following tuning parameters are **stripped** from requests:
+- `temperature`
+- `top_p`
+- `frequency_penalty`
+- `presence_penalty`
+
+**Rationale:** Reasoning/chain-of-thought models use fixed sampling strategies and reject these parameters with 400 errors.
+
+**Exception:** `reasoning_effort` is included for compatible models when explicitly set.
+
+**Detection:**
+```rust
+fn is_reasoning_model(model: &str) -> bool {
+    let canonical = model.to_ascii_lowercase()
+        .rsplit('/')
+        .next()
+        .unwrap_or(model);
+    canonical.starts_with("o1")
+        || canonical.starts_with("o3")
+        || canonical.starts_with("o4")
+        || canonical == "grok-3-mini"
+        || canonical.starts_with("qwen-qwq")
+        || canonical.starts_with("qwq")
+        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
+}
+```
+
+**Testing:** See `reasoning_model_strips_tuning_params`, `grok_3_mini_is_reasoning_model`, and `qwen_reasoning_variants_are_detected` tests.
+
+---
+
+### GPT-5 (max_completion_tokens)
+
+**Affected models:** All models starting with `gpt-5`
+
+**Behavior:** Uses `max_completion_tokens` instead of `max_tokens` in the request payload.
+
+**Rationale:** GPT-5 models require the `max_completion_tokens` field. Legacy `max_tokens` causes request validation failures:
+```json
+{
+  "error": {
+    "message": "Unknown field: max_tokens"
+  }
+}
+```
+
+**Implementation:**
+```rust
+let max_tokens_key = if wire_model.starts_with("gpt-5") {
+    "max_completion_tokens"
+} else {
+    "max_tokens"
+};
+```
+
+**Testing:** See `gpt5_uses_max_completion_tokens_not_max_tokens` and `non_gpt5_uses_max_tokens` tests.
+
+---
+
+### Qwen Models (DashScope Routing)
+
+**Affected models:** All models with `qwen` prefix
+
+**Behavior:** Routed to DashScope (`https://dashscope.aliyuncs.com/compatible-mode/v1`) rather than default providers.
+
+**Rationale:** Qwen models are hosted by Alibaba Cloud's DashScope service, not OpenAI or Anthropic.
+
+**Configuration:**
+```rust
+pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";
+```
+
+**Authentication:** Uses `DASHSCOPE_API_KEY` environment variable.
+
+**Note:** Some Qwen models are also reasoning models (see [Reasoning Models](#reasoning-models-tuning-parameter-stripping) above) and receive both treatments.
+
+## Implementation Details
+
+### File Location
+All model-specific logic is in:
+```
+rust/crates/api/src/providers/openai_compat.rs
+```
+
+### Key Functions
+
+| Function | Purpose |
+|----------|---------|
+| `model_rejects_is_error_field()` | Detects models that don't support `is_error` in tool results |
+| `is_reasoning_model()` | Detects reasoning models that need tuning param stripping |
+| `translate_message()` | Converts internal messages to OpenAI format (applies `is_error` logic) |
+| `build_chat_completion_request()` | Constructs full request payload (applies all model-specific logic) |
+
+### Provider Prefix Handling
+
+All model detection functions strip provider prefixes (e.g., `dashscope/kimi-k2.5` → `kimi-k2.5`) before matching:
+
+```rust
+let canonical = model.to_ascii_lowercase()
+    .rsplit('/')
+    .next()
+    .unwrap_or(model);
+```
+
+This ensures consistent detection regardless of whether models are referenced with or without provider prefixes.
+
+## Adding New Models
+
+When adding support for new models:
+
+1. **Check if the model is a reasoning model**
+   - Does it reject temperature/top_p parameters?
+   - Add to `is_reasoning_model()` detection
+
+2. **Check tool result compatibility**
+   - Does it reject the `is_error` field?
+   - Add to `model_rejects_is_error_field()` detection
+
+3. **Check token limit field**
+   - Does it require `max_completion_tokens` instead of `max_tokens`?
+   - Update the `max_tokens_key` logic
+
+4. **Add tests**
+   - Unit test for detection function
+   - Integration test in `build_chat_completion_request`
+
+5. **Update this documentation**
+   - Add the model to the affected lists
+   - Document any special behavior
+
+## Testing
+
+### Running Model-Specific Tests
+
+```bash
+# All OpenAI compatibility tests
+cargo test --package api providers::openai_compat
+
+# Specific test categories
+cargo test --package api model_rejects_is_error_field
+cargo test --package api reasoning_model
+cargo test --package api gpt5
+cargo test --package api qwen
+```
+
+### Test Files
+
+- Unit tests: `rust/crates/api/src/providers/openai_compat.rs` (in `mod tests`)
+- Integration tests: `rust/crates/api/tests/openai_compat_integration.rs`
+
+### Verifying Model Detection
+
+To verify a model is detected correctly without making API calls:
+
+```rust
+#[test]
+fn my_new_model_is_detected() {
+    // is_error handling
+    assert!(model_rejects_is_error_field("my-model"));
+    
+    // Reasoning model detection
+    assert!(is_reasoning_model("my-model"));
+    
+    // Provider prefix handling
+    assert!(model_rejects_is_error_field("provider/my-model"));
+}
+```
+
+---
+
+*Last updated: 2026-04-16*
+
+For questions or updates, see the implementation in `rust/crates/api/src/providers/openai_compat.rs`.