Block oversized requests before providers hard-fail

The runtime already tracked rough token estimates for compaction, but provider-bound requests still relied on naive model output limits and could be sent upstream even when the selected model could not fit the estimated prompt plus requested output. This adds a small model token/context registry in the API layer, estimates request size from the serialized prompt payload, and fails locally with a dedicated context-window error before Anthropic or xAI calls are made. Focused integration coverage asserts the preflight fires before any HTTP request leaves the process. Constraint: Keep the first pass minimal and reusable across both Anthropic and OpenAI-compatible providers Rejected: Auto-compact-and-retry in the same patch | broader control-flow change than the requested minimal preflight Confidence: medium Scope-risk: narrow Reversibility: clean Directive: Expand the model registry before enabling preflight for additional providers or aliases Tested: cargo build -p api -p tools -p rusty-claude-cli; cargo test -p api Not-tested: End-to-end CLI auto-compaction or retry behavior after a local context_window_blocked failure
2026-04-08 00:54:49 +08:00 · 2026-04-05 16:39:58 +00:00
parent b9c5cc118e
commit fa72cd665e
6 changed files with 264 additions and 11 deletions
--- a/rust/crates/api/src/providers/openai_compat.rs
+++ b/rust/crates/api/src/providers/openai_compat.rs
@@ -12,7 +12,7 @@ use crate::types::{
    ToolChoice, ToolDefinition, ToolResultContentBlock, Usage,
 };

-use super::{Provider, ProviderFuture};
+use super::{preflight_message_request, Provider, ProviderFuture};

 pub const DEFAULT_XAI_BASE_URL: &str = "https://api.x.ai/v1";
 pub const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1";
@@ -128,6 +128,7 @@ impl OpenAiCompatClient {
            stream: false,
            ..request.clone()
        };
+        preflight_message_request(&request)?;
        let response = self.send_with_retry(&request).await?;
        let request_id = request_id_from_headers(response.headers());
        let payload = response.json::<ChatCompletionResponse>().await?;
@@ -142,6 +143,7 @@ impl OpenAiCompatClient {
        &self,
        request: &MessageRequest,
    ) -> Result<MessageStream, ApiError> {
+        preflight_message_request(request)?;
        let response = self
            .send_with_retry(&request.clone().with_streaming())
            .await?;