Block oversized requests before providers hard-fail

The runtime already tracked rough token estimates for compaction, but provider-bound requests still relied on naive model output limits and could be sent upstream even when the selected model could not fit the estimated prompt plus requested output.

This adds a small model token/context registry in the API layer, estimates request size from the serialized prompt payload, and fails locally with a dedicated context-window error before Anthropic or xAI calls are made. Focused integration coverage asserts the preflight fires before any HTTP request leaves the process.

Constraint: Keep the first pass minimal and reusable across both Anthropic and OpenAI-compatible providers
Rejected: Auto-compact-and-retry in the same patch | broader control-flow change than the requested minimal preflight
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Expand the model registry before enabling preflight for additional providers or aliases
Tested: cargo build -p api -p tools -p rusty-claude-cli; cargo test -p api
Not-tested: End-to-end CLI auto-compaction or retry behavior after a local context_window_blocked failure
2026-06-11 00:42:15 +08:00 · 2026-04-05 16:39:58 +00:00
parent b9c5cc118e
commit fa72cd665e
6 changed files with 264 additions and 11 deletions
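
The preflight described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the `ModelLimits` type, the `limits_for` lookup, the `example-small` model name, the 4-bytes-per-token heuristic, and the choice to let unregistered models pass through are all assumptions. Only the error field names mirror the `ContextWindowExceeded` variant in the diff below.

```rust
/// Hypothetical registry entry; the real registry in this commit is not shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelLimits {
    pub context_window_tokens: u32,
}

/// Stand-in for `ApiError::ContextWindowExceeded`, using the same field names.
#[derive(Debug, PartialEq, Eq)]
pub struct ContextWindowExceeded {
    pub model: String,
    pub estimated_input_tokens: u32,
    pub requested_output_tokens: u32,
    pub estimated_total_tokens: u32,
    pub context_window_tokens: u32,
}

/// Assumed heuristic: roughly 4 bytes of serialized payload per token, rounded up.
pub fn estimate_tokens(serialized_payload: &str) -> u32 {
    let tokens = serialized_payload.len().div_ceil(4);
    // Clamp instead of wrapping if the payload is absurdly large.
    tokens.min(u32::MAX as usize) as u32
}

/// Hypothetical lookup with a single made-up model.
pub fn limits_for(model: &str) -> Option<ModelLimits> {
    match model {
        "example-small" => Some(ModelLimits { context_window_tokens: 1_000 }),
        _ => None,
    }
}

/// Fails locally when estimated input plus requested output exceeds the window.
/// Assumption: models missing from the registry skip the check entirely.
pub fn preflight(
    model: &str,
    serialized_payload: &str,
    requested_output_tokens: u32,
) -> Result<(), ContextWindowExceeded> {
    let Some(limits) = limits_for(model) else {
        return Ok(());
    };
    let estimated_input_tokens = estimate_tokens(serialized_payload);
    let estimated_total_tokens =
        estimated_input_tokens.saturating_add(requested_output_tokens);
    if estimated_total_tokens > limits.context_window_tokens {
        return Err(ContextWindowExceeded {
            model: model.to_string(),
            estimated_input_tokens,
            requested_output_tokens,
            estimated_total_tokens,
            context_window_tokens: limits.context_window_tokens,
        });
    }
    Ok(())
}
```

A caller would run `preflight` on the serialized request body before building the HTTP request and map the `Err` case into the dedicated error variant, so nothing is sent upstream when the estimate does not fit. The comparison is strict, so a request whose estimate exactly equals the window is allowed in this sketch.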
--- a/rust/crates/api/src/error.rs
+++ b/rust/crates/api/src/error.rs
@@ -8,6 +8,13 @@ pub enum ApiError {
        provider: &'static str,
        env_vars: &'static [&'static str],
    },
+    ContextWindowExceeded {
+        model: String,
+        estimated_input_tokens: u32,
+        requested_output_tokens: u32,
+        estimated_total_tokens: u32,
+        context_window_tokens: u32,
+    },
    ExpiredOAuthToken,
    Auth(String),
    InvalidApiKeyEnv(VarError),
@@ -48,6 +55,7 @@ impl ApiError {
            Self::Api { retryable, .. } => *retryable,
            Self::RetriesExhausted { last_error, .. } => last_error.is_retryable(),
            Self::MissingCredentials { .. }
+            | Self::ContextWindowExceeded { .. }
            | Self::ExpiredOAuthToken
            | Self::Auth(_)
            | Self::InvalidApiKeyEnv(_)
@@ -67,6 +75,16 @@ impl Display for ApiError {
                "missing {provider} credentials; export {} before calling the {provider} API",
                env_vars.join(" or ")
            ),
+            Self::ContextWindowExceeded {
+                model,
+                estimated_input_tokens,
+                requested_output_tokens,
+                estimated_total_tokens,
+                context_window_tokens,
+            } => write!(
+                f,
+                "context_window_blocked for {model}: estimated input {estimated_input_tokens} + requested output {requested_output_tokens} = {estimated_total_tokens} tokens exceeds the {context_window_tokens}-token context window; compact the session or reduce request size before retrying"
+            ),
            Self::ExpiredOAuthToken => {
                write!(
                    f,