fix(stream): normalize delta.reasoning alias and separate reasoning in client response (#771)

* fix(stream): normalize delta.reasoning to reasoning_content in SSE streaming

  NVIDIA kimi-k2.5 (and potentially other providers) send reasoning tokens as `delta.reasoning` in SSE streaming chunks instead of the standard OpenAI `delta.reasoning_content` field. This caused reasoning content to be silently dropped during stream passthrough: clients received only the final answer with no reasoning separation.

  The non-streaming sanitizer (responseSanitizer.ts) already handled this alias, but the streaming pipeline did not.

  Fix applied in 4 locations:
  - stream.ts passthrough: normalize + force re-serialize sanitized chunk
  - stream.ts translate: accumulate reasoning from delta.reasoning
  - sseParser.ts: collect delta.reasoning in parseSSEToOpenAIResponse
  - streamPayloadCollector.ts: collect delta.reasoning in buildOpenAISummary

* fix: eliminate injectedUsage reuse bug and add reasoning alias tests

  - Detect delta.reasoning alias before sanitizeStreamingChunk() which already normalizes it, removing dead post-sanitization normalization
  - Replace injectedUsage reuse with separate needsReserialization flag so reasoning re-serialization cannot block finish_reason/usage mutations on the same SSE chunk (fixes CRITICAL review finding)
  - Add unit test for parseSSEToOpenAIResponse reasoning alias
  - Add unit test for buildStreamSummaryFromEvents reasoning alias

* fix(stream): separate reasoning from content in passthrough response body

  The passthroughAccumulatedContent variable was mixing delta.content and delta.reasoning_content into one string, causing the client_response log and responseBody to lose reasoning separation.

  - Add passthroughAccumulatedReasoning accumulator for reasoning deltas
  - Set message.reasoning_content in responseBody when reasoning exists
  - Only accumulate delta.content into passthroughAccumulatedContent

* fix: trim leading whitespace from assembled content in log summaries

  NVIDIA and other providers emit token deltas with leading spaces (e.g. ' The', ' user'). When joined, these produce a leading space in the provider_response and parsed non-streaming response logs. Trim the joined content and reasoning_content in both buildOpenAISummary and parseSSEToOpenAIResponse for consistent log output.

* fix(stream): split combined reasoning+content deltas into separate SSE events

  Some providers (e.g. NVIDIA NIM) send transition chunks with both `delta.reasoning` and `delta.content` in the same SSE event. After sanitization this becomes `reasoning_content` + `content`, which violates the standard OpenAI streaming contract where these fields are never mixed. Clients using if/else logic (LobeChat, etc.) skip content when reasoning_content is present, losing the first content token.

  Split such combined chunks into two separate SSE events:
  1. Reasoning-only event (finish_reason=null, no usage)
  2. Content-only event (carries finish_reason and usage)
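
The alias normalization and the chunk split described in the commit message above could look roughly like the sketch below. It is an illustration only, not the code from stream.ts: the `Delta`, `Choice`, `Chunk`, `normalizeDelta` and `splitCombinedChunk` names are hypothetical, the sketch assumes one choice per chunk, and only the field names (`reasoning`, `reasoning_content`, `content`, `finish_reason`, `usage`) are taken from the message.

```typescript
// Hypothetical shapes; only the field names come from the commit message.
interface Delta {
  content?: string;
  reasoning?: string;
  reasoning_content?: string;
}

interface Choice {
  index: number;
  delta: Delta;
  finish_reason: string | null;
}

interface Chunk {
  id: string;
  choices: Choice[];
  usage?: Record<string, number>;
}

// Map the `reasoning` alias onto `reasoning_content` without overwriting
// a value the provider already sent under the canonical name.
function normalizeDelta(delta: Delta): Delta {
  const { reasoning, ...rest } = delta;
  if (typeof reasoning === "string" && reasoning.length > 0 && !rest.reasoning_content) {
    return { ...rest, reasoning_content: reasoning };
  }
  return rest;
}

// Split a chunk that carries both reasoning and content into two events:
// a reasoning-only event (finish_reason null, no usage) followed by a
// content-only event that keeps finish_reason and usage.
// Assumption: one choice per chunk.
function splitCombinedChunk(chunk: Chunk): Chunk[] {
  const first = chunk.choices[0];
  if (!first) return [chunk];

  const delta = normalizeDelta(first.delta);
  if (!delta.reasoning_content || !delta.content) {
    return [{ ...chunk, choices: [{ ...first, delta }] }];
  }

  const reasoningEvent: Chunk = {
    id: chunk.id,
    choices: [
      { index: first.index, delta: { reasoning_content: delta.reasoning_content }, finish_reason: null },
    ],
  };
  const contentEvent: Chunk = {
    ...chunk,
    choices: [{ ...first, delta: { content: delta.content } }],
  };
  return [reasoningEvent, contentEvent];
}
```

Emitting the reasoning event first preserves the order a client would see from a provider that never mixes the two fields, and keeping `finish_reason` and `usage` on the last event means a client that stops reading at `finish_reason` still receives the content token.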
fix: strip reasoning/thinking params for models that don't support them (#766)
2026-03-29 16:12:22 -03:00 · 2026-03-29 16:12:19 -03:00 · 2026-03-29 16:12:17 -03:00 · 2026-03-29 14:30:59 -03:00 · 2026-03-29 14:22:25 -03:00 · 2026-03-29 14:16:37 -03:00
63 changed files with 4107 additions and 268 deletions
@@ -0,0 +1,43 @@
+name: Sync Upstream
+
+on:
+  schedule:
+    # Run every 6 hours
+    - cron: '0 */6 * * *'
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  sync:
+    name: Sync with upstream
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Configure Git
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+
+      - name: Fetch upstream
+        run: |
+          git remote add upstream https://github.com/diegosouzapw/OmniRoute.git || true
+          git fetch upstream
+          git fetch origin
+
+      - name: Sync main branch
+        run: |
+          git checkout main
+          git merge upstream/main --no-edit || {
+            echo "Merge conflict detected. Manual intervention required."
+            exit 1
+          }
+
+      - name: Push changes
+        run: git push https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/tombii/OmniRoute.git main
@@ -2,6 +2,51 @@

 ## [Unreleased]

+---
+
+## [3.3.0] - 2026-03-29
+
+### ✨ Enhancements & Refactoring
+
+- **Release Stabilization** — Finalized v3.2.9 release (combo diagnostics, quality gates, Gemini tool fix) and created missing git tag. Consolidated all staged changes into a single atomic release commit.
+
+### 🐛 Bug Fixes
+
+- **Auto-Update Test** — Fixed `buildDockerComposeUpdateScript` test assertion to match unexpanded shell variable references (`$TARGET_TAG`, `${TARGET_TAG#v}`) in the generated deploy script, aligning with the refactored template from v3.2.8.
+- **Circuit Breaker Test** — Hardened `combo-circuit-breaker.test.mjs` by injecting `maxRetries: 0` to prevent retry inflation from skewing failure count assertions during breaker state transitions.
+
+---
+
+## [3.2.9] - 2026-03-29
+
+### ✨ Enhancements & Refactoring
+
+- **Combo Diagnostics** — Introduced a live test bypass flag (`forceLiveComboTest`) allowing administrators to execute real upstream health checks that bypass all local circuit-breaker and cooldown state mechanisms, enabling precise diagnostics during rolling outages (PR #759)
+- **Quality Gates** — Added automated response quality validation for combos and officially integrated `claude-4.6` model support into the core routing schemas (PR #762)
+
+### 🐛 Bug Fixes
+
+- **Tool Definition Validation** — Repaired Gemini API integration by normalizing enum types inside tool definitions, preventing upstream HTTP 400 parameter errors (PR #760)
+
+---
+
+## [3.2.8] - 2026-03-29
+
+### ✨ Enhancements & Refactoring
+
+- **Docker Auto-Update UI** — Integrated a detached background update process for Docker Compose deployments. The Dashboard UI now seamlessly tracks update lifecycle events combining JSON REST responses with SSE streaming progress overlays for robust cross-environment reliability.
+- **Cache Analytics** — Repaired zero-metrics visualization mapping by migrating Semantic Cache telemetry logs directly into the centralized tracking SQLite module.
+
+### 🐛 Bug Fixes
+
+- **Authentication Logic** — Fixed a bug where saving dashboard settings or adding models failed with a 401 Unauthorized error when `requireLogin` was disabled. API endpoints now correctly evaluate the global authentication toggle. Resolved global redirection by reactivating `src/middleware.ts`.
+- **CLI Tool Detection (Windows)** — Prevented fatal initialization exceptions during CLI environment detection by catching `cross-spawn` ENOENT errors correctly. Adds explicit detection paths for `\AppData\Local\droid\droid.exe`.
+- **Codex Native Passthrough** — Normalized model translation parameters preventing context poisoning in proxy pass-through mode, enforcing generic `store: false` constraints explicitly for all Codex-originated requests.
+- **SSE Token Reporting** — Normalized provider tool-call chunk `finish_reason` detection, fixing 0% Usage analytics for stream-only responses missing strict `<DONE>` indicators.
+- **DeepSeek <think> Tags** — Implemented an explicit `<think>` extraction mapping inside `responsesHandler.ts`, ensuring DeepSeek reasoning streams map equivalently to native Anthropic `<thinking>` structures.
+
+---
+
 ## [3.2.7] - 2026-03-29

 ### Fixed
@@ -60,7 +60,7 @@ FROM runner-base AS runner-cli

 # Install system dependencies required by openclaw (git+ssh references).
 RUN apt-get update \
-  && apt-get install -y --no-install-recommends git ca-certificates \
+  && apt-get install -y --no-install-recommends git ca-certificates docker.io docker-compose \
  && rm -rf /var/lib/apt/lists/* \
  && git config --system url."https://github.com/".insteadOf "ssh://git@github.com/"

@@ -59,6 +59,11 @@ services:
    ports:
      - "${DASHBOARD_PORT:-${PORT:-20128}}:${DASHBOARD_PORT:-${PORT:-20128}}"
      - "${API_PORT:-20129}:${API_PORT:-20129}"
+    volumes:
+      - omniroute-data:/app/data
+      - /var/run/docker.sock:/var/run/docker.sock
+      - /usr/libexec/docker/cli-plugins:/usr/libexec/docker/cli-plugins:ro
+      - ${AUTO_UPDATE_HOST_REPO_DIR:-.}:/workspace/omniroute:rw
    profiles:
      - cli

@@ -43,7 +43,7 @@ See [IDE Configs](integrations/ide-configs.md) for Antigravity, Cursor, Copilot,
 | `omniroute_simulate_route`         | Dry-run routing simulation with fallback tree   |
 | `omniroute_set_budget_guard`       | Session budget with degrade/block/alert actions |
 | `omniroute_set_resilience_profile` | Apply conservative/balanced/aggressive preset   |
-| `omniroute_test_combo`             | Live-test all models in a combo                 |
+| `omniroute_test_combo`             | Live-test all models in a combo via a real upstream request |
 | `omniroute_get_provider_metrics`   | Detailed metrics for one provider               |
 | `omniroute_best_combo_for_task`    | Task-fitness recommendation with alternatives   |
 | `omniroute_explain_route`          | Explain a past routing decision                 |
@@ -1,7 +1,7 @@
 openapi: 3.1.0
 info:
  title: OmniRoute API
-  version: 3.2.7
+  version: 3.3.0
  description: |
    OmniRoute is a local-first AI API proxy router. It provides an OpenAI-compatible
    endpoint that routes requests to multiple AI providers with load balancing,
@@ -500,6 +500,12 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    clientVersion: "1.1.3",
    models: [
      { id: "default", name: "Auto (Server Picks)" },
+      { id: "claude-4.6-opus-high-thinking", name: "Claude 4.6 Opus High Thinking" },
+      { id: "claude-4.6-opus-high", name: "Claude 4.6 Opus High" },
+      { id: "claude-4.6-sonnet-high-thinking", name: "Claude 4.6 Sonnet High Thinking" },
+      { id: "claude-4.6-sonnet-high", name: "Claude 4.6 Sonnet High" },
+      { id: "claude-4.6-haiku", name: "Claude 4.6 Haiku" },
+      { id: "claude-4.6-opus", name: "Claude 4.6 Opus" },
      { id: "claude-4.5-opus-high-thinking", name: "Claude 4.5 Opus High Thinking" },
      { id: "claude-4.5-opus-high", name: "Claude 4.5 Opus High" },
      { id: "claude-4.5-sonnet-thinking", name: "Claude 4.5 Sonnet Thinking" },
@@ -260,11 +260,9 @@ export class CodexExecutor extends BaseExecutor {
      body.service_tier = CODEX_FAST_WIRE_VALUE;
    }

-    if (nativeCodexPassthrough) {
-      return body;
-    }
-
    // If no instructions provided, inject default Codex instructions
+    // NOTE: must run before the passthrough return — Codex upstream rejects
+    // requests without instructions even when the body is forwarded as-is.
    if (!body.instructions || body.instructions.trim() === "") {
      body.instructions = CODEX_DEFAULT_INSTRUCTIONS;
    }
@@ -272,6 +270,10 @@ export class CodexExecutor extends BaseExecutor {
    // Ensure store is false (Codex requirement)
    body.store = false;

+    if (nativeCodexPassthrough) {
+      return body;
+    }
+
    // Extract thinking level from model name suffix
    // e.g., gpt-5.3-codex-high → high, gpt-5.3-codex → medium (default)
    const effortLevels = ["none", "low", "medium", "high", "xhigh"];
@@ -42,6 +42,9 @@ import {
  getModelUpstreamExtraHeaders,
 } from "@/lib/localDb";
 import { getExecutor } from "../executors/index.ts";
+import { getCacheControlSettings } from "@/lib/cacheControlSettings";
+import { shouldPreserveCacheControl } from "../utils/cacheControlPolicy.ts";
+import { getCacheMetrics } from "@/lib/db/settings.ts";

 import {
  parseCodexQuotaHeaders,
@@ -306,6 +309,11 @@ function attachLogMeta(
 * @param {function} options.onDisconnect - Callback when client disconnects
 * @param {string} options.connectionId - Connection ID for usage tracking
 * @param {object} options.apiKeyInfo - API key metadata for usage attribution
+ * @param {string} options.userAgent - Client user agent for caching decisions
+ * @param {string} options.comboName - Combo name if this is a combo request
+ * @param {string} options.comboStrategy - Combo routing strategy (e.g., 'priority', 'cost-optimized')
+ * @param {boolean} options.isCombo - Whether this request is from a combo
+ * @param {string} options.connectionId - Connection ID for settings lookup
 */
 export async function handleChatCore({
  body,
@@ -320,6 +328,8 @@ export async function handleChatCore({
  apiKeyInfo = null,
  userAgent,
  comboName,
+  comboStrategy = null,
+  isCombo = false,
 }) {
  let { provider, model, extendedContext } = modelInfo;
  const requestedModel =
@@ -674,6 +684,25 @@ export async function handleChatCore({
  // Translate request (pass reqLogger for intermediate logging)
  let translatedBody = body;
  const isClaudePassthrough = sourceFormat === FORMATS.CLAUDE && targetFormat === FORMATS.CLAUDE;
+
+  // Determine if we should preserve client-side cache_control headers
+  // Fetch settings from DB to get user preference
+  const cacheControlMode = await getCacheControlSettings().catch(() => "auto" as const);
+  const preserveCacheControl = shouldPreserveCacheControl({
+    userAgent,
+    isCombo,
+    comboStrategy,
+    targetProvider: provider,
+    settings: { alwaysPreserveClientCache: cacheControlMode },
+  });
+
+  if (preserveCacheControl) {
+    log?.debug?.(
+      "CACHE",
+      `Preserving client cache_control (client=${userAgent?.substring(0, 20)}, combo=${isCombo}, strategy=${comboStrategy}, provider=${provider})`
+    );
+  }
+
  try {
    if (nativeCodexPassthrough) {
      translatedBody = { ...body, _nativeCodexPassthrough: true };
@@ -701,7 +730,7 @@ export async function handleChatCore({
        credentials,
        provider,
        reqLogger,
-        { normalizeToolCallId, preserveDeveloperRole }
+        { normalizeToolCallId, preserveDeveloperRole, preserveCacheControl }
      );
      translatedBody = translateRequest(
        FORMATS.OPENAI,
@@ -712,7 +741,7 @@ export async function handleChatCore({
        credentials,
        provider,
        reqLogger,
-        { normalizeToolCallId, preserveDeveloperRole }
+        { normalizeToolCallId, preserveDeveloperRole, preserveCacheControl }
      );
      log?.debug?.("FORMAT", "claude->openai->claude normalized passthrough");
    } else {
@@ -816,7 +845,7 @@ export async function handleChatCore({
        credentials,
        provider,
        reqLogger,
-        { normalizeToolCallId, preserveDeveloperRole }
+        { normalizeToolCallId, preserveDeveloperRole, preserveCacheControl }
      );
    }
  } catch (error) {
@@ -1406,6 +1435,18 @@ export async function handleChatCore({
      const msg = `[${new Date().toLocaleTimeString("en-US", { hour12: false, hour: "2-digit", minute: "2-digit" })}] 📊 [USAGE] ${provider.toUpperCase()} | in=${getLoggedInputTokens(usage)} | out=${getLoggedOutputTokens(usage)}${connectionId ? ` | account=${connectionId.slice(0, 8)}...` : ""}`;
      console.log(`${COLORS.green}${msg}${COLORS.reset}`);

+      // Track cache token metrics
+      const inputTokens = usage.prompt_tokens || 0;
+      const cachedTokens = toPositiveNumber(
+        usage.cache_read_input_tokens ??
+          usage.cached_tokens ??
+          (usage as any).prompt_tokens_details?.cached_tokens
+      );
+      const cacheCreationTokens = toPositiveNumber(
+        usage.cache_creation_input_tokens ??
+          (usage as any).prompt_tokens_details?.cache_creation_tokens
+      );
+
      saveRequestUsage({
        provider: provider || "unknown",
        model: model || "unknown",
@@ -1549,8 +1590,41 @@ export async function handleChatCore({
    responseBody: streamResponseBody,
    providerPayload,
    clientPayload,
+    ttft,
  }) => {
    const cacheUsageLogMeta = buildCacheUsageLogMeta(streamUsage);
+
+    // Track cache token metrics for streaming responses
+    if (streamUsage && typeof streamUsage === "object") {
+      const inputTokens = streamUsage.prompt_tokens || 0;
+      const cachedTokens = toPositiveNumber(
+        streamUsage.cache_read_input_tokens ??
+          streamUsage.cached_tokens ??
+          (streamUsage as any).prompt_tokens_details?.cached_tokens
+      );
+      const cacheCreationTokens = toPositiveNumber(
+        streamUsage.cache_creation_input_tokens ??
+          (streamUsage as any).prompt_tokens_details?.cache_creation_tokens
+      );
+
+      saveRequestUsage({
+        provider: provider || "unknown",
+        model: model || "unknown",
+        tokens: streamUsage,
+        status: String(streamStatus || 200),
+        success: streamStatus === 200,
+        latencyMs: Date.now() - startTime,
+        timeToFirstTokenMs: ttft,
+        errorCode: null,
+        timestamp: new Date().toISOString(),
+        connectionId: connectionId || undefined,
+        apiKeyId: apiKeyInfo?.id || undefined,
+        apiKeyName: apiKeyInfo?.name || undefined,
+      }).catch((err) => {
+        console.error("Failed to save usage stats:", err.message);
+      });
+    }
+
    persistAttemptLogs({
      status: streamStatus || 200,
      tokens: streamUsage || {},
@@ -80,16 +80,24 @@ export async function handleEmbedding({
    };
  }

-  // Build upstream request
+  // Build upstream request — start with standard fields, then forward any extras
+  // the client sent (e.g. input_type, user, truncate for NVIDIA NIM asymmetric models).
+  const KNOWN_FIELDS = new Set(["model", "input", "dimensions", "encoding_format"]);
+
  const upstreamBody: Record<string, unknown> = {
    model: model,
    input: body.input,
  };

-  // Pass optional parameters
  if (body.dimensions !== undefined) upstreamBody.dimensions = body.dimensions;
  if (body.encoding_format !== undefined) upstreamBody.encoding_format = body.encoding_format;

+  for (const [key, value] of Object.entries(body)) {
+    if (!KNOWN_FIELDS.has(key) && value !== undefined) {
+      upstreamBody[key] = value;
+    }
+  }
+
  // Build headers
  const headers = {
    "Content-Type": "application/json",
@@ -104,6 +112,12 @@ export async function handleEmbedding({
    } else if (providerConfig.authHeader === "x-api-key") {
      headers["x-api-key"] = token;
    }
+  } else if (providerConfig.authType !== "none") {
+    return {
+      success: false,
+      status: 401,
+      error: `No valid authentication token for provider ${provider}. Check provider credentials.`,
+    };
  }

  if (log) {
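
The extra-field forwarding added to `handleEmbedding` above can be exercised in isolation. The sketch below restates that loop as a standalone function under assumptions: `buildUpstreamBody` is a hypothetical name, and the model strings in the usage note are placeholders rather than real provider IDs.

```typescript
// Fields the handler sets explicitly; everything else is forwarded untouched.
const KNOWN_FIELDS = new Set(["model", "input", "dimensions", "encoding_format"]);

// Hypothetical standalone version of the body-building step in the diff.
function buildUpstreamBody(
  resolvedModel: string,
  body: Record<string, unknown>
): Record<string, unknown> {
  const upstream: Record<string, unknown> = {
    model: resolvedModel, // the routed model, not whatever the client sent
    input: body["input"],
  };

  if (body["dimensions"] !== undefined) upstream["dimensions"] = body["dimensions"];
  if (body["encoding_format"] !== undefined) upstream["encoding_format"] = body["encoding_format"];

  // Forward provider-specific extras (e.g. input_type, user, truncate).
  for (const [key, value] of Object.entries(body)) {
    if (!KNOWN_FIELDS.has(key) && value !== undefined) {
      upstream[key] = value;
    }
  }
  return upstream;
}
```

Calling `buildUpstreamBody("resolved-model", { model: "client-alias", input: "hi", input_type: "query" })` yields a body with the resolved model, the original input, and `input_type` passed through. Because `model` is in the known set, the client's value cannot overwrite the routed one.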
@@ -52,6 +52,10 @@ export function parseSSEToOpenAIResponse(rawSSE, fallbackModel) {
    if (typeof delta.reasoning_content === "string" && delta.reasoning_content.length > 0) {
      reasoningParts.push(delta.reasoning_content);
    }
+    // Normalize `reasoning` alias (NVIDIA kimi-k2.5 etc.)
+    if (typeof delta.reasoning === "string" && delta.reasoning.length > 0 && !delta.reasoning_content) {
+      reasoningParts.push(delta.reasoning);
+    }

    // T18: Accumulate tool calls correctly across streamed chunks
    if (delta.tool_calls) {
@@ -94,12 +98,14 @@ export function parseSSEToOpenAIResponse(rawSSE, fallbackModel) {
    }
  }

+  const joinedContent = contentParts.length > 0 ? contentParts.join("").trim() : null;
+  const joinedReasoning = reasoningParts.length > 0 ? reasoningParts.join("").trim() : null;
  const message: Record<string, unknown> = {
    role: "assistant",
-    content: contentParts.length > 0 ? contentParts.join("") : null,
+    content: joinedContent || null,
  };
-  if (reasoningParts.length > 0) {
-    message.reasoning_content = reasoningParts.join("");
+  if (joinedReasoning) {
+    message.reasoning_content = joinedReasoning;
  }

  const finalToolCalls = [...accumulatedToolCalls.values()].filter(Boolean).sort((a, b) => {
@@ -137,7 +137,7 @@ omniroute --mcp
 | 9   | `omniroute_simulate_route`         | `read:health`, `read:combos`         | Dry-run routing simulation showing fallback tree and estimated costs         |
 | 10  | `omniroute_set_budget_guard`       | `write:budget`                       | Set session budget with action on exceed: `degrade`, `block`, or `alert`     |
 | 11  | `omniroute_set_resilience_profile` | `write:resilience`                   | Apply resilience profile: `aggressive`, `balanced`, or `conservative`        |
-| 12  | `omniroute_test_combo`             | `execute:completions`, `read:combos` | Test each provider in a combo with a real prompt, report latency/cost        |
+| 12  | `omniroute_test_combo`             | `execute:completions`, `read:combos` | Test each provider in a combo with a real prompt and a real upstream call, report latency/cost |
 | 13  | `omniroute_get_provider_metrics`   | `read:health`                        | Per-provider metrics with latency percentiles (p50/p95/p99), circuit breaker |
 | 14  | `omniroute_best_combo_for_task`    | `read:combos`, `read:health`         | AI-powered combo recommendation by task type with budget/latency constraints |
 | 15  | `omniroute_explain_route`          | `read:health`, `read:usage`          | Explain why a request was routed to a provider (scoring factors, fallbacks)  |
@@ -17,6 +17,10 @@ export const ACCOUNT_DEACTIVATED_SIGNALS = [
  "account has been disabled",
  "your account has been suspended",
  "this account is deactivated",
+  // AG (Antigravity/Google Cloud Code) permanent ban signals
+  "verify your account to continue",
+  "this service has been disabled in this account for violation",
+  "this service has been disabled in this account",
 ];

 // T10 (sub2api PR #1169): Signals that indicate billing credits are exhausted.
@@ -45,6 +45,80 @@ const DEFAULT_MODEL_P95_MS = {
 };
 const MIN_HISTORY_SAMPLES = 10;

+/**
+ * Validate that a successful (HTTP 200) non-streaming response actually contains
+ * meaningful content. Returns { valid: true } or { valid: false, reason }.
+ *
+ * Only inspects non-streaming JSON responses — streaming responses are passed through
+ * because buffering the full stream would defeat the purpose of streaming.
+ *
+ * Checks:
+ * 1. Body is valid JSON
+ * 2. Has at least one choice with non-empty content or tool_calls
+ */
+async function validateResponseQuality(
+  response: Response,
+  isStreaming: boolean,
+  log: { warn?: (...args: any[]) => void }
+): Promise<{ valid: boolean; reason?: string; clonedResponse?: Response }> {
+  if (isStreaming) return { valid: true };
+
+  const contentType = response.headers.get("content-type") || "";
+  if (!contentType.includes("application/json") && !contentType.includes("text/")) {
+    return { valid: true };
+  }
+
+  let cloned: Response;
+  try {
+    cloned = response.clone();
+  } catch {
+    return { valid: true };
+  }
+
+  let text: string;
+  try {
+    text = await cloned.text();
+  } catch {
+    return { valid: true };
+  }
+
+  if (!text || text.trim().length === 0) {
+    return { valid: false, reason: "empty response body" };
+  }
+
+  let json: any;
+  try {
+    json = JSON.parse(text);
+  } catch {
+    if (text.startsWith("data:")) return { valid: true };
+    return { valid: false, reason: "response is not valid JSON" };
+  }
+
+  const choices = json?.choices;
+  if (!Array.isArray(choices) || choices.length === 0) {
+    if (json?.output || json?.result || json?.data || json?.response) return { valid: true };
+    if (json?.error) return { valid: false, reason: `upstream error in 200 body: ${json.error?.message || JSON.stringify(json.error).substring(0, 200)}` };
+    return { valid: true };
+  }
+
+  const firstChoice = choices[0];
+  const message = firstChoice?.message || firstChoice?.delta;
+  if (!message) {
+    return { valid: false, reason: "choice has no message object" };
+  }
+
+  const content = message.content;
+  const toolCalls = message.tool_calls;
+  const hasContent = content !== null && content !== undefined && content !== "";
+  const hasToolCalls = Array.isArray(toolCalls) && toolCalls.length > 0;
+
+  if (!hasContent && !hasToolCalls) {
+    return { valid: false, reason: "empty content and no tool_calls in response" };
+  }
+
+  return { valid: true };
+}
+
 // In-memory atomic counter per combo for round-robin distribution
 // Resets on server restart (by design — no stale state)
 const rrCounters = new Map();
@@ -872,14 +946,31 @@ export async function handleComboChat({

      const result = await handleSingleModelWrapped(body, modelStr);

-      // Success — return response
+      // Success — validate response quality before returning
      if (result.ok) {
+        const quality = await validateResponseQuality(result, !!body.stream, log);
+        if (!quality.valid) {
+          log.warn(
+            "COMBO",
+            `Model ${modelStr} returned 200 but failed quality check: ${quality.reason}`
+          );
+          breaker._onFailure();
+          recordComboRequest(combo.name, modelStr, {
+            success: false,
+            latencyMs: Date.now() - startTime,
+            fallbackCount,
+            strategy,
+          });
+          if (i > 0) fallbackCount++;
+          break; // move to next model
+        }
        resolvedByModel = modelStr;
        const latencyMs = Date.now() - startTime;
        log.info(
          "COMBO",
          `Model ${modelStr} succeeded (${latencyMs}ms, ${fallbackCount} fallbacks)`
        );
+        breaker._onSuccess();
        recordComboRequest(combo.name, modelStr, {
          success: true,
          latencyMs,
@@ -1139,13 +1230,30 @@ async function handleRoundRobinCombo({

        const result = await handleSingleModel(body, modelStr);

-        // Success
+        // Success — validate response quality before returning
        if (result.ok) {
+          const quality = await validateResponseQuality(result, !!body.stream, log);
+          if (!quality.valid) {
+            log.warn(
+              "COMBO-RR",
+              `${modelStr} returned 200 but failed quality check: ${quality.reason}`
+            );
+            breaker._onFailure();
+            recordComboRequest(combo.name, modelStr, {
+              success: false,
+              latencyMs: Date.now() - startTime,
+              fallbackCount,
+              strategy: "round-robin",
+            });
+            if (offset > 0) fallbackCount++;
+            break; // move to next model
+          }
          const latencyMs = Date.now() - startTime;
          log.info(
            "COMBO-RR",
            `${modelStr} succeeded (${latencyMs}ms, ${fallbackCount} fallbacks)`
          );
+          breaker._onSuccess();
          recordComboRequest(combo.name, modelStr, {
            success: true,
            latencyMs,
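
The content checks inside `validateResponseQuality` above can be illustrated on a plain string, without the `Response` cloning. This is a simplified sketch: `checkChatBody`, `ChatBody` and `QualityResult` are hypothetical names, the shortcut for non-chat payloads (`output`, `result`, `data`, `response`) and the `delta` fallback are omitted, and the upstream-error reason is abbreviated.

```typescript
type QualityResult = { valid: true } | { valid: false; reason: string };

// Loose, hypothetical view of a chat completion body.
interface ChatBody {
  choices?: Array<{ message?: { content?: unknown; tool_calls?: unknown } }>;
  error?: { message?: string };
}

function checkChatBody(text: string): QualityResult {
  if (text.trim().length === 0) {
    return { valid: false, reason: "empty response body" };
  }

  let json: ChatBody | null;
  try {
    json = JSON.parse(text) as ChatBody | null;
  } catch {
    // SSE payloads are not JSON; let them through as the diff does.
    if (text.startsWith("data:")) return { valid: true };
    return { valid: false, reason: "response is not valid JSON" };
  }

  const choices = json?.choices;
  if (!Array.isArray(choices) || choices.length === 0) {
    if (json?.error) return { valid: false, reason: "upstream error in 200 body" };
    return { valid: true };
  }

  const message = choices[0]?.message;
  if (!message) return { valid: false, reason: "choice has no message object" };

  const hasContent =
    message.content !== null && message.content !== undefined && message.content !== "";
  const hasToolCalls = Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
  if (!hasContent && !hasToolCalls) {
    return { valid: false, reason: "empty content and no tool_calls in response" };
  }
  return { valid: true };
}
```

A body such as `{"choices":[{"message":{"content":""}}]}` is rejected, which is the case that makes the combo loop record a failure and fall through to the next model even though the upstream returned HTTP 200.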
@@ -48,3 +48,54 @@ export function supportsToolCalling(modelStr: string): boolean {

  return !blocked;
 }
+
+// Models that do NOT support reasoning/thinking parameters.
+// AG (Antigravity) claude-sonnet-4-6 routes through a Google internal API
+// that returns 400 if thinking params are included.
+const REASONING_UNSUPPORTED_PATTERNS = [
+  "antigravity/claude-sonnet-4-6",
+  "antigravity/claude-sonnet-4-5",
+  "antigravity/claude-sonnet-4",
+  "ag/claude-sonnet-4-6",
+  "ag/claude-sonnet-4-5",
+  "ag/claude-sonnet-4",
+];
+
+function getRegistryReasoningFlag(providerIdOrAlias: string, modelId: string): boolean | null {
+  const providerAlias = PROVIDER_ID_TO_ALIAS[providerIdOrAlias] || providerIdOrAlias;
+  const models = PROVIDER_MODELS[providerAlias];
+  if (!Array.isArray(models)) return null;
+  const found = models.find((m) => m?.id === modelId);
+  if (!found) return null;
+  return typeof found.supportsReasoning === "boolean" ? found.supportsReasoning : null;
+}
+
+/**
+ * Returns whether a model supports reasoning/thinking parameters.
+ *
+ * Decision order:
+ * 1) Provider registry metadata (supportsReasoning flag) when available.
+ * 2) Explicit denylist for known unsupported models (e.g. AG Claude Sonnet).
+ * 3) Default true (pass through — safe, provider will ignore if unsupported).
+ */
+export function supportsReasoning(modelStr: string): boolean {
+  const parsed = parseModel(modelStr);
+  const provider = parsed.provider || parsed.providerAlias || "";
+  const model = parsed.model || modelStr;
+
+  if (provider) {
+    const fromRegistry = getRegistryReasoningFlag(provider, model);
+    if (fromRegistry !== null) return fromRegistry;
+  }
+
+  const normalized = String(modelStr || "").toLowerCase();
+  if (!normalized) return true;
+
+  const blocked = REASONING_UNSUPPORTED_PATTERNS.some((pattern) =>
+    normalized === pattern ||
+    normalized.endsWith(`/${pattern}`) ||
+    normalized.includes(pattern)
+  );
+
+  return !blocked;
+}
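
The decision order documented for `supportsReasoning` above (registry flag, then denylist, then default true) can be sketched without the project's registry. Everything here is an assumption made for illustration: `supportsReasoningSketch`, `ReasoningRegistry` and the inline registry objects are hypothetical, and the provider/model split on the first `/` stands in for the real `parseModel`.

```typescript
// provider -> model id -> explicit flag (undefined means "no opinion").
type ReasoningRegistry = Record<string, Record<string, boolean | undefined>>;

// Substring patterns, as in the diff; the shortest entries already cover
// the "-4-5" and "-4-6" variants because matching uses includes().
const DENYLIST = ["antigravity/claude-sonnet-4", "ag/claude-sonnet-4"];

function supportsReasoningSketch(modelStr: string, registry: ReasoningRegistry): boolean {
  // 1) An explicit registry flag wins over everything else.
  const slash = modelStr.indexOf("/");
  if (slash > 0) {
    const provider = modelStr.slice(0, slash);
    const model = modelStr.slice(slash + 1);
    const flag = registry[provider]?.[model];
    if (typeof flag === "boolean") return flag;
  }

  // 2) Denylist of known unsupported models (case-insensitive substring match).
  const normalized = modelStr.toLowerCase();
  if (!normalized) return true;
  const blocked = DENYLIST.some((pattern) => normalized.includes(pattern));

  // 3) Default: pass reasoning params through.
  return !blocked;
}
```

One consequence of substring matching worth keeping in mind: any future model whose ID contains `ag/claude-sonnet-4` is treated as unsupported until a registry entry sets `supportsReasoning` explicitly.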
@@ -14,6 +14,7 @@ export const ThinkingMode = {
 };

 import { capThinkingBudget, getDefaultThinkingBudget } from "@/shared/constants/modelSpecs";
+import { supportsReasoning } from "./modelCapabilities.ts";

 // Effort → budget token mapping
 export const EFFORT_BUDGETS = {
@@ -151,6 +152,13 @@ export function applyThinkingBudget(body, config = null) {
  const cfg = config || _config;
  if (!body || typeof body !== "object") return body;

+  // Early exit: strip ALL reasoning/thinking params for models that don't support them.
+  // Sending thinking params to unsupported models (e.g. AG claude-sonnet-4-6) causes 400 errors.
+  const modelStr = typeof body.model === "string" ? body.model : "";
+  if (modelStr && !supportsReasoning(modelStr)) {
+    return stripThinkingConfig(body);
+  }
+
  // Pre-processing: convert string thinkingLevel to numeric budget
  let processed = normalizeThinkingLevel(body);

@@ -98,6 +98,7 @@ export function createResponsesApiTransformStream(logger = null) {
    funcItemDone: {},
    buffer: "",
    completedSent: false,
+    usage: null,
  };

  const encoder = new TextEncoder();
@@ -249,16 +250,52 @@ export function createResponsesApiTransformStream(logger = null) {
  const sendCompleted = (controller) => {
    if (!state.completedSent) {
      state.completedSent = true;
+
+      // Build output from accumulated state
+      const output = [];
+      if (state.reasoningId) {
+        output.push({
+          id: state.reasoningId,
+          type: "reasoning",
+          summary: [{ type: "summary_text", text: state.reasoningBuf }],
+        });
+      }
+      for (const idx in state.msgItemAdded) {
+        output.push({
+          id: `msg_${state.responseId}_${idx}`,
+          type: "message",
+          role: "assistant",
+          content: [{ type: "output_text", annotations: [], text: state.msgTextBuf[idx] || "" }],
+        });
+      }
+      for (const idx in state.funcCallIds) {
+        const callId = state.funcCallIds[idx];
+        output.push({
+          id: `fc_${callId}`,
+          type: "function_call",
+          call_id: callId,
+          name: state.funcNames[idx] || "",
+          arguments: state.funcArgsBuf[idx] || "{}",
+        });
+      }
+
+      const response: Record<string, unknown> = {
+        id: state.responseId,
+        object: "response",
+        created_at: state.created,
+        status: "completed",
+        background: false,
+        error: null,
+        output,
+      };
+
+      if (state.usage) {
+        response.usage = state.usage;
+      }
+
      emit(controller, "response.completed", {
        type: "response.completed",
-        response: {
-          id: state.responseId,
-          object: "response",
-          created_at: state.created,
-          status: "completed",
-          background: false,
-          error: null,
-        },
+        response,
      });
    }
  };
@@ -288,7 +325,12 @@ export function createResponsesApiTransformStream(logger = null) {
          continue;
        }

-        if (!parsed.choices?.length) continue;
+        if (!parsed.choices?.length) {
+          if (parsed.usage) {
+            state.usage = parsed.usage;
+          }
+          continue;
+        }

        const choice = parsed.choices[0];
        const idx = choice.index || 0;
@@ -335,7 +377,7 @@ export function createResponsesApiTransformStream(logger = null) {

          if (content.includes("<think>")) {
            state.inThinking = true;
-            content = content.replace("<think>", "");
+            content = content.replaceAll("<think>", "");
            startReasoning(controller, idx);
          }

@@ -167,13 +167,19 @@ function convertConstToEnum(obj) {
 }

 // Convert enum values to strings (Gemini requires string enum values)
+// For integer and number types, remove enum entirely, since Gemini only accepts enum on string types
 function convertEnumValuesToStrings(obj) {
  if (!obj || typeof obj !== "object") return;

  if (obj.enum && Array.isArray(obj.enum)) {
-    obj.enum = obj.enum.map((v) => String(v));
-    if (!obj.type) {
-      obj.type = "string";
+    // Gemini only supports enum for string types, so drop it for integer and number
+    if (obj.type === "integer" || obj.type === "number") {
+      delete obj.enum;
+    } else {
+      obj.enum = obj.enum.map((v) => String(v));
+      if (!obj.type) {
+        obj.type = "string";
+      }
    }
  }
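
The enum handling in this hunk can be illustrated with a small standalone sketch. It assumes a single schema node and a helper named `normalizeEnum` (hypothetical). Unlike the function in the diff, it returns a copy instead of mutating in place, and it does not recurse into nested schemas.

```typescript
// Hypothetical schema node type, used only for this illustration.
type SchemaNode = { type?: string; enum?: unknown[]; [key: string]: unknown };

function normalizeEnum(schema: SchemaNode): SchemaNode {
  const out: SchemaNode = { ...schema };
  if (Array.isArray(out.enum)) {
    if (out.type === "integer" || out.type === "number") {
      // Numeric types: drop the enum, as the diff does for Gemini.
      delete out.enum;
    } else {
      // Everything else: stringify the values and default the type to "string".
      out.enum = out.enum.map((v) => String(v));
      if (!out.type) {
        out.type = "string";
      }
    }
  }
  return out;
}

const numericSchema = normalizeEnum({ type: "integer", enum: [1, 2, 3] });
const untypedSchema = normalizeEnum({ enum: [1, "a", true] });
```

Here `numericSchema` keeps its `integer` type but loses `enum`, while `untypedSchema` becomes a `string` schema whose enum values are `"1"`, `"a"`, and `"true"`.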

@@ -1,105 +1,9 @@
 /**
- * Convert OpenAI Responses API format to standard chat completions format
- * Responses API uses: { input: [...], instructions: "..." }
- * Chat API uses: { messages: [...] }
+ * Convert OpenAI Responses API format to standard chat completions format.
+ * Delegates to the canonical translator to avoid logic duplication.
 */
+import { openaiResponsesToOpenAIRequest } from "../request/openai-responses.ts";
+
 export function convertResponsesApiFormat(body) {
-  if (!body.input) return body;
-
-  const result = { ...body };
-  result.messages = [];
-
-  // Convert instructions to system message
-  if (body.instructions) {
-    result.messages.push({ role: "system", content: body.instructions });
-  }
-
-  // Group items by conversation turn
-  let currentAssistantMsg = null;
-  let pendingToolCalls = [];
-  let pendingToolResults = [];
-
-  for (const item of body.input) {
-    // Determine item type - Droid CLI sends role-based items without 'type' field
-    // Fallback: if no type but has role property, treat as message
-    const itemType = item.type || (item.role ? "message" : null);
-
-    if (itemType === "message") {
-      // Flush each pending assistant message with tool calls
-      if (currentAssistantMsg) {
-        result.messages.push(currentAssistantMsg);
-        currentAssistantMsg = null;
-      }
-      // Flush pending tool results
-      if (pendingToolResults.length > 0) {
-        for (const tr of pendingToolResults) {
-          result.messages.push(tr);
-        }
-        pendingToolResults = [];
-      }
-
-      // Convert content: input_text → text, output_text → text
-      const content = Array.isArray(item.content)
-        ? item.content.map((c) => {
-            if (c.type === "input_text") return { type: "text", text: c.text };
-            if (c.type === "output_text") return { type: "text", text: c.text };
-            return c;
-          })
-        : item.content;
-      result.messages.push({ role: item.role, content });
-    } else if (itemType === "function_call") {
-      // Start or append to assistant message with tool_calls
-      if (!currentAssistantMsg) {
-        currentAssistantMsg = {
-          role: "assistant",
-          content: null,
-          tool_calls: [],
-        };
-      }
-      currentAssistantMsg.tool_calls.push({
-        id: item.call_id,
-        type: "function",
-        function: {
-          name: item.name,
-          arguments: item.arguments,
-        },
-      });
-    } else if (itemType === "function_call_output") {
-      // Flush assistant message first if exists
-      if (currentAssistantMsg) {
-        result.messages.push(currentAssistantMsg);
-        currentAssistantMsg = null;
-      }
-      // Add tool result
-      pendingToolResults.push({
-        role: "tool",
-        tool_call_id: item.call_id,
-        content: typeof item.output === "string" ? item.output : JSON.stringify(item.output),
-      });
-    } else if (itemType === "reasoning") {
-      // Skip reasoning items - they are for display only
-      continue;
-    }
-  }
-
-  // Flush remaining
-  if (currentAssistantMsg) {
-    result.messages.push(currentAssistantMsg);
-  }
-  if (pendingToolResults.length > 0) {
-    for (const tr of pendingToolResults) {
-      result.messages.push(tr);
-    }
-  }
-
-  // Cleanup Responses API specific fields
-  // Note: prompt_cache_key is intentionally preserved — it is used by Codex and other
-  // providers as a cache-affinity signal. Stripping it breaks prompt caching (#517).
-  delete result.input;
-  delete result.instructions;
-  delete result.include;
-  delete result.store;
-  delete result.reasoning;
-
-  return result;
+  return openaiResponsesToOpenAIRequest(null, body, null, null);
 }
@@ -73,6 +73,7 @@ function normalizeOpenAIResponsesRequest(body) {

 /** @param options.normalizeToolCallId - When true, use 9-char tool call ids (e.g. Mistral); when false, leave ids as-is */
 /** @param options.preserveDeveloperRole - undefined/true: keep developer for OpenAI format (default); false: map to system */
+/** @param options.preserveCacheControl - When true, preserve client-side cache_control markers (for Claude Code, etc.) */
 // Translate request: source -> openai -> target
 export function translateRequest(
  sourceFormat,
@@ -83,7 +84,7 @@ export function translateRequest(
  credentials = null,
  provider = null,
  reqLogger = null,
-  options?: { normalizeToolCallId?: boolean; preserveDeveloperRole?: boolean }
+  options?: { normalizeToolCallId?: boolean; preserveDeveloperRole?: boolean; preserveCacheControl?: boolean }
 ) {
  let result = body;
  const use9CharId = options?.normalizeToolCallId === true;
@@ -149,10 +150,13 @@ export function translateRequest(
  }

  // Final step: prepare request for Claude format endpoints
-  // In Claude passthrough mode (Claude → Claude), preserve cache_control markers
+  // Preserve cache_control when:
+  // 1. Claude passthrough mode (Claude → Claude), OR
+  // 2. Explicitly requested via options (for caching-aware clients like Claude Code)
  if (targetFormat === FORMATS.CLAUDE) {
    const isClaudePassthrough = sourceFormat === FORMATS.CLAUDE;
-    result = prepareClaudeRequest(result, provider, isClaudePassthrough);
+    const preserveCache = isClaudePassthrough || options?.preserveCacheControl === true;
+    result = prepareClaudeRequest(result, provider, preserveCache);
  }

  // Normalize openai-responses input shape for providers that require list input.
@@ -10,8 +10,6 @@ import { generateToolCallId } from "../helpers/toolCallHelper.ts";

 type JsonRecord = Record<string, unknown>;

-const UNSUPPORTED_TOOLS = ["file_search", "code_interpreter", "web_search_preview"];
-
 function toRecord(value: unknown): JsonRecord {
  return value && typeof value === "object" && !Array.isArray(value) ? (value as JsonRecord) : {};
 }
@@ -47,14 +45,16 @@ export function openaiResponsesToOpenAIRequest(
  const root = toRecord(body);
  if (root.input === undefined) return body;

-  // Validate unsupported features - return clear errors instead of silent failure
+  // Validate tool types — only function tools can be translated to Chat Completions
  const tools = toArray(root.tools);
  if (tools.length > 0) {
    for (const toolValue of tools) {
      const tool = toRecord(toolValue);
-      if (UNSUPPORTED_TOOLS.includes(toString(tool.type))) {
+      const toolType = toString(tool.type);
+      // Allow: function tools, and tools already in Chat format (have .function property)
+      if (toolType && toolType !== "function" && !tool.function) {
        throw unsupportedFeature(
-          `Unsupported Responses API feature: ${toString(tool.type)} tool type is not supported by omniroute`
+          `Unsupported Responses API feature: ${toolType} tool type is not supported by omniroute`
        );
      }
    }
@@ -112,6 +112,24 @@ export function openaiResponsesToOpenAIRequest(
            if (contentItem.type === "output_text") {
              return { type: "text", text: toString(contentItem.text) };
            }
+            if (contentItem.type === "input_image") {
+              const imgResult: JsonRecord = {
+                type: "image_url",
+                image_url: { url: toString(contentItem.image_url) },
+              };
+              if (contentItem.detail !== undefined) {
+                (imgResult.image_url as JsonRecord).detail = contentItem.detail;
+              }
+              return imgResult;
+            }
+            if (contentItem.type === "input_file") {
+              const fileObj: JsonRecord = {};
+              if (contentItem.file_data !== undefined) fileObj.file_data = contentItem.file_data;
+              if (contentItem.file_id !== undefined) fileObj.file_id = contentItem.file_id;
+              if (contentItem.file_url !== undefined) fileObj.file_url = contentItem.file_url;
+              if (contentItem.filename !== undefined) fileObj.filename = contentItem.filename;
+              return { type: "file", file: fileObj };
+            }
            return contentValue;
          })
        : item.content;
@@ -144,7 +162,9 @@ export function openaiResponsesToOpenAIRequest(
        type: "function",
        function: {
          name: fnName,
-          arguments: item.arguments,
+          arguments: typeof item.arguments === "string"
+            ? item.arguments
+            : JSON.stringify(item.arguments ?? {}),
        },
      });
      currentAssistantMsg.tool_calls = toolCalls;
@@ -226,6 +246,20 @@ export function openaiResponsesToOpenAIRequest(
    return true;
  });

+  // Translate tool_choice object format: Responses {type,name} → Chat {type,function:{name}}
+  if (result.tool_choice && typeof result.tool_choice === "object" && !Array.isArray(result.tool_choice)) {
+    const tc = toRecord(result.tool_choice);
+    const tcType = toString(tc.type);
+    if (tcType === "function" && tc.name !== undefined && !tc.function) {
+      result.tool_choice = { type: "function", function: { name: tc.name } };
+    } else if (tcType && tcType !== "function" && tcType !== "allowed_tools") {
+      // Built-in tool types (web_search_preview, file_search, etc.) have no Chat equivalent
+      throw unsupportedFeature(
+        `Unsupported Responses API feature: tool_choice type '${tcType}' is not supported by omniroute`
+      );
+    }
+  }
+
  // Cleanup Responses API specific fields
  // Note: prompt_cache_key is intentionally preserved — it is used by Codex and other
  // providers as a cache-affinity signal. Stripping it breaks prompt caching (#517).
@@ -288,11 +322,24 @@ export function openaiToOpenAIResponsesRequest(
                  return { type: "input_text", text: toString(contentItem.text) };
                }
                if (contentItem.type === "image_url") {
-                  const imgUrl = contentItem.image_url as string | { url?: string };
-                  return {
+                  const imgUrl = contentItem.image_url as string | { url?: string; detail?: string };
+                  const imgResult: JsonRecord = {
                    type: "input_image",
                    image_url: typeof imgUrl === "string" ? imgUrl : imgUrl?.url || "",
                  };
+                  if (typeof imgUrl === "object" && imgUrl?.detail !== undefined) {
+                    imgResult.detail = imgUrl.detail;
+                  }
+                  return imgResult;
+                }
+                if (contentItem.type === "file") {
+                  const file = toRecord(contentItem.file);
+                  const fileResult: JsonRecord = { type: "input_file" };
+                  if (file.file_data !== undefined) fileResult.file_data = file.file_data;
+                  if (file.file_id !== undefined) fileResult.file_id = file.file_id;
+                  if (file.file_url !== undefined) fileResult.file_url = file.file_url;
+                  if (file.filename !== undefined) fileResult.filename = file.filename;
+                  return fileResult;
                }
                return contentValue;
              })
@@ -358,6 +405,20 @@ export function openaiToOpenAIResponsesRequest(
          });
        }
      }
+
+      // Handle deprecated function_call field (pre-tool_calls API)
+      if (msg.function_call && !msg.tool_calls) {
+        const fc = toRecord(msg.function_call);
+        const fnName = toString(fc.name).trim();
+        if (fnName) {
+          input.push({
+            type: "function_call",
+            call_id: `call_${fnName}`,
+            name: fnName,
+            arguments: toString(fc.arguments, "{}"),
+          });
+        }
+      }
    }

    // Convert tool results
@@ -365,7 +426,24 @@ export function openaiToOpenAIResponsesRequest(
      input.push({
        type: "function_call_output",
        call_id: toString(msg.tool_call_id),
-        output: msg.content,
+        output: typeof msg.content === "string"
+          ? msg.content
+          : Array.isArray(msg.content)
+            ? msg.content.map((c) => {
+                const part = toRecord(c);
+                if (part.type === "text") return { type: "input_text", text: toString(part.text) };
+                return c;
+              })
+            : String(msg.content ?? ""),
+      });
+    }
+
+    // Handle deprecated function role messages
+    if (role === "function") {
+      input.push({
+        type: "function_call_output",
+        call_id: `call_${toString(msg.name)}`,
+        output: typeof msg.content === "string" ? msg.content : String(msg.content ?? ""),
      });
    }
  }
@@ -409,6 +487,23 @@ export function openaiToOpenAIResponsesRequest(
    });
  }

+  // Translate tool_choice: Chat {type,function:{name}} → Responses {type,name}
+  if (root.tool_choice !== undefined) {
+    if (typeof root.tool_choice === "string") {
+      result.tool_choice = root.tool_choice;
+    } else if (typeof root.tool_choice === "object" && !Array.isArray(root.tool_choice)) {
+      const tc = toRecord(root.tool_choice);
+      if (tc.type === "function" && tc.function) {
+        const fn = toRecord(tc.function);
+        result.tool_choice = { type: "function", name: fn.name };
+      } else {
+        result.tool_choice = root.tool_choice;
+      }
+    } else {
+      result.tool_choice = root.tool_choice;
+    }
+  }
+
  // Pass through relevant fields
  if (root.service_tier !== undefined) result.service_tier = root.service_tier;
  if (root.temperature !== undefined) result.temperature = root.temperature;
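
The two `tool_choice` translations in this file are mirror images of each other. The sketch below shows the round trip in isolation. The helper names `toChatToolChoice` and `toResponsesToolChoice` are hypothetical, and the sketch leaves out the `unsupportedFeature` error that the Responses to Chat direction throws for built-in tool types.

```typescript
type Json = Record<string, unknown>;

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Responses {type, name} -> Chat {type, function: {name}}
function toChatToolChoice(toolChoice: unknown): unknown {
  if (!isPlainObject(toolChoice)) return toolChoice; // "auto", "none", "required"
  if (toolChoice.type === "function" && toolChoice.name !== undefined && !toolChoice.function) {
    return { type: "function", function: { name: toolChoice.name } };
  }
  return toolChoice;
}

// Chat {type, function: {name}} -> Responses {type, name}
function toResponsesToolChoice(toolChoice: unknown): unknown {
  if (!isPlainObject(toolChoice)) return toolChoice;
  const fn = toolChoice.function;
  if (toolChoice.type === "function" && isPlainObject(fn)) {
    return { type: "function", name: fn.name };
  }
  return toolChoice;
}

// The tool name below is a made-up example value.
const chatChoice = toChatToolChoice({ type: "function", name: "get_weather" });
const responsesChoice = toResponsesToolChoice(chatChoice);
```

String values such as `"auto"` pass through both directions unchanged, and an object already in the target shape is returned as-is.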
@@ -14,7 +14,13 @@ export function openaiToOpenAIResponsesResponse(chunk, state) {
    return flushEvents(state);
  }

-  if (!chunk.choices?.length) return [];
+  if (!chunk.choices?.length) {
+    // Capture usage from usage-only chunks (stream_options.include_usage)
+    if (chunk.usage) {
+      state.usage = chunk.usage;
+    }
+    return [];
+  }

  const events = [];
  const nextSeq = () => ++state.seq;
@@ -69,7 +75,7 @@ export function openaiToOpenAIResponsesResponse(chunk, state) {

    if (content.includes("<think>")) {
      state.inThinking = true;
-      content = content.replace("<think>", "");
+      content = content.replaceAll("<think>", "");
      startReasoning(state, emit, idx);
    }

@@ -334,16 +340,52 @@ function closeToolCall(state, emit, idx) {
 function sendCompleted(state, emit) {
  if (!state.completedSent) {
    state.completedSent = true;
+
+    // Build output from accumulated state
+    const output = [];
+    if (state.reasoningId) {
+      output.push({
+        id: state.reasoningId,
+        type: "reasoning",
+        summary: [{ type: "summary_text", text: state.reasoningBuf }],
+      });
+    }
+    for (const idx in state.msgItemAdded) {
+      output.push({
+        id: `msg_${state.responseId}_${idx}`,
+        type: "message",
+        role: "assistant",
+        content: [{ type: "output_text", annotations: [], text: state.msgTextBuf[idx] || "" }],
+      });
+    }
+    for (const idx in state.funcCallIds) {
+      const callId = state.funcCallIds[idx];
+      output.push({
+        id: `fc_${callId}`,
+        type: "function_call",
+        call_id: callId,
+        name: state.funcNames[idx] || "",
+        arguments: state.funcArgsBuf[idx] || "{}",
+      });
+    }
+
+    const response: Record<string, unknown> = {
+      id: state.responseId,
+      object: "response",
+      created_at: state.created,
+      status: "completed",
+      background: false,
+      error: null,
+      output,
+    };
+
+    if (state.usage) {
+      response.usage = state.usage;
+    }
+
    emit("response.completed", {
      type: "response.completed",
-      response: {
-        id: state.responseId,
-        object: "response",
-        created_at: state.created,
-        status: "completed",
-        background: false,
-        error: null,
-      },
+      response,
    });
  }
 }
@@ -560,10 +602,21 @@ export function openaiResponsesToOpenAIResponse(chunk, state) {
    return null;
  }

-  // Reasoning events (convert to content or skip)
+  // Reasoning events — emit as reasoning_content in Chat format
  if (eventType === "response.reasoning_summary_text.delta") {
-    // Optionally include reasoning as content, or skip
-    return null;
+    const reasoningDelta = data.delta || "";
+    if (!reasoningDelta) return null;
+    return {
+      id: state.chatId,
+      object: "chat.completion.chunk",
+      created: state.created,
+      model: state.model || "gpt-4",
+      choices: [{
+        index: 0,
+        delta: { reasoning_content: reasoningDelta },
+        finish_reason: null,
+      }],
+    };
  }

  // Ignore other events
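
The reasoning branch above maps a single Responses API event to a single Chat Completions chunk. A minimal sketch of that mapping follows, assuming simplified `ChatStreamState` and `ReasoningEvent` types and a helper named `reasoningEventToChunk` (all hypothetical; the diff handles this inline within a larger translator).

```typescript
// Hypothetical, simplified types for illustration only.
interface ChatStreamState {
  chatId: string;
  created: number;
  model?: string;
}

interface ReasoningEvent {
  type: string;
  delta?: string;
}

function reasoningEventToChunk(event: ReasoningEvent, state: ChatStreamState) {
  if (event.type !== "response.reasoning_summary_text.delta") return null;
  const reasoningDelta = event.delta || "";
  // Empty deltas produce no chunk, matching the early return in the diff.
  if (!reasoningDelta) return null;
  return {
    id: state.chatId,
    object: "chat.completion.chunk",
    created: state.created,
    model: state.model || "gpt-4",
    choices: [
      {
        index: 0,
        delta: { reasoning_content: reasoningDelta },
        finish_reason: null,
      },
    ],
  };
}

// Sample state values are made up.
const sketchState: ChatStreamState = { chatId: "chatcmpl-1", created: 1700000000 };
const reasoningChunk = reasoningEventToChunk(
  { type: "response.reasoning_summary_text.delta", delta: "step 1" },
  sketchState
);
const emptyChunk = reasoningEventToChunk(
  { type: "response.reasoning_summary_text.delta", delta: "" },
  sketchState
);
```

The reasoning text lands in `delta.reasoning_content` and `delta.content` stays absent, so clients can render the two streams separately.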
@@ -0,0 +1,305 @@
+/**
+ * Cache Control Policy
+ *
+ * Determines when to preserve client-side prompt caching headers (cache_control)
+ * vs. applying OmniRoute's own caching strategy.
+ *
+ * Client-side caching (e.g., Claude Code) should be preserved when:
+ * 1. Client is Claude Code or similar caching-aware client
+ * 2. Request will hit a deterministic target (single model or deterministic combo strategy)
+ * 3. Provider supports prompt caching (Anthropic, Alibaba Qwen, etc.)
+ */
+
+import type { RoutingStrategyValue } from "../../src/shared/constants/routingStrategies";
+
+/**
+ * Cache control preservation modes
+ */
+export type CacheControlMode = "auto" | "always" | "never";
+
+/**
+ * Cache control settings from the database
+ */
+export interface CacheControlSettings {
+  alwaysPreserveClientCache?: CacheControlMode;
+}
+
+/**
+ * Cache metrics for tracking effectiveness
+ */
+export interface CacheControlMetrics {
+  // Totals
+  totalRequests: number;
+  requestsWithCacheControl: number;
+
+  // Token counts
+  totalInputTokens: number;
+  totalCachedTokens: number;
+  totalCacheCreationTokens: number;
+
+  // Savings
+  tokensSaved: number;
+  estimatedCostSaved: number;
+
+  // Breakdowns
+  byProvider: Record<
+    string,
+    {
+      requests: number;
+      inputTokens: number;
+      cachedTokens: number;
+      cacheCreationTokens: number;
+    }
+  >;
+  byStrategy: Record<
+    string,
+    {
+      requests: number;
+      inputTokens: number;
+      cachedTokens: number;
+      cacheCreationTokens: number;
+    }
+  >;
+
+  lastUpdated: string;
+}
+
+/**
+ * Routing strategies that are deterministic (same request → same provider)
+ */
+const DETERMINISTIC_STRATEGIES: Set<RoutingStrategyValue> = new Set(["priority", "cost-optimized"]);
+
+/**
+ * Providers that support prompt caching
+ */
+const CACHING_PROVIDERS = new Set([
+  "claude",
+  "anthropic",
+  "zai",
+  "qwen", // Alibaba Qwen Coding Plan International
+]);
+
+/**
+ * Detect if the client is Claude Code or another caching-aware client
+ */
+export function isClaudeCodeClient(userAgent: string | null | undefined): boolean {
+  if (!userAgent) return false;
+  const ua = userAgent.toLowerCase();
+
+  // Claude Code user agents
+  if (ua.includes("claude-code") || ua.includes("claude_code")) return true;
+  if (ua.includes("anthropic") && ua.includes("cli")) return true;
+
+  return false;
+}
+
+/**
+ * Check if a provider supports prompt caching
+ */
+export function providerSupportsCaching(provider: string | null | undefined): boolean {
+  if (!provider) return false;
+  return CACHING_PROVIDERS.has(provider.toLowerCase());
+}
+
+/**
+ * Check if a routing strategy is deterministic
+ */
+export function isDeterministicStrategy(
+  strategy: RoutingStrategyValue | null | undefined
+): boolean {
+  if (!strategy) return false;
+  return DETERMINISTIC_STRATEGIES.has(strategy);
+}
+
+/**
+ * Determine if client-side cache_control headers should be preserved
+ *
+ * @param userAgent - User-Agent header from the request
+ * @param isCombo - Whether this is a combo model
+ * @param comboStrategy - The combo's routing strategy (if applicable)
+ * @param targetProvider - The target provider for the request
+ * @param settings - Cache control settings from database (optional)
+ * @returns true if cache_control should be preserved, false if OmniRoute should manage it
+ */
+export function shouldPreserveCacheControl({
+  userAgent,
+  isCombo,
+  comboStrategy,
+  targetProvider,
+  settings,
+}: {
+  userAgent: string | null | undefined;
+  isCombo: boolean;
+  comboStrategy?: RoutingStrategyValue | null;
+  targetProvider: string | null | undefined;
+  settings?: CacheControlSettings;
+}): boolean {
+  // User override takes precedence
+  if (settings?.alwaysPreserveClientCache === "always") {
+    return true;
+  }
+  if (settings?.alwaysPreserveClientCache === "never") {
+    return false;
+  }
+
+  // Auto mode: use automatic detection (existing logic)
+  // Must be a caching-aware client
+  if (!isClaudeCodeClient(userAgent)) {
+    return false;
+  }
+
+  // Target provider must support caching
+  if (!providerSupportsCaching(targetProvider)) {
+    return false;
+  }
+
+  // Single model: always preserve (deterministic)
+  if (!isCombo) {
+    return true;
+  }
+
+  // Combo: only preserve if strategy is deterministic
+  return isDeterministicStrategy(comboStrategy);
+}
+
+/**
+ * Track cache control metrics for a request
+ */
+export function trackCacheMetrics({
+  preserved,
+  provider,
+  strategy,
+  metrics,
+  inputTokens,
+  cachedTokens,
+  cacheCreationTokens,
+}: {
+  preserved: boolean;
+  provider: string;
+  strategy: string | null | undefined;
+  metrics: CacheControlMetrics;
+  inputTokens?: number;
+  cachedTokens?: number;
+  cacheCreationTokens?: number;
+}): CacheControlMetrics {
+  const now = new Date().toISOString();
+
+  // Initialize metrics if empty
+  if (!metrics) {
+    metrics = {
+      totalRequests: 0,
+      requestsWithCacheControl: 0,
+      totalInputTokens: 0,
+      totalCachedTokens: 0,
+      totalCacheCreationTokens: 0,
+      tokensSaved: 0,
+      estimatedCostSaved: 0,
+      byProvider: {},
+      byStrategy: {},
+      lastUpdated: now,
+    };
+  }
+
+  // Increment total requests
+  metrics.totalRequests++;
+
+  // Track token counts
+  const input = inputTokens || 0;
+  const cached = cachedTokens || 0;
+  const creation = cacheCreationTokens || 0;
+
+  metrics.totalInputTokens += input;
+  metrics.totalCachedTokens += cached;
+  metrics.totalCacheCreationTokens += creation;
+
+  // Calculate tokens saved (cached tokens are reused, not charged)
+  if (cached > 0) {
+    metrics.tokensSaved += cached;
+  }
+
+  // Count the request and update the per-provider / per-strategy breakdowns
+  // only when cache_control was preserved
+  if (preserved) {
+    metrics.requestsWithCacheControl++;
+
+    // Initialize provider tracking
+    if (!metrics.byProvider[provider]) {
+      metrics.byProvider[provider] = {
+        requests: 0,
+        inputTokens: 0,
+        cachedTokens: 0,
+        cacheCreationTokens: 0,
+      };
+    }
+    metrics.byProvider[provider].requests++;
+    metrics.byProvider[provider].inputTokens += input;
+    metrics.byProvider[provider].cachedTokens += cached;
+    metrics.byProvider[provider].cacheCreationTokens += creation;
+
+    // Initialize strategy tracking
+    if (strategy && !metrics.byStrategy[strategy]) {
+      metrics.byStrategy[strategy] = {
+        requests: 0,
+        inputTokens: 0,
+        cachedTokens: 0,
+        cacheCreationTokens: 0,
+      };
+    }
+    if (strategy) {
+      metrics.byStrategy[strategy].requests++;
+      metrics.byStrategy[strategy].inputTokens += input;
+      metrics.byStrategy[strategy].cachedTokens += cached;
+      metrics.byStrategy[strategy].cacheCreationTokens += creation;
+    }
+  }
+
+  metrics.lastUpdated = now;
+  return metrics;
+}
+
+/**
+ * Record cache token usage and update metrics
+ */
+export function updateCacheTokenMetrics({
+  metrics,
+  provider,
+  strategy,
+  inputTokens,
+  cachedTokens,
+  cacheCreationTokens,
+  costSaved,
+}: {
+  metrics: CacheControlMetrics;
+  provider: string;
+  strategy: string | null | undefined;
+  inputTokens: number;
+  cachedTokens: number;
+  cacheCreationTokens: number;
+  costSaved?: number;
+}): CacheControlMetrics {
+  metrics.totalCachedTokens += cachedTokens;
+  metrics.totalCacheCreationTokens += cacheCreationTokens;
+  metrics.totalInputTokens += inputTokens;
+
+  // Cached tokens are reused (saved), creation tokens are new cache writes
+  metrics.tokensSaved += cachedTokens;
+  if (costSaved !== undefined) {
+    metrics.estimatedCostSaved += costSaved;
+  }
+
+  // Update provider tracking
+  if (metrics.byProvider[provider]) {
+    metrics.byProvider[provider].cachedTokens += cachedTokens;
+    metrics.byProvider[provider].cacheCreationTokens += cacheCreationTokens;
+    metrics.byProvider[provider].inputTokens += inputTokens;
+  }
+
+  // Update strategy tracking
+  if (strategy && metrics.byStrategy[strategy]) {
+    metrics.byStrategy[strategy].cachedTokens += cachedTokens;
+    metrics.byStrategy[strategy].cacheCreationTokens += cacheCreationTokens;
+    metrics.byStrategy[strategy].inputTokens += inputTokens;
+  }
+
+  metrics.lastUpdated = new Date().toISOString();
+  return metrics;
+}
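
The decision order in `shouldPreserveCacheControl` is easiest to follow with a few concrete inputs. The sketch below re-implements that order in compact form under the name `preserveCacheControl` (hypothetical). The user-agent strings and the `round-robin` strategy name are made-up example values, not values confirmed by the source.

```typescript
type CacheMode = "auto" | "always" | "never";

interface PreserveInput {
  userAgent?: string | null;
  isCombo: boolean;
  comboStrategy?: string | null;
  targetProvider?: string | null;
  mode?: CacheMode;
}

// Same membership as the sets in the diff.
const DETERMINISTIC = new Set(["priority", "cost-optimized"]);
const CACHING = new Set(["claude", "anthropic", "zai", "qwen"]);

function looksLikeClaudeCode(userAgent?: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return (
    ua.includes("claude-code") ||
    ua.includes("claude_code") ||
    (ua.includes("anthropic") && ua.includes("cli"))
  );
}

function preserveCacheControl(input: PreserveInput): boolean {
  // 1. An explicit setting wins over detection.
  if (input.mode === "always") return true;
  if (input.mode === "never") return false;
  // 2. Auto mode: the client must be caching-aware.
  if (!looksLikeClaudeCode(input.userAgent)) return false;
  // 3. The target provider must support prompt caching.
  if (!input.targetProvider || !CACHING.has(input.targetProvider.toLowerCase())) return false;
  // 4. Single models are deterministic; combos depend on the strategy.
  if (!input.isCombo) return true;
  return !!input.comboStrategy && DETERMINISTIC.has(input.comboStrategy);
}

const singleModel = preserveCacheControl({
  userAgent: "claude-code/1.0",
  isCombo: false,
  targetProvider: "anthropic",
});
const rotatingCombo = preserveCacheControl({
  userAgent: "claude-code/1.0",
  isCombo: true,
  comboStrategy: "round-robin",
  targetProvider: "anthropic",
});
const priorityCombo = preserveCacheControl({
  userAgent: "claude-code/1.0",
  isCombo: true,
  comboStrategy: "priority",
  targetProvider: "Claude",
});
const forcedOn = preserveCacheControl({
  userAgent: "curl/8.0",
  isCombo: true,
  targetProvider: "openai",
  mode: "always",
});
```

Only `rotatingCombo` evaluates to `false`: a non-deterministic combo may send the same prompt to different providers, so client-side cache markers would rarely hit.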
@@ -159,8 +159,9 @@ export function createSSEStream(options: StreamOptions = {}) {

  // Track content length for usage estimation (both modes)
  let totalContentLength = 0;
-  // Passthrough: accumulate content for call log response body
+  // Passthrough: accumulate content and reasoning separately for call log response body
  let passthroughAccumulatedContent = "";
+  let passthroughAccumulatedReasoning = "";

  // Guard against duplicate [DONE] events — ensures exactly one per stream
  let doneSent = false;
@@ -304,6 +305,14 @@ export function createSSEStream(options: StreamOptions = {}) {
                  }
                } else {
                  // Chat Completions: full sanitization pipeline
+
+                  // Detect the delta.reasoning alias before sanitizeStreamingChunk()
+                  // normalizes it to reasoning_content
+                  const hadReasoningAlias = !!(
+                    parsed.choices?.[0]?.delta?.reasoning &&
+                    typeof parsed.choices[0].delta.reasoning === "string" &&
+                    !parsed.choices[0].delta.reasoning_content
+                  );
+
                  parsed = sanitizeStreamingChunk(parsed);

                  const idFixed = fixInvalidId(parsed);
@@ -323,6 +332,31 @@ export function createSSEStream(options: StreamOptions = {}) {
                    }
                  }

+                  // Split combined reasoning+content deltas into separate SSE events.
+                  // Standard OpenAI streaming never mixes both fields in one delta;
+                  // clients (e.g. LobeChat) may skip content when reasoning_content
+                  // is present, causing the first content token to be lost.
+                  let splitCombinedDelta = false;
+                  if (delta?.reasoning_content && delta?.content) {
+                    const reasoningChunk = JSON.parse(JSON.stringify(parsed));
+                    const rDelta = reasoningChunk.choices[0].delta;
+                    delete rDelta.content;
+                    reasoningChunk.choices[0].finish_reason = null;
+                    delete reasoningChunk.usage;
+                    const rOutput = `data: ${JSON.stringify(reasoningChunk)}\n`;
+                    passthroughAccumulatedReasoning += delta.reasoning_content;
+                    totalContentLength += delta.reasoning_content.length;
+                    clientPayloadCollector.push(reasoningChunk);
+                    reqLogger?.appendConvertedChunk?.(rOutput);
+                    controller.enqueue(encoder.encode(rOutput));
+                    controller.enqueue(encoder.encode("\n"));
+                    delete delta.reasoning_content;
+                    splitCombinedDelta = true;
+                  }
+
+                  // Track whether we need to re-serialize (separate from injectedUsage
+                  // to avoid blocking subsequent finish_reason / usage mutations).
+                  // A split delta is re-serialized too, so the remaining content chunk
+                  // is emitted without the reasoning_content already sent above.
+                  const needsReserialization =
+                    hadReasoningAlias ||
+                    splitCombinedDelta ||
+                    Boolean(delta?.content === "" && delta?.reasoning_content);
+
                  // T18: Track if we saw tool calls & accumulate for call log
                  if (delta?.tool_calls && delta.tool_calls.length > 0) {
                    passthroughHasToolCalls = true;
@@ -365,7 +399,7 @@ export function createSSEStream(options: StreamOptions = {}) {
                  if (typeof delta?.content === "string")
                    passthroughAccumulatedContent += delta.content;
                  if (typeof delta?.reasoning_content === "string")
-                    passthroughAccumulatedContent += delta.reasoning_content;
+                    passthroughAccumulatedReasoning += delta.reasoning_content;

                  const extracted = extractUsage(parsed);
                  if (extracted) {
@@ -398,7 +432,7 @@ export function createSSEStream(options: StreamOptions = {}) {
                    parsed.usage = filterUsageForFormat(buffered, FORMATS.OPENAI);
                    output = `data: ${JSON.stringify(parsed)}\n`;
                    injectedUsage = true;
-                  } else if (idFixed) {
+                  } else if (idFixed || needsReserialization) {
                    output = `data: ${JSON.stringify(parsed)}\n`;
                    injectedUsage = true;
                  }
@@ -483,6 +517,19 @@ export function createSSEStream(options: StreamOptions = {}) {
              if (state?.accumulatedContent !== undefined) state.accumulatedContent += r;
            }
          }
+          // Normalize `reasoning` alias → `reasoning_content` (NVIDIA kimi-k2.5 etc.)
+          if (
+            parsed.choices?.[0]?.delta?.reasoning &&
+            !parsed.choices?.[0]?.delta?.reasoning_content
+          ) {
+            const r = parsed.choices[0].delta.reasoning;
+            if (typeof r === "string") {
+              parsed.choices[0].delta.reasoning_content = r;
+              delete parsed.choices[0].delta.reasoning;
+              totalContentLength += r.length;
+              if (state?.accumulatedContent !== undefined) state.accumulatedContent += r;
+            }
+          }

          // Gemini format - may have multiple parts
          if (parsed.candidates?.[0]?.content?.parts) {
@@ -635,6 +682,10 @@ export function createSSEStream(options: StreamOptions = {}) {
                  role: "assistant",
                  content: content || null,
                };
+                const reasoning = passthroughAccumulatedReasoning.trim();
+                if (reasoning) {
+                  message.reasoning_content = reasoning;
+                }
                if (passthroughToolCalls.size > 0) {
                  message.tool_calls = [...passthroughToolCalls.values()].sort(
                    (a, b) => a.index - b.index
@@ -157,6 +157,10 @@ function buildOpenAISummary(events: StructuredSSEEvent[], fallbackModel?: string
    if (typeof delta.reasoning_content === "string" && delta.reasoning_content.length > 0) {
      reasoningParts.push(delta.reasoning_content);
    }
+    // Normalize `reasoning` alias (NVIDIA kimi-k2.5 etc.)
+    if (typeof delta.reasoning === "string" && delta.reasoning.length > 0 && !delta.reasoning_content) {
+      reasoningParts.push(delta.reasoning);
+    }

    if (Array.isArray(delta.tool_calls)) {
      for (const item of delta.tool_calls) {
@@ -203,12 +207,14 @@ function buildOpenAISummary(events: StructuredSSEEvent[], fallbackModel?: string
    }
  }

+  const joinedContent = contentParts.length > 0 ? contentParts.join("").trim() : null;
+  const joinedReasoning = reasoningParts.length > 0 ? reasoningParts.join("").trim() : null;
  const message: JsonRecord = {
    role: "assistant",
-    content: contentParts.length > 0 ? contentParts.join("") : null,
+    content: joinedContent || null,
  };
-  if (reasoningParts.length > 0) {
-    message.reasoning_content = reasoningParts.join("");
+  if (joinedReasoning) {
+    message.reasoning_content = joinedReasoning;
  }

  const finalToolCalls = [...toolCalls.values()].sort((a, b) => a.index - b.index);
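The collection and trimming behaviour changed in the two hunks above can be sketched on its own as follows. `SummaryDelta`, `SummaryMessage`, and `summarizeDeltas` are hypothetical names for this illustration; the real `buildOpenAISummary` also handles tool calls, usage, and model fallback.

```typescript
// Hypothetical reduced shapes, for illustration only.
type SummaryDelta = { content?: string; reasoning?: string; reasoning_content?: string };
type SummaryMessage = { role: "assistant"; content: string | null; reasoning_content?: string };

function summarizeDeltas(deltas: SummaryDelta[]): SummaryMessage {
  const contentParts: string[] = [];
  const reasoningParts: string[] = [];
  for (const d of deltas) {
    if (typeof d.content === "string" && d.content.length > 0) contentParts.push(d.content);
    if (typeof d.reasoning_content === "string" && d.reasoning_content.length > 0) {
      reasoningParts.push(d.reasoning_content);
    } else if (typeof d.reasoning === "string" && d.reasoning.length > 0) {
      // Alias is only used when the standard field is missing on this delta.
      reasoningParts.push(d.reasoning);
    }
  }
  // Join first, then trim once, so spaces between tokens are preserved.
  const message: SummaryMessage = {
    role: "assistant",
    content: contentParts.join("").trim() || null,
  };
  const reasoning = reasoningParts.join("").trim();
  if (reasoning) message.reasoning_content = reasoning;
  return message;
}

const summary = summarizeDeltas([
  { reasoning: " Let" },
  { reasoning: " me think" },
  { content: " Hello" },
  { content: " world" },
]);
```

Trimming the joined string rather than each delta matters here: trimming per delta would glue tokens together, because providers carry the word separator as the leading space of each token.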
@@ -1,12 +1,12 @@
 {
  "name": "omniroute",
-  "version": "3.2.6",
+  "version": "3.3.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "omniroute",
-      "version": "3.2.6",
+      "version": "3.3.0",
      "hasInstallScript": true,
      "license": "MIT",
      "workspaces": [
@@ -1,6 +1,6 @@
 {
  "name": "omniroute",
-  "version": "3.2.7",
+  "version": "3.3.0",
  "description": "Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.",
  "type": "module",
  "bin": {
@@ -0,0 +1,17 @@
+## [3.2.8] - 2026-03-29
+
+### ✨ Enhancements & Refactoring
+
+- **Docker Auto-Update UI** — Integrated a detached background update process for Docker Compose deployments. The Dashboard UI now tracks the update lifecycle by combining JSON REST responses (polled for background updates) with SSE progress streams, so the progress overlay works across deployment types.
+- **Cache Analytics** — Repaired zero-metrics visualization mapping by migrating Semantic Cache telemetry logs directly into the centralized tracking SQLite module.
+
+### 🐛 Bug Fixes
+
+- **Authentication Logic** — Fixed a bug where saving dashboard settings or adding models failed with a 401 Unauthorized error when `requireLogin` was disabled. API endpoints now correctly evaluate the global authentication toggle. Resolved global redirection by reactivating `src/middleware.ts`.
+- **CLI Tool Detection (Windows)** — Prevented fatal initialization exceptions during CLI environment detection by catching `cross-spawn` ENOENT errors correctly. Added an explicit detection path for `\AppData\Local\droid\droid.exe`.
+- **Codex Native Passthrough** — Normalized model translation parameters to prevent context poisoning in proxy pass-through mode, and now enforces `store: false` explicitly for all Codex-originated requests.
+- **SSE Token Reporting** — Normalized `finish_reason` detection for provider tool-call chunks, fixing usage analytics that showed 0% for stream-only responses that lack a strict `[DONE]` terminator.
+- **DeepSeek `<think>` Tags** — Added explicit `<think>` extraction in `responsesHandler.ts` so that DeepSeek reasoning streams map to the same structure as native Anthropic `<thinking>` blocks.
+
+---
+
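As an illustration of the `<think>` extraction mentioned in the changelog entry above, here is a minimal sketch that splits a complete response string into reasoning and visible content. The function `splitThinkTags` is hypothetical and is not the logic in `responsesHandler.ts`; it also assumes the full text is available, whereas a streaming implementation would need to handle tags split across chunks.

```typescript
// Hypothetical helper: separates <think>...</think> blocks from the visible answer.
function splitThinkTags(text: string): { reasoning: string; content: string } {
  const reasoningParts: string[] = [];
  // Non-greedy match so several <think> blocks are captured independently.
  const content = text.replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner: string) => {
    reasoningParts.push(inner.trim());
    return "";
  });
  return { reasoning: reasoningParts.join("\n"), content: content.trim() };
}

const split = splitThinkTags("<think>Check units first.</think>The answer is 42.");
```

Text without any `<think>` block passes through unchanged as `content`, with an empty `reasoning` string.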
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+"""
+OmniRoute i18n Auto-Translator
+This script scans all docs/i18n directory markdown files and uses an LLM
+API (like OmniRoute itself) to translate any English paragraphs into the 
+target language.
+
+Usage:
+  python3 scripts/i18n_autotranslate.py --api-url http://localhost:20128/v1 --api-key <YOUR_API_KEY> --model cx/gpt-5.4
+"""
+
+import os
+import re
+import sys
+import glob
+import json
+import urllib.request
+import urllib.error
+import argparse
+from pathlib import Path
+
+# The base path of the project
+SCRIPT_DIR = Path(__file__).parent.resolve()
+PROJECT_ROOT = SCRIPT_DIR.parent
+I18N_DIR = PROJECT_ROOT / "docs" / "i18n"
+
+def get_language_name(lang_code):
+    lang_map = {
+        "pt-BR": "Portuguese (Brazil)", "es": "Spanish", "fr": "French", 
+        "it": "Italian", "ru": "Russian", "zh-CN": "Simplified Chinese", 
+        "de": "German", "in": "Hindi", "th": "Thai", "uk-UA": "Ukrainian", 
+        "ar": "Arabic", "ja": "Japanese", "vi": "Vietnamese", "bg": "Bulgarian", 
+        "da": "Danish", "fi": "Finnish", "he": "Hebrew", "hu": "Hungarian", 
+        "id": "Indonesian", "ko": "Korean", "ms": "Malay", "nl": "Dutch", 
+        "no": "Norwegian", "pt": "Portuguese (Portugal)", "ro": "Romanian", 
+        "pl": "Polish", "sk": "Slovak", "sv": "Swedish", "phi": "Filipino", 
+        "cs": "Czech"
+    }
+    return lang_map.get(lang_code, lang_code)
+
+def translate_block(text, target_language, api_url, api_key, model):
+    if not text.strip():
+        return text
+
+    prompt = (
+        f"You are a professional technical translator working on the OmniRoute proxy project documentation.\n"
+        f"Translate the following Markdown text from English to {target_language}.\n"
+        f"CRITICAL RULES:\n"
+        f"- Do NOT translate code blocks (```...```).\n"
+        f"- Do NOT translate markdown formatting elements, links syntax, or image syntax.\n"
+        f"- Retain formatting perfectly.\n"
+        f"- Only return the translated text without introductory phrases.\n\n"
+        f"{text}"
+    )
+
+    data = {
+        "model": model,
+        "messages": [
+            {"role": "system", "content": "You are a direct translator. Output only the requested translation."},
+            {"role": "user", "content": prompt}
+        ],
+        "temperature": 0.3,
+        "stream": False
+    }
+    
+    req = urllib.request.Request(
+        f"{api_url}/chat/completions",
+        data=json.dumps(data).encode('utf-8'),
+        headers={
+            "Content-Type": "application/json",
+            "Authorization": f"Bearer {api_key}"
+        }
+    )
+    
+    try:
+        with urllib.request.urlopen(req) as response:
+            result = json.loads(response.read().decode())
+        if result.get("choices"):
+            return result["choices"][0]["message"]["content"].strip()
+    except Exception as e:
+        print(f"    ❌ API Error: {e}")
+    # Fall back to the untranslated text on an API error or an empty response.
+    return text
+
+def process_file(file_path, target_language, api_url, api_key, model):
+    with open(file_path, 'r', encoding='utf-8') as f:
+        content = f.read()
+
+    # Simple heuristic: we look for English common words to identify if a block needs translation.
+    # A true robust implementation would diff against the English source.
+    # For now, we split by double newlines (markdown blocks)
+    blocks = content.split('\n\n')
+    translated_blocks = []
+    
+    english_words = [" the ", " is ", " are ", " this ", " that ", " a ", " to "]
+    
+    needs_update = False
+    
+    for block in blocks:
+        # Skip translation if it's a pure code block or doesn't have English markers
+        if block.startswith('```') or block.startswith('<div') or block.startswith('🌐') or block.startswith('|'):
+            translated_blocks.append(block)
+            continue
+            
+        is_english = any(w in block.lower() for w in english_words)
+        
+        if is_english and len(block.strip()) > 10:
+            print(f"    🔄 Translating paragraph (length {len(block)})...")
+            new_block = translate_block(block, target_language, api_url, api_key, model)
+            if new_block != block:
+                needs_update = True
+            translated_blocks.append(new_block)
+        else:
+            translated_blocks.append(block)
+            
+    if needs_update:
+        with open(file_path, 'w', encoding='utf-8') as f:
+            f.write('\n\n'.join(translated_blocks))
+        print(f"  ✅ Updated translations in {file_path.name}")
+    else:
+        print(f"  ⏩ {file_path.name} already fully translated or no English blocks found.")
+
+def main():
+    parser = argparse.ArgumentParser(description="OmniRoute Auto-Translator for i18n Markdown")
+    parser.add_argument("--api-url", default="http://localhost:20128/v1", help="Base URL of OmniRoute or target provider")
+    parser.add_argument("--api-key", default="sk-test", help="API Key for the provider")
+    parser.add_argument("--model", default="gc/gemini-3-flash", help="Model name to use")
+    parser.add_argument("--lang", default=None, help="Process only a specific language code (e.g. pt-BR)")
+    
+    args = parser.parse_args()
+    
+    print(f"🚀 Starting Auto-Translator")
+    print(f"🔗 Target API: {args.api_url} | Model: {args.model}\n")
+    
+    if args.lang:
+        lang_dirs = [d for d in I18N_DIR.iterdir() if d.is_dir() and d.name == args.lang]
+    else:
+        lang_dirs = [d for d in I18N_DIR.iterdir() if d.is_dir()]
+    
+    for lang_dir in lang_dirs:
+        lang_code = lang_dir.name
+        lang_name = get_language_name(lang_code)
+        
+        print(f"\n🌍 Processing {lang_name} ({lang_code})")
+        
+        md_files = list(lang_dir.glob("*.md"))
+        for md_file in md_files:
+            process_file(md_file, lang_name, args.api_url, args.api_key, args.model)
+            
+if __name__ == "__main__":
+    main()
@@ -13,6 +13,25 @@ import { AI_PROVIDERS, FREE_PROVIDERS, OAUTH_PROVIDERS } from "@/shared/constant
 import { useNotificationStore } from "@/store/notificationStore";
 import { copyToClipboard } from "@/shared/utils/clipboard";

+type UpdateStep = {
+  step: string;
+  status: string;
+  message: string;
+};
+
+const wait = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
+
+function mergeUpdateStep(steps: UpdateStep[], nextStep: UpdateStep) {
+  const idx = steps.findIndex((step) => step.step === nextStep.step);
+  if (idx === -1) {
+    return [...steps, nextStep];
+  }
+
+  const next = [...steps];
+  next[idx] = nextStep;
+  return next;
+}
+
 export default function HomePageClient({ machineId }) {
  const t = useTranslations("home");
  const tc = useTranslations("common");
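`mergeUpdateStep`, added in the hunk above, is an immutable upsert keyed by `step`. The sketch below repeats the function with an explicit return type and shows how a sequence of progress events would fold into the step list; the sample events are made up for illustration.

```typescript
type UpdateStep = { step: string; status: string; message: string };

// Replace the entry with a matching `step`, otherwise append.
// The input array is never mutated, which keeps it safe for React state updaters.
function mergeUpdateStep(steps: UpdateStep[], nextStep: UpdateStep): UpdateStep[] {
  const idx = steps.findIndex((s) => s.step === nextStep.step);
  if (idx === -1) return [...steps, nextStep];
  const next = [...steps];
  next[idx] = nextStep;
  return next;
}

// Illustrative events only; real messages come from the update endpoint.
const initialSteps: UpdateStep[] = [
  { step: "install", status: "running", message: "Installing." },
  { step: "restart", status: "pending", message: "Waiting for restart." },
];
const afterInstall = mergeUpdateStep(initialSteps, {
  step: "install",
  status: "done",
  message: "Installed.",
});
const afterComplete = mergeUpdateStep(afterInstall, {
  step: "complete",
  status: "done",
  message: "Done.",
});
```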
@@ -26,9 +45,7 @@ export default function HomePageClient({ machineId }) {

  const [versionInfo, setVersionInfo] = useState<any>(null);
  const [updating, setUpdating] = useState(false);
-  const [updateSteps, setUpdateSteps] = useState<
-    Array<{ step: string; status: string; message: string }>
-  >([]);
+  const [updateSteps, setUpdateSteps] = useState<UpdateStep[]>([]);
  const [updatePhase, setUpdatePhase] = useState<"idle" | "running" | "done" | "failed">("idle");

  useEffect(() => {
@@ -134,6 +151,155 @@ export default function HomePageClient({ machineId }) {
    },
  ];

+  const pollBackgroundUpdate = useCallback(
+    async ({
+      channel,
+      message,
+      targetVersion,
+    }: {
+      channel: string;
+      message: string;
+      targetVersion: string;
+    }) => {
+      const notify = useNotificationStore.getState();
+      const initialSteps =
+        channel === "docker-compose"
+          ? [
+              {
+                step: "install",
+                status: "done",
+                message: message || `Queued update to v${targetVersion}.`,
+              },
+              {
+                step: "rebuild",
+                status: "running",
+                message: "Docker image is rebuilding in the background.",
+              },
+              {
+                step: "restart",
+                status: "pending",
+                message: "Waiting for OmniRoute to restart with the new version.",
+              },
+            ]
+          : [
+              {
+                step: "install",
+                status: "running",
+                message: message || `Installing v${targetVersion}.`,
+              },
+              {
+                step: "restart",
+                status: "pending",
+                message: "Waiting for OmniRoute to restart with the new version.",
+              },
+            ];
+
+      setUpdateSteps(initialSteps);
+
+      const maxAttempts = channel === "docker-compose" ? 72 : 36;
+
+      for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
+        await wait(5000);
+
+        try {
+          const versionRes = await fetch("/api/system/version", { cache: "no-store" });
+          if (!versionRes.ok) {
+            throw new Error(`Version check returned ${versionRes.status}`);
+          }
+
+          const latestInfo = await versionRes.json();
+          setVersionInfo(latestInfo);
+
+          if (latestInfo.current === targetVersion) {
+            setUpdateSteps((prev) => {
+              let next = prev.map((step) => {
+                if (step.step === "install" || step.step === "rebuild" || step.step === "restart") {
+                  return { ...step, status: "done" };
+                }
+                return step;
+              });
+
+              next = mergeUpdateStep(next, {
+                step: "complete",
+                status: "done",
+                message: `OmniRoute is now running v${targetVersion}.`,
+              });
+
+              return next;
+            });
+            setUpdating(false);
+            setUpdatePhase("done");
+            notify.success(`OmniRoute updated to v${targetVersion}.`);
+            await fetchData();
+            return;
+          }
+
+          setUpdateSteps((prev) => {
+            let next = prev;
+            if (channel === "docker-compose") {
+              next = mergeUpdateStep(next, {
+                step: "rebuild",
+                status: "running",
+                message: `Docker image is still rebuilding for v${targetVersion}.`,
+              });
+            } else {
+              next = mergeUpdateStep(next, {
+                step: "install",
+                status: "running",
+                message: `Installing v${targetVersion} in the background.`,
+              });
+            }
+
+            next = mergeUpdateStep(next, {
+              step: "restart",
+              status: "pending",
+              message: `Waiting for OmniRoute to come back on v${targetVersion}.`,
+            });
+
+            return next;
+          });
+        } catch {
+          setUpdateSteps((prev) => {
+            let next = prev;
+            if (channel === "docker-compose") {
+              next = mergeUpdateStep(next, {
+                step: "rebuild",
+                status: "running",
+                message: "Docker rebuild is still in progress.",
+              });
+            } else {
+              next = mergeUpdateStep(next, {
+                step: "install",
+                status: "running",
+                message: `Installing v${targetVersion} in the background.`,
+              });
+            }
+
+            next = mergeUpdateStep(next, {
+              step: "restart",
+              status: "running",
+              message: "Service restart in progress. Waiting for OmniRoute to come back online...",
+            });
+
+            return next;
+          });
+        }
+      }
+
+      setUpdateSteps((prev) =>
+        mergeUpdateStep(prev, {
+          step: "error",
+          status: "failed",
+          message: `Update started, but v${targetVersion} did not become available before timeout. Refresh the page or check server logs.`,
+        })
+      );
+      setUpdating(false);
+      setUpdatePhase("failed");
+      notify.error(`Update to v${targetVersion} timed out.`);
+    },
+    [fetchData]
+  );
+
  const handleUpdate = async () => {
    const notify = useNotificationStore.getState();
    setUpdating(true);
@@ -153,6 +319,13 @@ export default function HomePageClient({ machineId }) {
          setUpdatePhase("idle");
          return;
        }
+        notify.success(data.message || "Update started.");
+        await pollBackgroundUpdate({
+          channel: data.channel || "docker-compose",
+          message: data.message || "",
+          targetVersion: data.to || data.latest,
+        });
+        return;
      }

      // SSE stream — read progress events
@@ -181,18 +354,12 @@ export default function HomePageClient({ machineId }) {
            const event = JSON.parse(line.slice(6));

            setUpdateSteps((prev) => {
-              // Replace existing step entry or add new one
-              const idx = prev.findIndex((s) => s.step === event.step);
-              if (idx >= 0) {
-                const next = [...prev];
-                next[idx] = event;
-                return next;
-              }
-              return [...prev, event];
+              return mergeUpdateStep(prev, event);
            });

            if (event.step === "complete") {
              setUpdatePhase("done");
+              setUpdating(false);
              notify.success(event.message || "Update complete!");
            } else if (event.step === "error") {
              setUpdatePhase("failed");
@@ -242,6 +409,7 @@ export default function HomePageClient({ machineId }) {
    complete: "Complete",
    error: "Error",
  };
+  const showUpdateOverlay = updatePhase !== "idle";

  if (loading) {
    return (
@@ -257,7 +425,7 @@ export default function HomePageClient({ machineId }) {
  return (
    <div className="flex flex-col gap-8">
      {/* Update Progress Overlay */}
-      {updating && (
+      {showUpdateOverlay && (
        <div className="fixed inset-0 z-[999] bg-black/60 backdrop-blur-sm flex items-center justify-center p-4">
          <div className="bg-bg-main border border-border rounded-2xl shadow-2xl max-w-md w-full p-6">
            <div className="flex items-center gap-3 mb-5">
@@ -371,7 +539,7 @@ export default function HomePageClient({ machineId }) {
      )}

      {/* Update Notification Banner */}
-      {versionInfo?.updateAvailable && !updating && (
+      {versionInfo?.updateAvailable && !showUpdateOverlay && (
        <div className="bg-primary/10 border border-primary/20 text-primary px-5 py-4 rounded-xl flex items-center justify-between min-h-[64px]">
          <div className="flex items-center gap-4">
            <span className="material-symbols-outlined text-[24px]">system_update_alt</span>
@@ -186,6 +186,9 @@ const COMBO_TEMPLATE_FALLBACK = {
  freeStackTitle: "Free Stack ($0)",
  freeStackDesc:
    "Round-robin across all free providers: Kiro, iFlow, Qwen, Gemini CLI. Zero cost, never stops.",
+  paidPremiumTitle: "Paid Premium",
+  paidPremiumDesc:
+    "Round-robin across paid subscriptions: Cursor, Antigravity. Top-tier models, distributed load.",
 };

 const COMBO_TEMPLATES = [
@@ -250,6 +253,21 @@ const COMBO_TEMPLATES = [
      healthCheckEnabled: true,
    },
  },
+  {
+    id: "paid-premium",
+    icon: "workspace_premium",
+    titleKey: "templatePaidPremium",
+    descKey: "templatePaidPremiumDesc",
+    fallbackTitle: COMBO_TEMPLATE_FALLBACK.paidPremiumTitle,
+    fallbackDesc: COMBO_TEMPLATE_FALLBACK.paidPremiumDesc,
+    strategy: "round-robin",
+    suggestedName: "paid-premium",
+    config: {
+      maxRetries: 2,
+      retryDelayMs: 1000,
+      healthCheckEnabled: true,
+    },
+  },
 ];

 function getStrategyMeta(strategy) {
@@ -1425,18 +1443,27 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
    { model: "kr/claude-sonnet-4.5", weight: 0 },
    { model: "if/kimi-k2-thinking", weight: 0 },
    { model: "if/qwen3-coder-plus", weight: 0 },
-    { model: "qw/qwen3-coder-plus", weight: 0 },
+    { model: "if/deepseek-v3.2", weight: 0 },
    { model: "nvidia/llama-3.3-70b-instruct", weight: 0 },
    { model: "groq/llama-3.3-70b-versatile", weight: 0 },
  ];

+  const PAID_PREMIUM_PRESET_MODELS = [
+    { model: "cu/claude-4.6-opus-high", weight: 0 },
+    { model: "ag/claude-sonnet-4-6", weight: 0 },
+    { model: "cu/claude-4.6-sonnet-high", weight: 0 },
+    { model: "ag/gpt-5", weight: 0 },
+    { model: "ag/gemini-3.1-pro-preview", weight: 0 },
+  ];
+
  const applyTemplate = (template) => {
    setStrategy(template.strategy);
    setConfig((prev) => ({ ...prev, ...template.config }));
    if (!name.trim()) setName(template.suggestedName);
-    // Pre-fill Free Stack with 7 real free provider models
    if (template.id === "free-stack") {
      setModels(FREE_STACK_PRESET_MODELS);
+    } else if (template.id === "paid-premium") {
+      setModels(PAID_PREMIUM_PRESET_MODELS);
    }
  };

@@ -4,69 +4,190 @@ import { useState, useEffect } from "react";
 import { Card } from "@/shared/components";
 import { useTranslations } from "next-intl";

+interface CacheMetrics {
+  totalRequests: number;
+  requestsWithCacheControl: number;
+  totalInputTokens: number;
+  totalCachedTokens: number;
+  totalCacheCreationTokens: number;
+  tokensSaved: number;
+  estimatedCostSaved: number;
+  byProvider: Record<
+    string,
+    {
+      requests: number;
+      inputTokens: number;
+      cachedTokens: number;
+      cacheCreationTokens: number;
+    }
+  >;
+  byStrategy: Record<
+    string,
+    {
+      requests: number;
+      inputTokens: number;
+      cachedTokens: number;
+      cacheCreationTokens: number;
+    }
+  >;
+  lastUpdated: string;
+}
+
 export default function CacheStatsCard() {
-  const [cache, setCache] = useState(null);
-  const [flushing, setFlushing] = useState(false);
+  const [metrics, setMetrics] = useState<CacheMetrics | null>(null);
+  const [resetting, setResetting] = useState(false);
  const t = useTranslations("settings");

-  const fetchStats = () => {
-    fetch("/api/cache/stats")
+  const fetchMetrics = () => {
+    fetch("/api/settings/cache-metrics")
      .then((r) => r.json())
-      .then(setCache)
+      .then(setMetrics)
      .catch(() => {});
  };

-  useEffect(fetchStats, []);
+  useEffect(fetchMetrics, []);

-  const handleFlush = async () => {
-    setFlushing(true);
+  const handleReset = async () => {
+    setResetting(true);
    try {
-      await fetch("/api/cache/stats", { method: "DELETE" });
-      fetchStats();
+      await fetch("/api/settings/cache-metrics", { method: "DELETE" });
+      fetchMetrics();
    } finally {
-      setFlushing(false);
+      setResetting(false);
    }
  };

+  const cacheHitRate =
+    metrics && metrics.totalInputTokens > 0
+      ? (metrics.totalCachedTokens / metrics.totalInputTokens) * 100
+      : 0;
+
  return (
    <Card className="p-6">
      <div className="flex items-center justify-between mb-4">
        <h3 className="text-lg font-semibold text-text-main flex items-center gap-2">
-          <span className="material-symbols-outlined text-[20px]">cached</span>
-          {t("promptCache")}
+          <span className="material-symbols-outlined text-[20px]">insights</span>
+          Prompt Cache Metrics
        </h3>
        <button
-          onClick={handleFlush}
-          disabled={flushing}
+          onClick={handleReset}
+          disabled={resetting}
          className="px-3 py-1.5 text-xs rounded-lg bg-red-500/10 text-red-400 hover:bg-red-500/20 transition-colors disabled:opacity-50"
        >
-          {flushing ? t("flushing") : t("flushCache")}
+          {resetting ? "Resetting..." : "Reset Metrics"}
        </button>
      </div>

-      {cache ? (
-        <div className="grid grid-cols-2 gap-4 text-sm">
-          <div>
-            <p className="text-text-muted">{t("size")}</p>
-            <p className="font-mono text-lg text-text-main">
-              {cache.size}/{cache.maxSize}
-            </p>
+      {metrics ? (
+        <div className="space-y-4">
+          {/* Overview Stats */}
+          <div className="grid grid-cols-2 gap-4 text-sm">
+            <div>
+              <p className="text-text-muted">Total Requests</p>
+              <p className="font-mono text-lg text-text-main">{metrics.totalRequests}</p>
+            </div>
+            <div>
+              <p className="text-text-muted">With Cache Control</p>
+              <p className="font-mono text-lg text-text-main">{metrics.requestsWithCacheControl}</p>
+            </div>
          </div>
-          <div>
-            <p className="text-text-muted">{t("hitRate")}</p>
-            <p className="font-mono text-lg text-text-main">{cache.hitRate?.toFixed(1) ?? 0}%</p>
+
+          {/* Token Stats */}
+          <div className="grid grid-cols-3 gap-4 text-sm">
+            <div>
+              <p className="text-text-muted">Input Tokens</p>
+              <p className="font-mono text-lg text-text-main">
+                {metrics.totalInputTokens.toLocaleString()}
+              </p>
+            </div>
+            <div>
+              <p className="text-text-muted">Cached Tokens (Read)</p>
+              <p className="font-mono text-lg text-green-400">
+                {metrics.totalCachedTokens.toLocaleString()}
+              </p>
+            </div>
+            <div>
+              <p className="text-text-muted">Cache Creation (Write)</p>
+              <p className="font-mono text-lg text-blue-400">
+                {metrics.totalCacheCreationTokens.toLocaleString()}
+              </p>
+            </div>
          </div>
-          <div>
-            <p className="text-text-muted">{t("hits")}</p>
-            <p className="font-mono text-text-main">{cache.hits ?? 0}</p>
+
+          {/* Cache Ratio */}
+          <div className="rounded-lg bg-surface/50 border border-border/30 p-3">
+            <div className="flex items-center justify-between">
+              <div>
+                <p className="text-sm font-medium text-text-main">Cache Reuse Ratio</p>
+                <p className="text-xs text-text-muted">Cached tokens / Total input tokens</p>
+              </div>
+              <p className="font-mono text-xl text-green-400">{cacheHitRate.toFixed(1)}%</p>
+            </div>
+            {/* Progress bar */}
+            <div className="mt-2 h-2 rounded-full bg-border/30 overflow-hidden">
+              <div
+                className="h-full bg-green-500 transition-all duration-300"
+                style={{ width: `${Math.min(cacheHitRate, 100)}%` }}
+              />
+            </div>
          </div>
-          <div>
-            <p className="text-text-muted">{t("evictions")}</p>
-            <p className="font-mono text-text-main">{cache.evictions ?? 0}</p>
+
+          {/* Savings */}
+          <div className="grid grid-cols-2 gap-4 text-sm">
+            <div>
+              <p className="text-text-muted">Tokens Saved</p>
+              <p className="font-mono text-lg text-green-400">
+                {metrics.tokensSaved.toLocaleString()}
+              </p>
+            </div>
+            <div>
+              <p className="text-text-muted">Est. Cost Saved</p>
+              <p className="font-mono text-lg text-green-400">
+                ${metrics.estimatedCostSaved.toFixed(4)}
+              </p>
+            </div>
          </div>
+
+          {/* By Provider */}
+          {Object.keys(metrics.byProvider).length > 0 && (
+            <div className="pt-3 border-t border-border/30">
+              <p className="text-xs font-medium text-text-muted mb-2">By Provider</p>
+              <div className="space-y-2">
+                {Object.entries(metrics.byProvider).map(([provider, stats]) => {
+                  const providerCacheRate =
+                    stats.inputTokens > 0 ? (stats.cachedTokens / stats.inputTokens) * 100 : 0;
+                  return (
+                    <div
+                      key={provider}
+                      className="flex items-center justify-between px-3 py-2 rounded bg-surface/30 text-xs"
+                    >
+                      <div className="flex items-center gap-3">
+                        <span className="text-text-main capitalize w-24">{provider}</span>
+                        <span className="text-text-muted">{stats.requests} reqs</span>
+                      </div>
+                      <div className="flex items-center gap-4 font-mono">
+                        <span className="text-text-muted" title="Input tokens">
+                          In: {stats.inputTokens.toLocaleString()}
+                        </span>
+                        <span className="text-green-400" title="Cached tokens (reads)">
+                          Cached: {stats.cachedTokens.toLocaleString()}
+                        </span>
+                        <span className="text-blue-400" title="Cache creation tokens (writes)">
+                          Write: {stats.cacheCreationTokens.toLocaleString()}
+                        </span>
+                        <span className="text-green-400 w-12 text-right">
+                          {providerCacheRate.toFixed(0)}%
+                        </span>
+                      </div>
+                    </div>
+                  );
+                })}
+              </div>
+            </div>
+          )}
        </div>
      ) : (
-        <p className="text-sm text-text-muted">{t("loadingCacheStats")}</p>
+        <p className="text-sm text-text-muted">Loading cache metrics...</p>
      )}
    </Card>
  );
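The reuse ratio shown in the card above is cached tokens divided by total input tokens, guarded against a zero denominator. A minimal standalone sketch follows; the helper names `cacheReusePercent` and `barWidth` are hypothetical, and the lower clamp to 0 is an addition not present in the component, which only applies `Math.min(cacheHitRate, 100)`.

```typescript
// Hypothetical helper: share of input tokens served from cache, in percent.
function cacheReusePercent(cachedTokens: number, inputTokens: number): number {
  if (!Number.isFinite(cachedTokens) || !Number.isFinite(inputTokens) || inputTokens <= 0) {
    return 0;
  }
  return (cachedTokens / inputTokens) * 100;
}

// CSS width for the progress bar, clamped so the bar never overflows its track.
function barWidth(percent: number): string {
  return `${Math.min(Math.max(percent, 0), 100)}%`;
}

const reuse = cacheReusePercent(250, 1000);
const width = barWidth(reuse);
```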
@@ -19,7 +19,10 @@ const STRATEGIES = ROUTING_STRATEGIES.filter((strategy) =>
 }));

 export default function RoutingTab() {
-  const [settings, setSettings] = useState<any>({ fallbackStrategy: "fill-first" });
+  const [settings, setSettings] = useState<any>({
+    fallbackStrategy: "fill-first",
+    alwaysPreserveClientCache: "auto",
+  });
  const [loading, setLoading] = useState(true);
  const [aliases, setAliases] = useState([]);
  const [newPattern, setNewPattern] = useState("");
@@ -218,6 +221,74 @@ export default function RoutingTab() {

      {/* Fallback Chains */}
      <FallbackChainsEditor />
+
+      {/* Client Cache Control */}
+      <Card>
+        <div className="flex items-center gap-3 mb-4">
+          <div className="p-2 rounded-lg bg-green-500/10 text-green-500">
+            <span className="material-symbols-outlined text-[20px]" aria-hidden="true">
+              cached
+            </span>
+          </div>
+          <div>
+            <h3 className="text-lg font-semibold">Client Cache Control</h3>
+            <p className="text-sm text-text-muted">
+              Configure how client-side cache_control headers are handled
+            </p>
+          </div>
+        </div>
+
+        <div className="space-y-3">
+          {[
+            {
+              value: "auto",
+              label: "Auto (Recommended)",
+              desc: "Preserve cache_control only for caching-aware clients (Claude Code) with deterministic routing",
+            },
+            {
+              value: "always",
+              label: "Always Preserve",
+              desc: "Always forward client cache_control headers to upstream providers",
+            },
+            {
+              value: "never",
+              label: "Never Preserve",
+              desc: "Always remove client cache_control headers, let OmniRoute manage caching",
+            },
+          ].map((option) => (
+            <button
+              key={option.value}
+              onClick={() => updateSetting({ alwaysPreserveClientCache: option.value })}
+              disabled={loading}
+              className={`w-full flex flex-col items-start gap-1 p-3 rounded-lg border text-left transition-all ${
+                settings.alwaysPreserveClientCache === option.value
+                  ? "border-green-500/50 bg-green-500/5 ring-1 ring-green-500/20"
+                  : "border-border/50 hover:border-border hover:bg-surface/30"
+              }`}
+            >
+              <div className="flex items-center gap-2">
+                <span
+                  className={`material-symbols-outlined text-[16px] ${
+                    settings.alwaysPreserveClientCache === option.value
+                      ? "text-green-400"
+                      : "text-text-muted"
+                  }`}
+                >
+                  {settings.alwaysPreserveClientCache === option.value
+                    ? "check_circle"
+                    : "radio_button_unchecked"}
+                </span>
+                <span
+                  className={`text-sm font-medium ${settings.alwaysPreserveClientCache === option.value ? "text-green-400" : ""}`}
+                >
+                  {option.label}
+                </span>
+              </div>
+              <p className="text-xs text-text-muted ml-7">{option.desc}</p>
+            </button>
+          ))}
+        </div>
+      </Card>
    </div>
  );
 }
@@ -98,7 +98,10 @@ export async function GET() {

    await Promise.all(
      settingsTools.map(async (toolId) => {
-        if (!statuses[toolId]?.installed || !statuses[toolId]?.runnable) {
+        if (!statuses[toolId]) {
+          return;
+        }
+        if (!statuses[toolId].installed || !statuses[toolId].runnable) {
          statuses[toolId].configStatus = "not_installed";
          return;
        }
@@ -0,0 +1,57 @@
+import { NextResponse } from "next/server";
+import { getSettings, updateSettings } from "@/lib/localDb";
+import { updateAutoDisableAccountsSchema } from "@/shared/validation/schemas";
+import { isValidationFailure, validateBody } from "@/shared/validation/helpers";
+
+export async function GET() {
+  try {
+    const settings = await getSettings();
+    return NextResponse.json({
+      enabled: settings.autoDisableBannedAccounts ?? false,
+      threshold: settings.autoDisableBannedThreshold ?? 3,
+    });
+  } catch (error) {
+    console.error("Error reading auto-disable accounts config:", error);
+    return NextResponse.json(
+      { error: "Failed to read auto-disable accounts config" },
+      { status: 500 }
+    );
+  }
+}
+
+export async function PUT(request: Request) {
+  let rawBody: unknown;
+  try {
+    rawBody = await request.json();
+  } catch {
+    return NextResponse.json(
+      { error: { message: "Invalid request", details: [{ field: "body", message: "Invalid JSON body" }] } },
+      { status: 400 }
+    );
+  }
+
+  try {
+    const validation = validateBody(updateAutoDisableAccountsSchema, rawBody);
+    if (isValidationFailure(validation)) {
+      return NextResponse.json({ error: validation.error }, { status: 400 });
+    }
+    const body = validation.data;
+
+    await updateSettings({
+      autoDisableBannedAccounts: body.enabled,
+      ...(body.threshold !== undefined && { autoDisableBannedThreshold: body.threshold }),
+    });
+
+    const settings = await getSettings();
+    return NextResponse.json({
+      enabled: settings.autoDisableBannedAccounts ?? false,
+      threshold: settings.autoDisableBannedThreshold ?? 3,
+    });
+  } catch (error) {
+    console.error("Error updating auto-disable accounts config:", error);
+    return NextResponse.json(
+      { error: "Failed to update auto-disable accounts config" },
+      { status: 500 }
+    );
+  }
+}
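Review note: the PUT handler above relies on a conditional spread so that `autoDisableBannedThreshold` is written only when the client actually sent a `threshold`. The sketch below isolates that idiom. The type names and the `buildAutoDisablePatch` helper are hypothetical and exist only for illustration; it also assumes `updateSettings` merges the keys it receives, which this diff does not show.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
type AutoDisableBody = { enabled: boolean; threshold?: number };
type SettingsPatch = {
  autoDisableBannedAccounts: boolean;
  autoDisableBannedThreshold?: number;
};

// Build the settings patch. The threshold key is present only when the
// caller supplied a value; spreading `false` adds no keys at all.
function buildAutoDisablePatch(body: AutoDisableBody): SettingsPatch {
  return {
    autoDisableBannedAccounts: body.enabled,
    ...(body.threshold !== undefined && { autoDisableBannedThreshold: body.threshold }),
  };
}

const withoutThreshold = buildAutoDisablePatch({ enabled: true });
const withThreshold = buildAutoDisablePatch({ enabled: false, threshold: 5 });

console.log(Object.keys(withoutThreshold));
console.log(withThreshold);
```

Omitting the key, rather than setting it to `undefined`, keeps the patch object free of an explicit `undefined` entry that a merge step might otherwise have to special-case.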
@@ -0,0 +1,22 @@
+import { NextResponse } from "next/server";
+import { getCacheMetrics, resetCacheMetrics } from "@/lib/db/settings";
+
+export async function GET() {
+  try {
+    const metrics = await getCacheMetrics();
+    return NextResponse.json(metrics);
+  } catch (error) {
+    console.error("Error getting cache metrics:", error);
+    return NextResponse.json({ error: "Failed to load cache metrics" }, { status: 500 });
+  }
+}
+
+export async function DELETE() {
+  try {
+    const metrics = await resetCacheMetrics();
+    return NextResponse.json(metrics);
+  } catch (error) {
+    console.error("Error resetting cache metrics:", error);
+    return NextResponse.json({ error: "Failed to reset cache metrics" }, { status: 500 });
+  }
+}
@@ -119,6 +119,12 @@ export async function PATCH(request) {
      invalidateCallLogsMaxCache();
    }

+    // Sync cache control settings to runtime cache
+    if ("alwaysPreserveClientCache" in body) {
+      const { invalidateCacheControlSettingsCache } = await import("@/lib/cacheControlSettings");
+      invalidateCacheControlSettingsCache();
+    }
+
    const { password, ...safeSettings } = settings;
    return NextResponse.json(safeSettings);
  } catch (error) {
@@ -1,6 +1,6 @@
 /**
 * GET  /api/system/version  — Returns current version and latest available on npm
- * POST /api/system/update   — Triggers npm install -g omniroute@latest + pm2 restart
+ * POST /api/system/version  — Triggers a deployment-aware background update
 *
 * Security: Requires admin authentication (same as other management routes).
 * Safety: Update only runs if a newer version is available on npm.
@@ -9,12 +9,16 @@ import { NextRequest, NextResponse } from "next/server";
 import { execFile } from "child_process";
 import { promisify } from "util";
 import { isAuthenticated } from "@/shared/utils/apiAuth";
+import {
+  getAutoUpdateConfig,
+  launchAutoUpdate,
+  validateAutoUpdateRuntime,
+} from "@/lib/system/autoUpdate";

 const execFileAsync = promisify(execFile);

 export const dynamic = "force-dynamic";

-/** Fetch latest version from npm registry (no install, just metadata) */
 async function getLatestNpmVersion(): Promise<string | null> {
  try {
    const { stdout } = await execFileAsync("npm", ["info", "omniroute", "version", "--json"], {
@@ -27,7 +31,6 @@ async function getLatestNpmVersion(): Promise<string | null> {
  }
 }

-/** Current installed version from package.json */
 function getCurrentVersion(): string {
  try {
    return require("../../../../../package.json").version as string;
@@ -36,7 +39,6 @@ function getCurrentVersion(): string {
  }
 }

-/** Compare semver strings — returns true if a > b */
 function isNewer(a: string | null, b: string): boolean {
  if (!a) return false;
  const parse = (v: string) => v.split(".").map(Number);
@@ -48,24 +50,28 @@ function isNewer(a: string | null, b: string): boolean {
 }

 export async function GET(req: NextRequest) {
-  if (!isAuthenticated(req)) {
+  if (!(await isAuthenticated(req))) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const current = getCurrentVersion();
  const latest = await getLatestNpmVersion();
  const updateAvailable = isNewer(latest, current);
+  const config = getAutoUpdateConfig();
+  const validation = await validateAutoUpdateRuntime(config);

  return NextResponse.json({
    current,
    latest: latest ?? "unavailable",
    updateAvailable,
-    channel: "npm",
+    channel: config.mode,
+    autoUpdateSupported: validation.supported,
+    autoUpdateError: validation.reason,
  });
 }

 export async function POST(req: NextRequest) {
-  if (!isAuthenticated(req)) {
+  if (!(await isAuthenticated(req))) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

@@ -88,7 +94,34 @@ export async function POST(req: NextRequest) {
    });
  }

-  // Stream progress events so the frontend can show real-time status
+  const config = getAutoUpdateConfig();
+
+  // In docker-compose mode, launch the update as a detached background shell script.
+  if (config.mode === "docker-compose") {
+    const launched = await launchAutoUpdate({ latest });
+    if (!launched.started) {
+      return NextResponse.json(
+        {
+          success: false,
+          error: launched.error || "Failed to start auto-update.",
+          channel: launched.channel,
+          logPath: launched.logPath,
+        },
+        { status: 503 }
+      );
+    }
+
+    return NextResponse.json({
+      success: true,
+      message: `Update to v${latest} started. Docker rebuild is running in the background.`,
+      from: current,
+      to: latest,
+      channel: launched.channel,
+      logPath: launched.logPath,
+    });
+  }
+
+  // npm/PM2 mode: stream progress events so the frontend can show real-time status
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
@@ -14,7 +14,7 @@ import {
  type EmbeddingProviderNodeRow,
  type EmbeddingProvider,
 } from "@omniroute/open-sse/config/embeddingRegistry.ts";
-import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
+import { errorResponse, unavailableResponse } from "@omniroute/open-sse/utils/error.ts";
 import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
 import * as log from "@/sse/utils/logger";
 import { toJsonErrorPayload } from "@/shared/utils/upstreamError";
@@ -209,6 +209,14 @@ export async function POST(request) {
        `No credentials for embedding provider: ${provider}`
      );
    }
+    if (credentials.allRateLimited) {
+      return unavailableResponse(
+        HTTP_STATUS.RATE_LIMITED,
+        `[${provider}] All accounts rate limited`,
+        credentials.retryAfter,
+        credentials.retryAfterHuman
+      );
+    }
  }

  const result = await handleEmbedding({
@@ -11,7 +11,7 @@ import {
  getAllImageModels,
  getImageProvider,
 } from "@omniroute/open-sse/config/imageRegistry.ts";
-import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
+import { errorResponse, unavailableResponse } from "@omniroute/open-sse/utils/error.ts";
 import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
 import * as log from "@/sse/utils/logger";
 import { toJsonErrorPayload } from "@/shared/utils/upstreamError";
@@ -156,8 +156,15 @@ export async function POST(request) {
        `No credentials for image provider: ${provider}`
      );
    }
+    if (credentials.allRateLimited) {
+      return unavailableResponse(
+        HTTP_STATUS.RATE_LIMITED,
+        `[${provider}] All accounts rate limited`,
+        credentials.retryAfter,
+        credentials.retryAfterHuman
+      );
+    }
  } else if (isCustomModel) {
-    // Custom models need credentials from the provider connection
    credentials = await getProviderCredentials(provider);
    if (!credentials) {
      return errorResponse(
@@ -165,6 +172,14 @@ export async function POST(request) {
        `No credentials for custom image provider: ${provider}`
      );
    }
+    if (credentials.allRateLimited) {
+      return unavailableResponse(
+        HTTP_STATUS.RATE_LIMITED,
+        `[${provider}] All accounts rate limited`,
+        credentials.retryAfter,
+        credentials.retryAfterHuman
+      );
+    }
  }

  const result = await handleImageGeneration({
@@ -1,5 +1,5 @@
 import { CORS_ORIGIN } from "@/shared/utils/cors";
-import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
+import { errorResponse, unavailableResponse } from "@omniroute/open-sse/utils/error.ts";
 import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
 import { getRegistryEntry } from "@omniroute/open-sse/config/providerRegistry.ts";
 import {
@@ -85,6 +85,14 @@ export async function POST(request, { params }) {
  if (!credentials) {
    return errorResponse(HTTP_STATUS.BAD_REQUEST, `No credentials for provider: ${rawProvider}`);
  }
+  if (credentials.allRateLimited) {
+    return unavailableResponse(
+      HTTP_STATUS.RATE_LIMITED,
+      `[${rawProvider}] All accounts rate limited`,
+      credentials.retryAfter,
+      credentials.retryAfterHuman
+    );
+  }

  const result = await handleEmbedding({ body, credentials, log });

@@ -1,6 +1,6 @@
 import { CORS_ORIGIN } from "@/shared/utils/cors";
 import { handleImageGeneration } from "@omniroute/open-sse/handlers/imageGeneration.ts";
-import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
+import { errorResponse, unavailableResponse } from "@omniroute/open-sse/utils/error.ts";
 import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
 import {
  getProviderCredentials,
@@ -85,6 +85,14 @@ export async function POST(request, { params }) {
      `No credentials for image provider: ${rawProvider}`
    );
  }
+  if (credentials.allRateLimited) {
+    return unavailableResponse(
+      HTTP_STATUS.RATE_LIMITED,
+      `[${rawProvider}] All accounts rate limited`,
+      credentials.retryAfter,
+      credentials.retryAfterHuman
+    );
+  }

  const result = await handleImageGeneration({ body, credentials, log });

@@ -0,0 +1,25 @@
+/**
+ * Cache Control Settings
+ *
+ * Provides cached access to cache control settings for performance.
+ * Settings are fetched once and cached to avoid repeated DB hits.
+ */
+
+import { getSettings } from "./db/settings";
+import type { CacheControlMode } from "@omniroute/open-sse/utils/cacheControlPolicy";
+
+let cachedSettings: CacheControlMode | null = null;
+
+export async function getCacheControlSettings(): Promise<CacheControlMode> {
+  if (cachedSettings !== null) {
+    return cachedSettings;
+  }
+
+  const settings = await getSettings();
+  cachedSettings = (settings.alwaysPreserveClientCache as CacheControlMode) || "auto";
+  return cachedSettings;
+}
+
+export function invalidateCacheControlSettingsCache() {
+  cachedSettings = null;
+}
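Review note: the module above memoizes one setting at module scope and depends on the PATCH route calling `invalidateCacheControlSettingsCache()` after a write. The following is a minimal, self-contained sketch of that read-through-then-invalidate pattern. `loadModeFromDb`, `storedMode`, and `dbReads` are hypothetical stand-ins for the real database access, not part of the project.

```typescript
type CacheControlMode = "auto" | "always" | "never";

// Hypothetical stand-in for the database; counts how often it is read.
let dbReads = 0;
let storedMode: CacheControlMode = "auto";
async function loadModeFromDb(): Promise<CacheControlMode> {
  dbReads += 1;
  return storedMode;
}

let cachedMode: CacheControlMode | null = null;

// Read-through cache: hit the store only when nothing is cached.
async function getMode(): Promise<CacheControlMode> {
  if (cachedMode !== null) return cachedMode;
  cachedMode = await loadModeFromDb();
  return cachedMode;
}

// Drop the cached value so the next read goes back to the store.
function invalidateMode(): void {
  cachedMode = null;
}

async function demo(): Promise<{ modes: CacheControlMode[]; reads: number }> {
  const first = await getMode(); // reads the store
  const second = await getMode(); // served from the cache
  storedMode = "never"; // simulates a settings write
  const stale = await getMode(); // still the old value until invalidated
  invalidateMode();
  const fresh = await getMode(); // reads the store again
  return { modes: [first, second, stale, fresh], reads: dbReads };
}

demo().then((result) => console.log(result));
```

The `stale` read shows why the invalidation call in the PATCH handler matters: without it, a changed setting would not be picked up until the process restarts.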
@@ -46,6 +46,7 @@ export async function getSettings() {
    stickyRoundRobinLimit: 3,
    requireLogin: true,
    hiddenSidebarItems: [],
+    alwaysPreserveClientCache: "auto",
  };
  for (const row of rows) {
    const record = toRecord(row);
@@ -486,3 +487,177 @@ export async function setProxyConfig(config: Record<string, unknown>) {
  backupDbFile("pre-write");
  return current;
 }
+
+// ──────────────── Cache Control Metrics ────────────────
+// Cache metrics are now computed from usage_history table on-the-fly
+// This avoids race conditions and keeps a single source of truth for token data
+
+export async function getCacheMetrics() {
+  const db = getDbInstance();
+
+  try {
+    // Aggregate totals from usage_history
+    const totalsRow = db
+      .prepare(
+        `
+      SELECT
+        COUNT(*) as totalRequests,
+        SUM(tokens_input) as totalInputTokens,
+        SUM(tokens_cache_read) as totalCachedTokens,
+        SUM(tokens_cache_creation) as totalCacheCreationTokens
+      FROM usage_history
+      WHERE tokens_cache_read > 0 OR tokens_cache_creation > 0
+    `
+      )
+      .get() as
+      | {
+          totalRequests: number;
+          totalInputTokens: number | null;
+          totalCachedTokens: number | null;
+          totalCacheCreationTokens: number | null;
+        }
+      | undefined;
+
+    // Count all requests, including those without cache activity
+    const allRequestsRow = db
+      .prepare(
+        `
+      SELECT COUNT(*) as totalRequests
+      FROM usage_history
+    `
+      )
+      .get() as { totalRequests: number } | undefined;
+
+    // Aggregate by provider
+    const byProviderRows = db
+      .prepare(
+        `
+      SELECT
+        provider,
+        COUNT(*) as requests,
+        SUM(tokens_input) as inputTokens,
+        SUM(tokens_cache_read) as cachedTokens,
+        SUM(tokens_cache_creation) as cacheCreationTokens
+      FROM usage_history
+      WHERE (tokens_cache_read > 0 OR tokens_cache_creation > 0)
+        AND provider IS NOT NULL
+      GROUP BY provider
+    `
+      )
+      .all() as Array<{
+      provider: string;
+      requests: number;
+      inputTokens: number | null;
+      cachedTokens: number | null;
+      cacheCreationTokens: number | null;
+    }>;
+
+    // Aggregate by strategy
+    // Since combo_strategy isn't tracked in usage_history yet, we use 'direct' for all requests
+    // TODO: Add combo_strategy column to usage_history for proper strategy tracking
+    const byStrategyRows = db
+      .prepare(
+        `
+      SELECT
+        'direct' as strategy,
+        COUNT(*) as requests,
+        SUM(tokens_input) as inputTokens,
+        SUM(tokens_cache_read) as cachedTokens,
+        SUM(tokens_cache_creation) as cacheCreationTokens
+      FROM usage_history
+      WHERE (tokens_cache_read > 0 OR tokens_cache_creation > 0)
+      GROUP BY 'direct'
+    `
+      )
+      .all() as Array<{
+      strategy: string;
+      requests: number;
+      inputTokens: number | null;
+      cachedTokens: number | null;
+      cacheCreationTokens: number | null;
+    }>;
+
+    // Calculate tokens saved (cached tokens are reused, not charged at full price)
+    const tokensSaved = totalsRow?.totalCachedTokens || 0;
+
+    // Build byProvider object
+    const byProvider: Record<
+      string,
+      {
+        requests: number;
+        inputTokens: number;
+        cachedTokens: number;
+        cacheCreationTokens: number;
+      }
+    > = {};
+    for (const row of byProviderRows) {
+      byProvider[row.provider] = {
+        requests: row.requests,
+        inputTokens: row.inputTokens || 0,
+        cachedTokens: row.cachedTokens || 0,
+        cacheCreationTokens: row.cacheCreationTokens || 0,
+      };
+    }
+
+    // Build byStrategy object
+    const byStrategy: Record<
+      string,
+      {
+        requests: number;
+        inputTokens: number;
+        cachedTokens: number;
+        cacheCreationTokens: number;
+      }
+    > = {};
+    for (const row of byStrategyRows) {
+      byStrategy[row.strategy] = {
+        requests: row.requests,
+        inputTokens: row.inputTokens || 0,
+        cachedTokens: row.cachedTokens || 0,
+        cacheCreationTokens: row.cacheCreationTokens || 0,
+      };
+    }
+
+    return {
+      totalRequests: allRequestsRow?.totalRequests || totalsRow?.totalRequests || 0,
+      requestsWithCacheControl: totalsRow?.totalRequests || 0,
+      totalInputTokens: totalsRow?.totalInputTokens || 0,
+      totalCachedTokens: totalsRow?.totalCachedTokens || 0,
+      totalCacheCreationTokens: totalsRow?.totalCacheCreationTokens || 0,
+      tokensSaved,
+      estimatedCostSaved: 0, // Would need pricing data to calculate
+      byProvider,
+      byStrategy,
+      lastUpdated: new Date().toISOString(),
+    };
+  } catch (error) {
+    console.error("Failed to fetch cache metrics from usage_history:", error);
+    return {
+      totalRequests: 0,
+      requestsWithCacheControl: 0,
+      totalInputTokens: 0,
+      totalCachedTokens: 0,
+      totalCacheCreationTokens: 0,
+      tokensSaved: 0,
+      estimatedCostSaved: 0,
+      byProvider: {},
+      byStrategy: {},
+      lastUpdated: new Date().toISOString(),
+    };
+  }
+}
+
+export async function updateCacheMetrics(_metrics: Record<string, unknown>) {
+  // No-op: metrics are now computed from usage_history on-the-fly
+  // The usage_history table is the single source of truth
+  return getCacheMetrics();
+}
+
+export async function resetCacheMetrics() {
+  // No-op: cannot delete historical usage data
+  // Cache metrics are computed from usage_history, so they reflect actual request history
+  console.warn(
+    "resetCacheMetrics is deprecated - cache metrics are now computed from usage_history"
+  );
+  return getCacheMetrics();
+}
@@ -0,0 +1,258 @@
+import { execFile, spawn } from "node:child_process";
+import { closeSync, mkdirSync, openSync } from "node:fs";
+import { access } from "node:fs/promises";
+import path from "node:path";
+import { promisify } from "node:util";
+
+const execFileAsync = promisify(execFile);
+
+type ComposeCommand = "docker compose" | "docker-compose";
+export type AutoUpdateMode = "npm" | "docker-compose";
+
+type ExecFileLike = typeof execFileAsync;
+type SpawnLike = typeof spawn;
+
+export type AutoUpdateConfig = {
+  mode: AutoUpdateMode;
+  repoDir: string;
+  composeFile: string;
+  composeProfile: string;
+  composeService: string;
+  gitRemote: string;
+  patchCommits: string[];
+  logPath: string;
+};
+
+export type AutoUpdateValidation = {
+  supported: boolean;
+  reason: string | null;
+  composeCommand: ComposeCommand | null;
+};
+
+export type AutoUpdateLaunchResult = {
+  started: boolean;
+  channel: AutoUpdateMode;
+  logPath: string;
+  composeCommand: ComposeCommand | null;
+  error?: string;
+};
+
+function normalizeMode(raw: string | undefined): AutoUpdateMode {
+  return raw === "docker-compose" ? "docker-compose" : "npm";
+}
+
+async function pathExists(targetPath: string): Promise<boolean> {
+  try {
+    await access(targetPath);
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+function shellQuote(value: string): string {
+  return `'${value.replace(/'/g, `'"'"'`)}'`;
+}
+
+function parsePatchCommits(raw: string | undefined): string[] {
+  return (raw || "").split(/[\s,]+/).map((value) => value.trim()).filter(Boolean);
+}
+
+export function getAutoUpdateConfig(env: NodeJS.ProcessEnv = process.env): AutoUpdateConfig {
+  const dataDir = env.DATA_DIR || "/tmp/omniroute";
+  const repoDir = env.AUTO_UPDATE_REPO_DIR || "/workspace/omniroute";
+
+  return {
+    mode: normalizeMode(env.AUTO_UPDATE_MODE),
+    repoDir,
+    composeFile: env.AUTO_UPDATE_COMPOSE_FILE || path.join(repoDir, "docker-compose.yml"),
+    composeProfile: env.AUTO_UPDATE_COMPOSE_PROFILE || "cli",
+    composeService: env.AUTO_UPDATE_SERVICE || "omniroute-cli",
+    gitRemote: env.AUTO_UPDATE_GIT_REMOTE || "origin",
+    patchCommits: parsePatchCommits(env.AUTO_UPDATE_PATCH_COMMITS),
+    logPath: env.AUTO_UPDATE_LOG_PATH || path.join(dataDir, "logs", "auto-update.log"),
+  };
+}
+
+export async function detectComposeCommand(
+  execFileImpl: ExecFileLike = execFileAsync
+): Promise<ComposeCommand | null> {
+  try {
+    await execFileImpl("docker", ["compose", "version"], { timeout: 10_000 });
+    return "docker compose";
+  } catch {
+    // Fall through.
+  }
+
+  try {
+    await execFileImpl("docker-compose", ["version"], { timeout: 10_000 });
+    return "docker-compose";
+  } catch {
+    return null;
+  }
+}
+
+export async function validateAutoUpdateRuntime(
+  config: AutoUpdateConfig,
+  execFileImpl: ExecFileLike = execFileAsync,
+  existsImpl: (targetPath: string) => Promise<boolean> = pathExists
+): Promise<AutoUpdateValidation> {
+  if (config.mode !== "docker-compose") {
+    return { supported: true, reason: null, composeCommand: null };
+  }
+
+  if (!(await existsImpl(config.repoDir))) {
+    return {
+      supported: false,
+      reason: `Repository directory not found: ${config.repoDir}`,
+      composeCommand: null,
+    };
+  }
+
+  if (!(await existsImpl(config.composeFile))) {
+    return {
+      supported: false,
+      reason: `Compose file not found: ${config.composeFile}`,
+      composeCommand: null,
+    };
+  }
+
+  if (!(await existsImpl("/var/run/docker.sock"))) {
+    return {
+      supported: false,
+      reason: "Docker socket is not mounted into the OmniRoute container.",
+      composeCommand: null,
+    };
+  }
+
+  try {
+    await execFileImpl("git", ["--version"], { timeout: 10_000 });
+  } catch {
+    return {
+      supported: false,
+      reason: "git is not available inside the OmniRoute container.",
+      composeCommand: null,
+    };
+  }
+
+  const composeCommand = await detectComposeCommand(execFileImpl);
+  if (!composeCommand) {
+    return {
+      supported: false,
+      reason: "Neither docker compose nor docker-compose is available inside the OmniRoute container.",
+      composeCommand: null,
+    };
+  }
+
+  return { supported: true, reason: null, composeCommand };
+}
+
+export function buildNpmUpdateScript(latest: string): string {
+  return [
+    "set -eu",
+    `npm install -g omniroute@${latest} --ignore-scripts`,
+    "if command -v pm2 >/dev/null 2>&1; then",
+    "  pm2 restart omniroute || true",
+    "fi",
+    `echo "[AutoUpdate] Successfully updated to v${latest}."`,
+  ].join("\n");
+}
+
+export function buildDockerComposeUpdateScript({
+  latest,
+  config,
+  composeCommand,
+}: {
+  latest: string;
+  config: AutoUpdateConfig;
+  composeCommand: ComposeCommand;
+}): string {
+  const targetTag = latest.startsWith("v") ? latest : `v${latest}`;
+  const composeInvocation =
+    composeCommand === "docker compose"
+      ? 'docker compose -f "$COMPOSE_FILE" up -d --build "$SERVICE"'
+      : 'docker-compose -f "$COMPOSE_FILE" up -d --build "$SERVICE"';
+  const patchLines = config.patchCommits.length
+    ? [`git cherry-pick --keep-redundant-commits ${config.patchCommits.map(shellQuote).join(' ')}`]
+    : [];
+
+  return [
+    "set -eu",
+    'export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH"',
+    `REPO_DIR=${shellQuote(config.repoDir)}`,
+    `COMPOSE_FILE=${shellQuote(config.composeFile)}`,
+    `PROFILE=${shellQuote(config.composeProfile)}`,
+    `SERVICE=${shellQuote(config.composeService)}`,
+    `REMOTE=${shellQuote(config.gitRemote)}`,
+    `TARGET_TAG=${shellQuote(targetTag)}`,
+    'cd "$REPO_DIR"',
+    'git config --global --add safe.directory "$REPO_DIR" >/dev/null 2>&1 || true',
+    'if [ -n "$(git status --porcelain)" ]; then',
+    '  echo "[AutoUpdate] Refusing update: git worktree has local changes." >&2',
+    '  exit 1',
+    'fi',
+    'git fetch --tags "$REMOTE"',
+    'if ! git rev-parse -q --verify "refs/tags/$TARGET_TAG" >/dev/null 2>&1; then',
+    '  echo "[AutoUpdate] Tag $TARGET_TAG not found on remote $REMOTE." >&2',
+    '  exit 1',
+    'fi',
+    'backup_branch="autoupdate/pre-${TARGET_TAG#v}-$(date +%Y%m%d-%H%M%S)"',
+    'git branch "$backup_branch" >/dev/null 2>&1 || true',
+    'git checkout -B "autoupdate/${TARGET_TAG#v}" "$TARGET_TAG"',
+    ...patchLines,
+    'export COMPOSE_PROFILES="$PROFILE"',
+    composeInvocation,
+    `echo "[AutoUpdate] Successfully switched to ${targetTag} via ${composeCommand}."`,
+  ].join("\n");
+}
+
+export async function launchAutoUpdate({
+  latest,
+  env = process.env,
+  execFileImpl = execFileAsync,
+  spawnImpl = spawn,
+}: {
+  latest: string;
+  env?: NodeJS.ProcessEnv;
+  execFileImpl?: ExecFileLike;
+  spawnImpl?: SpawnLike;
+}): Promise<AutoUpdateLaunchResult> {
+  const config = getAutoUpdateConfig(env);
+  const validation = await validateAutoUpdateRuntime(config, execFileImpl);
+
+  if (!validation.supported) {
+    return {
+      started: false,
+      channel: config.mode,
+      logPath: config.logPath,
+      composeCommand: validation.composeCommand,
+      error: validation.reason || "Auto-update runtime is not available.",
+    };
+  }
+
+  const script =
+    config.mode === "docker-compose"
+      ? buildDockerComposeUpdateScript({
+          latest,
+          config,
+          composeCommand: validation.composeCommand || "docker-compose",
+        })
+      : buildNpmUpdateScript(latest);
+
+  mkdirSync(path.dirname(config.logPath), { recursive: true });
+  const logFd = openSync(config.logPath, "a");
+  const child = spawnImpl("sh", ["-lc", script], {
+    detached: true,
+    stdio: ["ignore", logFd, logFd],
+    env: { ...process.env, ...env },
+  });
+  closeSync(logFd);
+  child.unref();
+
+  return {
+    started: true,
+    channel: config.mode,
+    logPath: config.logPath,
+    composeCommand: validation.composeCommand,
+  };
+}
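Review note: `buildDockerComposeUpdateScript` interpolates environment-derived values into a shell script, so the quoting helper carries most of the safety. The sketch below copies `shellQuote` and `parsePatchCommits` from the file above and shows what they produce for a sample `AUTO_UPDATE_PATCH_COMMITS` value. The commit hashes and the hostile input string are made up for illustration.

```typescript
// Copied from the diff above: wrap a value in single quotes and replace each
// embedded single quote with '"'"' (close quote, quoted quote, reopen quote).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'"'"'`)}'`;
}

// Copied from the diff above: split on any run of whitespace or commas.
function parsePatchCommits(raw: string | undefined): string[] {
  return (raw || "")
    .split(/[\s,]+/)
    .map((value) => value.trim())
    .filter(Boolean);
}

// Made-up commit hashes, mixing comma and newline separators.
const commits = parsePatchCommits("abc1234, def5678\n0123abc");
const cherryPick = `git cherry-pick --keep-redundant-commits ${commits
  .map(shellQuote)
  .join(" ")}`;
console.log(cherryPick);

// A value containing a quote and a command separator stays one shell word.
const hostile = shellQuote("it's; rm -rf /");
console.log(hostile);

// Unset or empty input yields no patch commits.
const none = parsePatchCommits(undefined);
console.log(none.length);
```

Because every value is emitted as a single-quoted word, the shell performs no expansion on it when the script assigns `REPO_DIR`, `COMPOSE_FILE`, and the other variables.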
@@ -9,7 +9,7 @@ import { isModelSyncInternalRequest } from "./shared/services/modelSyncScheduler

 const SECRET = new TextEncoder().encode(process.env.JWT_SECRET || "");

-export async function proxy(request) {
+export async function proxy(request: any) {
  const { pathname } = request.nextUrl;

  // Pipeline: Add request ID header for end-to-end tracing
@@ -327,8 +327,15 @@ const getExpectedParentPaths = (): string[] => {

  const npmPrefix = getNpmGlobalPrefix();

+  // Add common user bin directories
+  const userBinPaths = [
+    path.join(home, "bin"),
+    path.join(home, ".local", "bin"),
+  ];
+
  return [
    home,
+    ...userBinPaths,
    userProfile,
    validatedAppData,
    validatedLocalAppData,
@@ -374,7 +381,10 @@ const getKnownToolPaths = (toolId: string): string[] => {
      ["claude.exe", "claude"],
    ],
    codex: [["codex.cmd", "codex"]],
-    droid: [["droid.cmd", "droid"]],
+    droid: [
+      ["droid.cmd", "droid"],
+      ["droid.exe", "droid"],
+    ],
    openclaw: [["openclaw.cmd", "openclaw"]],
    cursor: [
      ["agent.cmd", "agent"],
@@ -404,6 +414,10 @@ const getKnownToolPaths = (toolId: string): string[] => {
      }
    }

+    if (toolId === "droid") {
+      paths.push(path.join(home, "bin", "droid.exe"));
+    }
+
    for (const [winName] of bins) {
      if (npmPrefix) paths.push(path.join(npmPrefix, winName));
      if (appData) {
@@ -89,6 +89,10 @@ export async function verifyAuth(request: any): Promise<string | null> {
 * need to conditionally skip auth should check that separately.
 */
 export async function isAuthenticated(request: Request): Promise<boolean> {
+  // If settings say login/auth is disabled, treat all requests as authenticated
+  if (!(await isAuthRequired())) {
+    return true;
+  }
  // 1. Check API key (for external clients)
  const authHeader = request.headers.get("authorization");
  if (authHeader?.startsWith("Bearer ")) {
@@ -1313,3 +1313,11 @@ export const v1SearchResponseSchema = z.object({
    )
    .optional(),
 });
+
+// ─── Auto-disable banned/error accounts ───────────────────────────────────
+export const updateAutoDisableAccountsSchema = z
+  .object({
+    enabled: z.boolean(),
+    threshold: z.number().int().min(1).max(10).optional(),
+  })
+  .strict();
@@ -47,6 +47,8 @@ export const updateSettingsSchema = z.object({
  cliCompatProviders: z.array(z.string().max(100)).optional(),
  // Strip provider/model prefix at proxy layer (e.g. "openai/gpt-4" → "gpt-4")
  stripModelPrefix: z.boolean().optional(),
+  // Cache control preservation mode
+  alwaysPreserveClientCache: z.enum(["auto", "always", "never"]).optional(),
  // Custom CLI agent definitions for ACP
  customAgents: z
    .array(
@@ -144,8 +144,8 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
  }

  // Optional strict API key mode for /v1 endpoints (require key on every request).
-  const isInternalTest = request.headers?.get?.("x-internal-test") === "combo-health-check";
-  if (process.env.REQUIRE_API_KEY === "true" && !isInternalTest) {
+  const isComboLiveTest = request.headers?.get?.("x-internal-test") === "combo-health-check";
+  if (process.env.REQUIRE_API_KEY === "true" && !isComboLiveTest) {
    if (!apiKey) {
      log.warn("AUTH", "Missing API key while REQUIRE_API_KEY=true");
      return errorResponse(HTTP_STATUS.UNAUTHORIZED, "Missing API key");
@@ -155,7 +155,7 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
      log.warn("AUTH", "Invalid API key while REQUIRE_API_KEY=true");
      return errorResponse(HTTP_STATUS.UNAUTHORIZED, "Invalid API key");
    }
-  } else if (apiKey && !isInternalTest) {
+  } else if (apiKey && !isComboLiveTest) {
    // Client sent a Bearer key — it must exist in DB (otherwise reject to avoid "key ignored" confusion).
    const valid = await isValidApiKey(apiKey);
    if (!valid) {
@@ -238,9 +238,11 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
      `Combo "${modelStr}" [${combo.strategy || "priority"}] with ${combo.models.length} models`
    );

-    // Pre-check function: skip models where all accounts are in cooldown
-    // Uses modelAvailability module for TTL-based cooldowns
+    // Pre-check function used by combo routing. For explicit combo live tests,
+    // avoid pre-skipping so each model gets a real execution attempt.
    const checkModelAvailable = async (modelString: string) => {
+      if (isComboLiveTest) return true;
+
      // Use getModelInfo to properly resolve custom prefixes
      const modelInfo = await getModelInfo(modelString);
      const provider = modelInfo.provider;
@@ -273,9 +275,21 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
      body,
      combo,
      handleSingleModel: (b: any, m: string) =>
-        handleSingleModelChat(b, m, clientRawRequest, request, combo.name, apiKeyInfo, telemetry, {
-          sessionId,
-        }),
+        handleSingleModelChat(
+          b,
+          m,
+          clientRawRequest,
+          request,
+          combo.name,
+          apiKeyInfo,
+          telemetry,
+          {
+            sessionId,
+            forceLiveComboTest: isComboLiveTest,
+          },
+          combo.strategy,
+          true
+        ),
      isModelAvailable: checkModelAvailable,
      log,
      settings,
@@ -304,7 +318,9 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
          combo.name,
          apiKeyInfo,
          telemetry,
-          { sessionId, emergencyFallbackTried: true }
+          { sessionId, emergencyFallbackTried: true, forceLiveComboTest: isComboLiveTest },
+          combo.strategy,
+          true
        );
        if (fallbackResponse.ok) {
          log.info("GLOBAL_FALLBACK", `Global fallback ${fallbackModel} succeeded`);
@@ -336,7 +352,9 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
    null,
    apiKeyInfo,
    telemetry,
-    { sessionId }
+    { sessionId, forceLiveComboTest: isComboLiveTest },
+    null,
+    false
  );
  recordTelemetry(telemetry);
  return withSessionHeader(response, sessionId);
@@ -366,16 +384,26 @@ async function handleSingleModelChat(
  comboName: string | null = null,
  apiKeyInfo: any = null,
  telemetry: any = null,
-  runtimeOptions: { emergencyFallbackTried?: boolean; sessionId?: string | null } = {}
+  runtimeOptions: {
+    emergencyFallbackTried?: boolean;
+    forceLiveComboTest?: boolean;
+    sessionId?: string | null;
+  } = {},
+  comboStrategy: string | null = null,
+  isCombo: boolean = false
 ) {
  // 1. Resolve model → provider/model
  const resolved = await resolveModelOrError(modelStr, body, clientRawRequest?.endpoint);
  if (resolved.error) return resolved.error;

  const { provider, model, sourceFormat, targetFormat, extendedContext } = resolved;
+  const forceLiveComboTest = runtimeOptions.forceLiveComboTest === true;

  // 2. Pipeline gates (availability + circuit breaker)
-  const gate = checkPipelineGates(provider, model);
+  const gate = checkPipelineGates(provider, model, {
+    ignoreCircuitBreaker: forceLiveComboTest,
+    ignoreModelCooldown: forceLiveComboTest,
+  });
  if (gate) return gate;

  const breaker = getCircuitBreaker(provider, {
@@ -397,7 +425,13 @@ async function handleSingleModelChat(
      provider,
      excludeConnectionId,
      apiKeyInfo?.allowedConnections ?? null,
-      model
+      model,
+      forceLiveComboTest
+        ? {
+            allowSuppressedConnections: true,
+            bypassQuotaPolicy: true,
+          }
+        : undefined
    );

    if (!credentials || credentials.allRateLimited) {
@@ -431,6 +465,7 @@ async function handleSingleModelChat(
    // 4. Execute chat via core (with circuit breaker + optional TLS)
    if (telemetry) telemetry.startPhase("connect");
    const { result, tlsFingerprintUsed } = await executeChatWithBreaker({
+      bypassCircuitBreaker: forceLiveComboTest,
      breaker,
      body,
      provider,
@@ -443,6 +478,8 @@ async function handleSingleModelChat(
      apiKeyInfo,
      userAgent,
      comboName,
+      comboStrategy,
+      isCombo,
      extendedContext,
    });
    if (telemetry) telemetry.endPhase();
@@ -512,7 +549,9 @@ async function handleSingleModelChat(
            comboName,
            apiKeyInfo,
            telemetry,
-            { ...runtimeOptions, emergencyFallbackTried: true }
+            { ...runtimeOptions, emergencyFallbackTried: true },
+            null, // no strategy for emergency fallback
+            Boolean(comboName) // isCombo if comboName exists
          );

          if (fallbackResponse.ok) {
@@ -602,8 +641,15 @@ async function resolveModelOrError(modelStr: string, body: any, endpointPath: st
 * Check pipeline gates: model availability + circuit breaker state.
 * Returns an error Response if blocked, or null if OK to proceed.
 */
-function checkPipelineGates(provider: string, model: string) {
-  if (!isModelAvailable(provider, model)) {
+function checkPipelineGates(
+  provider: string,
+  model: string,
+  options: { ignoreCircuitBreaker?: boolean; ignoreModelCooldown?: boolean } = {}
+) {
+  const modelAvailable = isModelAvailable(provider, model);
+  if (!modelAvailable && options.ignoreModelCooldown) {
+    log.info("AVAILABILITY", `${provider}/${model} cooldown bypassed for combo live test`);
+  } else if (!modelAvailable) {
    log.warn("AVAILABILITY", `${provider}/${model} is in cooldown, rejecting request`);
    return (unavailableResponse as any)(
      HTTP_STATUS.SERVICE_UNAVAILABLE,
@@ -618,7 +664,9 @@ function checkPipelineGates(provider: string, model: string) {
    onStateChange: (name: string, from: string, to: string) =>
      log.info("CIRCUIT", `${name}: ${from} → ${to}`),
  });
-  if (!breaker.canExecute()) {
+  if (options.ignoreCircuitBreaker && !breaker.canExecute()) {
+    log.info("CIRCUIT", `Bypassing OPEN circuit breaker for combo live test: ${provider}`);
+  } else if (!breaker.canExecute()) {
    log.warn("CIRCUIT", `Circuit breaker OPEN for ${provider}, rejecting request`);
    return (unavailableResponse as any)(
      HTTP_STATUS.SERVICE_UNAVAILABLE,
@@ -636,6 +684,7 @@ function checkPipelineGates(provider: string, model: string) {
 * Execute chat core wrapped in circuit breaker + optional TLS tracking.
 */
 async function executeChatWithBreaker({
+  bypassCircuitBreaker,
  breaker,
  body,
  provider,
@@ -648,6 +697,8 @@ async function executeChatWithBreaker({
  apiKeyInfo,
  userAgent,
  comboName,
+  comboStrategy,
+  isCombo,
  extendedContext,
 }: any): Promise<{ result: any; tlsFingerprintUsed: boolean }> {
  let tlsFingerprintUsed = false;
@@ -665,6 +716,8 @@ async function executeChatWithBreaker({
          apiKeyInfo,
          userAgent,
          comboName,
+          comboStrategy,
+          isCombo,
          onCredentialsRefreshed: async (newCreds: any) => {
            await updateProviderCredentials(credentials.connectionId, {
              accessToken: newCreds.accessToken,
@@ -679,6 +732,16 @@ async function executeChatWithBreaker({
        })
      );

+    if (bypassCircuitBreaker) {
+      if (!proxyInfo?.proxy && isTlsFingerprintActive()) {
+        const tracked = await runWithTlsTracking(chatFn);
+        return { result: tracked.result, tlsFingerprintUsed: tracked.tlsFingerprintUsed };
+      }
+
+      const result = await chatFn();
+      return { result, tlsFingerprintUsed: false };
+    }
+
    if (!proxyInfo?.proxy && isTlsFingerprintActive()) {
      const tracked = await breaker.execute(async () => runWithTlsTracking(chatFn));
      return { result: tracked.result, tlsFingerprintUsed: tracked.tlsFingerprintUsed };
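Review note: the gate changes above make both pre-flight checks skippable for combo live tests. A minimal sketch of that decision, assuming simplified stand-in types (`GateState`, `GateOptions`, and `evaluateGates` are hypothetical names, not part of this PR), evaluates each gate once and records which ones were bypassed:

```typescript
// Hypothetical stand-ins for illustration; not the project's real types.
interface GateOptions {
  ignoreCircuitBreaker?: boolean;
  ignoreModelCooldown?: boolean;
}

interface GateState {
  modelAvailable: boolean; // false while the model is in cooldown
  breakerClosed: boolean; // false while the circuit breaker is OPEN
}

type GateResult =
  | { blocked: false; bypassed: string[] }
  | { blocked: true; reason: "cooldown" | "circuit-open" };

function evaluateGates(state: GateState, options: GateOptions = {}): GateResult {
  const bypassed: string[] = [];

  // Cooldown gate: reject unless the caller explicitly opted out.
  if (!state.modelAvailable) {
    if (!options.ignoreModelCooldown) return { blocked: true, reason: "cooldown" };
    bypassed.push("cooldown");
  }

  // Circuit breaker gate: same pattern, using a state that was read once.
  if (!state.breakerClosed) {
    if (!options.ignoreCircuitBreaker) return { blocked: true, reason: "circuit-open" };
    bypassed.push("circuit-open");
  }

  return { blocked: false, bypassed };
}
```

Reading each gate's state a single time keeps the bypass branch and the reject branch working from the same observation, which may matter if the real `canExecute()` has side effects.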
@@ -3,6 +3,7 @@ import {
  validateApiKey,
  updateProviderConnection,
  getSettings,
+  getCachedSettings,
 } from "@/lib/localDb";
 import { getQuotaWindowStatus, isAccountQuotaExhausted } from "@/domain/quotaCache";
 import {
@@ -54,6 +55,11 @@ interface RecoverableConnectionState {
  lastErrorSource?: string | null;
 }

+interface CredentialSelectionOptions {
+  allowSuppressedConnections?: boolean;
+  bypassQuotaPolicy?: boolean;
+}
+
 const CODEX_QUOTA_THRESHOLD_PERCENT = 90;
 const MIN_QUOTA_THRESHOLD_PERCENT = 1;
 const MAX_QUOTA_THRESHOLD_PERCENT = 100;
@@ -311,7 +317,8 @@ export async function getProviderCredentials(
  provider: string,
  excludeConnectionId: string | null = null,
  allowedConnections: string[] | null = null,
-  requestedModel: string | null = null
+  requestedModel: string | null = null,
+  options: CredentialSelectionOptions = {}
 ) {
  // Acquire mutex to prevent race conditions
  const currentMutex = selectionMutex;
@@ -323,6 +330,9 @@ export async function getProviderCredentials(
  try {
    await currentMutex;

+    const allowSuppressedConnections = options.allowSuppressedConnections === true;
+    const bypassQuotaPolicy = options.bypassQuotaPolicy === true;
+
    const connectionsRaw = await getProviderConnections({ provider, isActive: true });
    let connections = (Array.isArray(connectionsRaw) ? connectionsRaw : [])
      .map(toProviderConnection)
@@ -394,9 +404,11 @@ export async function getProviderCredentials(
    // Filter out unavailable accounts and excluded connection
    const availableConnections = connections.filter((c) => {
      if (excludeConnectionId && c.id === excludeConnectionId) return false;
-      if (isAccountUnavailable(c.rateLimitedUntil)) return false;
-      if (isTerminalConnectionStatus(c)) return false;
-      if (provider === "codex" && isCodexScopeUnavailable(c, requestedModel)) return false;
+      if (!allowSuppressedConnections) {
+        if (isAccountUnavailable(c.rateLimitedUntil)) return false;
+        if (isTerminalConnectionStatus(c)) return false;
+        if (provider === "codex" && isCodexScopeUnavailable(c, requestedModel)) return false;
+      }
      return true;
    });

@@ -412,13 +424,23 @@ export async function getProviderCredentials(
      if (excluded || rateLimited) {
        log.debug(
          "AUTH",
-          `  → ${c.id?.slice(0, 8)} | ${excluded ? "excluded" : ""} ${rateLimited ? `rateLimited until ${c.rateLimitedUntil}` : ""}`
+          `  → ${c.id?.slice(0, 8)} | ${excluded ? "excluded" : ""} ${rateLimited ? `rateLimited until ${c.rateLimitedUntil}` : ""}${allowSuppressedConnections && rateLimited ? " (retained for combo live test)" : ""}`
        );
      } else if (terminalStatus) {
-        log.debug("AUTH", `  → ${c.id?.slice(0, 8)} | skipped terminal status=${c.testStatus}`);
+        log.debug(
+          "AUTH",
+          allowSuppressedConnections
+            ? `  → ${c.id?.slice(0, 8)} | retained terminal status=${c.testStatus} for combo live test`
+            : `  → ${c.id?.slice(0, 8)} | skipped terminal status=${c.testStatus}`
+        );
      } else if (codexScopeLimited) {
        const scopeUntil = getCodexScopeRateLimitedUntil(c.providerSpecificData, requestedModel);
-        log.debug("AUTH", `  → ${c.id?.slice(0, 8)} | codex scope-limited until ${scopeUntil}`);
+        log.debug(
+          "AUTH",
+          allowSuppressedConnections
+            ? `  → ${c.id?.slice(0, 8)} | retained codex scope-limited account until ${scopeUntil} for combo live test`
+            : `  → ${c.id?.slice(0, 8)} | codex scope-limited until ${scopeUntil}`
+        );
      }
    });

@@ -461,17 +483,21 @@ export async function getProviderCredentials(
      resetAt: string | null;
    }> = [];

-    policyEligibleConnections = availableConnections.filter((connection) => {
-      const evaluation = evaluateQuotaLimitPolicy(provider, connection);
-      if (!evaluation.blocked) return true;
+    if (!bypassQuotaPolicy) {
+      policyEligibleConnections = availableConnections.filter((connection) => {
+        const evaluation = evaluateQuotaLimitPolicy(provider, connection);
+        if (!evaluation.blocked) return true;

-      blockedByPolicy.push({
-        id: connection.id,
-        reasons: evaluation.reasons,
-        resetAt: evaluation.resetAt,
+        blockedByPolicy.push({
+          id: connection.id,
+          reasons: evaluation.reasons,
+          resetAt: evaluation.resetAt,
+        });
+        return false;
      });
-      return false;
-    });
+    } else if (availableConnections.length > 0) {
+      log.debug("AUTH", `${provider} | bypassing quota policy for combo live test`);
+    }

    if (blockedByPolicy.length > 0) {
      log.info(
@@ -748,13 +774,14 @@ export async function markAccountUnavailable(
      }
    }

-    const { shouldFallback, cooldownMs, newBackoffLevel, reason } = checkFallbackError(
+    const result = checkFallbackError(
      status,
      errorText,
      backoffLevel,
      model,
      provider // ← Now passes provider for profile-aware cooldowns
    );
+    const { shouldFallback, cooldownMs, newBackoffLevel, reason } = result;
    if (!shouldFallback) return { shouldFallback: false, cooldownMs: 0 };

    // ── Local provider 404: model-only lockout, connection stays active ──
@@ -820,6 +847,28 @@ export async function markAccountUnavailable(
      backoffLevel: newBackoffLevel ?? backoffLevel,
    });

+    // T-AUTODISABLE: if the auto-disable setting is enabled and the error is
+    // permanent, mark the account inactive so it is no longer selected.
+    // Uses getCachedSettings() to avoid DB overhead on the hot error path.
+    // NOTE: permanent bans are disabled immediately, with no attempt threshold:
+    // a permanent ban (403 "Verify your account" / ToS violation) is not
+    // expected to recover, so retrying is pointless regardless of attempt count.
+    if (result.permanent) {
+      try {
+        const settings = await getCachedSettings();
+        const autoDisableEnabled = settings.autoDisableBannedAccounts ?? false;
+        if (autoDisableEnabled) {
+          await updateProviderConnection(connectionId, { isActive: false });
+          log.info(
+            "AUTH",
+            `Auto-disabled ${connectionId.slice(0, 8)} — permanent ban detected (autoDisableBannedAccounts=true)`
+          );
+        }
+      } catch (e) {
+        log.warn("AUTH", `Auto-disable check failed (non-fatal): ${e}`);
+      }
+    }
+
    // Per-model lockout: lock the specific model if known
    if (provider && model && cooldownMs > 0) {
      lockModel(provider, connectionId, model, reason || "unknown", cooldownMs);
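Review note: the auto-disable block above runs in addition to the normal cooldown update, and only when the fallback result is flagged `permanent` and the `autoDisableBannedAccounts` setting is on. A simplified sketch of that decision follows; `planAccountActions` and both interfaces are hypothetical, and the early returns that the real function has between these steps are left out:

```typescript
// Hypothetical shapes; only `permanent`, `shouldFallback`, and
// `autoDisableBannedAccounts` are names taken from the diff.
interface FallbackDecisionSketch {
  shouldFallback: boolean;
  permanent?: boolean;
}

interface SettingsSketch {
  autoDisableBannedAccounts?: boolean;
}

interface AccountActions {
  applyCooldown: boolean; // write rateLimitedUntil / backoff level
  deactivate: boolean; // set isActive: false on the connection
}

function planAccountActions(
  decision: FallbackDecisionSketch,
  settings: SettingsSketch | null // null models a failed settings lookup
): AccountActions {
  if (!decision.shouldFallback) return { applyCooldown: false, deactivate: false };

  // The setting defaults to off, and a failed lookup is treated as off too.
  const autoDisable = settings?.autoDisableBannedAccounts ?? false;
  return {
    applyCooldown: true,
    deactivate: decision.permanent === true && autoDisable,
  };
}
```

Treating a failed settings lookup as "off" matches the non-fatal `catch` in the diff: the account keeps its cooldown but stays active.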
@@ -0,0 +1,207 @@
+[CREDENTIALS] No external credentials file found, using defaults.
+[DB] SQLite database ready: /home/diegosouzapw/.omniroute/storage.sqlite
+[MODEL] Ambiguous model 'claude-haiku-4.5'. Use provider/model prefix (ex: gh/claude-haiku-4.5 or kr/claude-haiku-4.5). Candidates: gh, kr, anthropic
+TAP version 13
+# Subtest: getModelInfoCore resolves unique non-openai unprefixed model
+ok 1 - getModelInfoCore resolves unique non-openai unprefixed model
+  ---
+  duration_ms: 3.403766
+  type: 'test'
+  ...
+# Subtest: getModelInfoCore keeps openai fallback for gpt-4o
+ok 2 - getModelInfoCore keeps openai fallback for gpt-4o
+  ---
+  duration_ms: 0.535726
+  type: 'test'
+  ...
+# Subtest: getModelInfoCore resolves gpt-5.4 to codex
+ok 3 - getModelInfoCore resolves gpt-5.4 to codex
+  ---
+  duration_ms: 0.321781
+  type: 'test'
+  ...
+# Subtest: getModelInfoCore returns explicit ambiguity metadata for ambiguous unprefixed model
+ok 4 - getModelInfoCore returns explicit ambiguity metadata for ambiguous unprefixed model
+  ---
+  duration_ms: 1.079896
+  type: 'test'
+  ...
+# Subtest: getModelInfoCore canonicalizes github legacy alias with explicit provider prefix
+ok 5 - getModelInfoCore canonicalizes github legacy alias with explicit provider prefix
+  ---
+  duration_ms: 0.370547
+  type: 'test'
+  ...
+# Subtest: GithubExecutor routes codex-family model to /responses
+ok 6 - GithubExecutor routes codex-family model to /responses
+  ---
+  duration_ms: 0.47113
+  type: 'test'
+  ...
+# Subtest: GithubExecutor keeps non-codex model on /chat/completions
+ok 7 - GithubExecutor keeps non-codex model on /chat/completions
+  ---
+  duration_ms: 0.38457
+  type: 'test'
+  ...
+# Subtest: DefaultExecutor uses x-api-key for kimi-coding-apikey
+ok 8 - DefaultExecutor uses x-api-key for kimi-coding-apikey
+  ---
+  duration_ms: 0.451443
+  type: 'test'
+  ...
+# Subtest: CodexExecutor forces stream=true for upstream compatibility
+ok 9 - CodexExecutor forces stream=true for upstream compatibility
+  ---
+  duration_ms: 1.203259
+  type: 'test'
+  ...
+# Subtest: Claude native messages can be round-tripped through OpenAI into Claude OAuth format
+ok 10 - Claude native messages can be round-tripped through OpenAI into Claude OAuth format
+  ---
+  duration_ms: 7.232512
+  type: 'test'
+  ...
+# Subtest: CodexExecutor maps fast service tier to priority
+ok 11 - CodexExecutor maps fast service tier to priority
+  ---
+  duration_ms: 0.489993
+  type: 'test'
+  ...
+# Subtest: shouldUseNativeCodexPassthrough only enables responses-native Codex requests
+ok 12 - shouldUseNativeCodexPassthrough only enables responses-native Codex requests
+  ---
+  duration_ms: 0.441911
+  type: 'test'
+  ...
+# Subtest: CodexExecutor can force fast service tier from settings
+ok 13 - CodexExecutor can force fast service tier from settings
+  ---
+  duration_ms: 0.299575
+  type: 'test'
+  ...
+# Subtest: CodexExecutor always requests SSE accept header
+ok 14 - CodexExecutor always requests SSE accept header
+  ---
+  duration_ms: 0.602914
+  type: 'test'
+  ...
+# Subtest: CodexExecutor does not request SSE accept header for compact requests
+ok 15 - CodexExecutor does not request SSE accept header for compact requests
+  ---
+  duration_ms: 0.322611
+  type: 'test'
+  ...
+# Subtest: CodexExecutor preserves native responses payloads for Codex passthrough
+not ok 16 - CodexExecutor preserves native responses payloads for Codex passthrough
+  ---
+  duration_ms: 1.856261
+  type: 'test'
+  location: '/home/diegosouzapw/dev/proxys/9router/tests/unit/plan3-p0.test.mjs:221:1'
+  failureType: 'testCodeFailure'
+  error: |-
+    Expected values to be strictly equal:
+    
+    false !== true
+    
+  code: 'ERR_ASSERTION'
+  name: 'AssertionError'
+  expected: true
+  actual: false
+  operator: 'strictEqual'
+  stack: |-
+    TestContext.<anonymous> (file:///home/diegosouzapw/dev/proxys/9router/tests/unit/plan3-p0.test.mjs:242:10)
+    Test.runInAsyncScope (node:async_hooks:214:14)
+    Test.run (node:internal/test_runner/test:1047:25)
+    Test.processPendingSubtests (node:internal/test_runner/test:744:18)
+    Test.postRun (node:internal/test_runner/test:1173:19)
+    Test.run (node:internal/test_runner/test:1101:12)
+    async Test.processPendingSubtests (node:internal/test_runner/test:744:7)
+  ...
+# Subtest: CodexExecutor strips streaming fields for compact passthrough
+ok 17 - CodexExecutor strips streaming fields for compact passthrough
+  ---
+  duration_ms: 0.296176
+  type: 'test'
+  ...
+# Subtest: CodexExecutor routes responses subpaths to matching upstream paths
+ok 18 - CodexExecutor routes responses subpaths to matching upstream paths
+  ---
+  duration_ms: 0.546657
+  type: 'test'
+  ...
+# Subtest: translateNonStreamingResponse converts Responses API payload to OpenAI chat.completion
+ok 19 - translateNonStreamingResponse converts Responses API payload to OpenAI chat.completion
+  ---
+  duration_ms: 1.483788
+  type: 'test'
+  ...
+# Subtest: extractUsageFromResponse reads usage from Responses API payload
+ok 20 - extractUsageFromResponse reads usage from Responses API payload
+  ---
+  duration_ms: 0.398039
+  type: 'test'
+  ...
+# Subtest: detectFormat identifies OpenAI Responses when input is string
+ok 21 - detectFormat identifies OpenAI Responses when input is string
+  ---
+  duration_ms: 0.359174
+  type: 'test'
+  ...
+# Subtest: detectFormat identifies OpenAI Responses by max_output_tokens without input array
+ok 22 - detectFormat identifies OpenAI Responses by max_output_tokens without input array
+  ---
+  duration_ms: 0.271215
+  type: 'test'
+  ...
+# Subtest: detectFormatFromEndpoint forces OpenAI for /v1/chat/completions
+ok 23 - detectFormatFromEndpoint forces OpenAI for /v1/chat/completions
+  ---
+  duration_ms: 0.52054
+  type: 'test'
+  ...
+# Subtest: detectFormatFromEndpoint forces Claude for /v1/messages
+ok 24 - detectFormatFromEndpoint forces Claude for /v1/messages
+  ---
+  duration_ms: 0.433035
+  type: 'test'
+  ...
+# Subtest: translateRequest normalizes openai-responses input string into list payload
+ok 25 - translateRequest normalizes openai-responses input string into list payload
+  ---
+  duration_ms: 0.358109
+  type: 'test'
+  ...
+# Subtest: translateRequest preserves service_tier when converting openai to openai-responses
+ok 26 - translateRequest preserves service_tier when converting openai to openai-responses
+  ---
+  duration_ms: 1.10454
+  type: 'test'
+  ...
+# Subtest: parseSSEToResponsesOutput parses completed response from SSE payload
+ok 27 - parseSSEToResponsesOutput parses completed response from SSE payload
+  ---
+  duration_ms: 0.575476
+  type: 'test'
+  ...
+# Subtest: parseSSEToResponsesOutput returns null for invalid payload
+ok 28 - parseSSEToResponsesOutput returns null for invalid payload
+  ---
+  duration_ms: 0.302714
+  type: 'test'
+  ...
+# Subtest: parseSSEToOpenAIResponse merges split tool call chunks by id without duplication
+ok 29 - parseSSEToOpenAIResponse merges split tool call chunks by id without duplication
+  ---
+  duration_ms: 0.916032
+  type: 'test'
+  ...
+1..29
+# tests 29
+# suites 0
+# pass 28
+# fail 1
+# cancelled 0
+# skipped 0
+# todo 0
+# duration_ms 65.394285
@@ -120,7 +120,11 @@ test("isAuthenticated accepts bearer API keys", async () => {
  assert.equal(result, true);
 });

-test("isAuthenticated returns false without valid credentials", async () => {
+test("isAuthenticated returns false when auth is required without valid credentials", async () => {
+  // Force requireLogin to be active
+  process.env.INITIAL_PASSWORD = "bootstrap-password";
+  await localDb.updateSettings({ requireLogin: true, password: "" });
+
  const request = new Request("https://example.com/api/providers");

  const result = await apiAuth.isAuthenticated(request);
@@ -62,6 +62,27 @@ test("getProviderCredentials returns null when all active connections are termin
  assert.equal(selected, null);
 });

+test("getProviderCredentials can reuse a locally suppressed connection for combo live tests", async () => {
+  await resetStorage();
+
+  const conn = await providersDb.createProviderConnection({
+    provider: "openai",
+    authType: "apikey",
+    apiKey: "sk-live-test",
+    isActive: true,
+    testStatus: "credits_exhausted",
+    rateLimitedUntil: new Date(Date.now() + 60_000).toISOString(),
+  });
+
+  const selected = await auth.getProviderCredentials("openai", null, null, null, {
+    allowSuppressedConnections: true,
+    bypassQuotaPolicy: true,
+  });
+
+  assert.ok(selected);
+  assert.equal(selected.connectionId, conn.id);
+});
+
 test("markAccountUnavailable does not overwrite terminal status", async () => {
  await resetStorage();

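Review note: the new test above exercises `allowSuppressedConnections`. As a rough model of the filter it covers, assuming simplified connection fields (`ConnectionSketch`, `terminal`, and `filterSelectable` are hypothetical, and the codex scope check is omitted), the explicit exclusion still applies while the rate-limit and terminal-status checks are skipped:

```typescript
// Illustrative shapes only; the helper is hypothetical.
interface ConnectionSketch {
  id: string;
  rateLimitedUntil: string | null; // ISO timestamp or null
  terminal: boolean; // stands in for isTerminalConnectionStatus(c)
}

interface SelectionOptionsSketch {
  allowSuppressedConnections?: boolean;
}

function filterSelectable(
  connections: ConnectionSketch[],
  excludeConnectionId: string | null,
  options: SelectionOptionsSketch = {},
  now: number = Date.now()
): ConnectionSketch[] {
  const allowSuppressed = options.allowSuppressedConnections === true;
  return connections.filter((c) => {
    // The explicit exclusion always applies, even for live tests.
    if (excludeConnectionId && c.id === excludeConnectionId) return false;
    if (allowSuppressed) return true;
    const rateLimited =
      c.rateLimitedUntil !== null && Date.parse(c.rateLimitedUntil) > now;
    return !rateLimited && !c.terminal;
  });
}
```

Passing `now` as a parameter is a choice made for this sketch so the behaviour can be checked without depending on the wall clock.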
@@ -0,0 +1,104 @@
+import { describe, it } from "node:test";
+import assert from "node:assert/strict";
+
+const autoUpdate = await import("../../src/lib/system/autoUpdate.ts");
+
+describe("getAutoUpdateConfig", () => {
+  it("defaults to npm mode", () => {
+    const config = autoUpdate.getAutoUpdateConfig({ DATA_DIR: "/tmp/omniroute" });
+    assert.equal(config.mode, "npm");
+    assert.equal(config.repoDir, "/workspace/omniroute");
+    assert.equal(config.composeProfile, "cli");
+  });
+
+  it("reads docker-compose settings from env", () => {
+    const config = autoUpdate.getAutoUpdateConfig({
+      DATA_DIR: "/tmp/custom-data",
+      AUTO_UPDATE_MODE: "docker-compose",
+      AUTO_UPDATE_REPO_DIR: "/srv/omniroute",
+      AUTO_UPDATE_COMPOSE_FILE: "/srv/omniroute/docker-compose.yml",
+      AUTO_UPDATE_COMPOSE_PROFILE: "base",
+      AUTO_UPDATE_SERVICE: "omniroute-base",
+      AUTO_UPDATE_GIT_REMOTE: "upstream",
+      AUTO_UPDATE_PATCH_COMMITS: "abc123 def456,ghi789",
+      AUTO_UPDATE_LOG_PATH: "/tmp/update.log",
+    });
+
+    assert.equal(config.mode, "docker-compose");
+    assert.equal(config.repoDir, "/srv/omniroute");
+    assert.equal(config.composeFile, "/srv/omniroute/docker-compose.yml");
+    assert.equal(config.composeProfile, "base");
+    assert.equal(config.composeService, "omniroute-base");
+    assert.equal(config.gitRemote, "upstream");
+    assert.deepEqual(config.patchCommits, ["abc123", "def456", "ghi789"]);
+    assert.equal(config.logPath, "/tmp/update.log");
+  });
+});
+
+describe("validateAutoUpdateRuntime", () => {
+  it("reports missing docker socket for docker-compose mode", async () => {
+    const config = autoUpdate.getAutoUpdateConfig({
+      AUTO_UPDATE_MODE: "docker-compose",
+      AUTO_UPDATE_REPO_DIR: "/repo",
+      AUTO_UPDATE_COMPOSE_FILE: "/repo/docker-compose.yml",
+    });
+
+    const result = await autoUpdate.validateAutoUpdateRuntime(
+      config,
+      async () => ({ stdout: "git version 2.0.0", stderr: "" }),
+      async (targetPath) => targetPath !== "/var/run/docker.sock"
+    );
+
+    assert.equal(result.supported, false);
+    assert.match(result.reason, /Docker socket/);
+  });
+
+  it("detects docker-compose command availability", async () => {
+    const config = autoUpdate.getAutoUpdateConfig({
+      AUTO_UPDATE_MODE: "docker-compose",
+      AUTO_UPDATE_REPO_DIR: "/repo",
+      AUTO_UPDATE_COMPOSE_FILE: "/repo/docker-compose.yml",
+    });
+
+    const result = await autoUpdate.validateAutoUpdateRuntime(
+      config,
+      async (file, args) => {
+        if (file === "git") return { stdout: "git version 2.0.0", stderr: "" };
+        if (file === "docker" && args?.[0] === "compose") {
+          return { stdout: "Docker Compose version v2.0.0", stderr: "" };
+        }
+        throw new Error(`unexpected command: ${file}`);
+      },
+      async () => true
+    );
+
+    assert.equal(result.supported, true);
+    assert.equal(result.composeCommand, "docker compose");
+  });
+});
+
+describe("buildDockerComposeUpdateScript", () => {
+  it("includes git checkout and compose rebuild steps", () => {
+    const config = autoUpdate.getAutoUpdateConfig({
+      AUTO_UPDATE_MODE: "docker-compose",
+      AUTO_UPDATE_REPO_DIR: "/repo",
+      AUTO_UPDATE_COMPOSE_FILE: "/repo/docker-compose.yml",
+      AUTO_UPDATE_COMPOSE_PROFILE: "cli",
+      AUTO_UPDATE_SERVICE: "omniroute-cli",
+      AUTO_UPDATE_GIT_REMOTE: "origin",
+      AUTO_UPDATE_PATCH_COMMITS: "1501a87 e569e1c",
+    });
+
+    const script = autoUpdate.buildDockerComposeUpdateScript({
+      latest: "3.2.6",
+      config,
+      composeCommand: "docker compose",
+    });
+
+    assert.match(script, /git fetch --tags/);
+    assert.match(script, /git config --global --add safe\.directory/);
+    assert.match(script, /git checkout -B "autoupdate\/\$\{TARGET_TAG#v\}" "\$TARGET_TAG"/);
+    assert.match(script, /git cherry-pick --keep-redundant-commits '1501a87' 'e569e1c'/);
+    assert.match(script, /docker compose -f "\$COMPOSE_FILE" up -d --build "\$SERVICE"/);
+  });
+});
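Review note: the config test above expects `AUTO_UPDATE_PATCH_COMMITS: "abc123 def456,ghi789"` to become three entries, which suggests the value is split on both whitespace and commas. The implementation in `autoUpdate.ts` is not shown in this diff, so the following is only a sketch of parsing consistent with that expectation (`parsePatchCommits` is a hypothetical name):

```typescript
// Hypothetical helper; the real parsing in autoUpdate.ts is not shown here.
function parsePatchCommits(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(/[\s,]+/) // accept spaces, tabs, newlines, and commas as separators
    .filter((entry) => entry.length > 0); // drop empties from leading/trailing separators
}
```

Under this assumption, `parsePatchCommits("abc123 def456,ghi789")` yields `["abc123", "def456", "ghi789"]`, and an unset variable yields an empty list.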
@@ -0,0 +1,598 @@
+import { describe, test } from "node:test";
+import assert from "node:assert/strict";
+import {
+  isClaudeCodeClient,
+  providerSupportsCaching,
+  isDeterministicStrategy,
+  shouldPreserveCacheControl,
+  trackCacheMetrics,
+  updateCacheTokenMetrics,
+} from "../../open-sse/utils/cacheControlPolicy.ts";
+
+describe("Cache Control Policy", () => {
+  describe("isClaudeCodeClient", () => {
+    test("detects claude-code user agent", () => {
+      assert.equal(isClaudeCodeClient("claude-code/0.1.0"), true);
+      assert.equal(isClaudeCodeClient("claude_code/0.1.0"), true);
+      assert.equal(isClaudeCodeClient("Anthropic CLI/1.0"), true);
+    });
+
+    test("rejects non-Claude clients", () => {
+      assert.equal(isClaudeCodeClient("curl/7.68.0"), false);
+      assert.equal(isClaudeCodeClient("OpenAI/1.0"), false);
+      assert.equal(isClaudeCodeClient(null), false);
+      assert.equal(isClaudeCodeClient(undefined), false);
+      assert.equal(isClaudeCodeClient(""), false);
+    });
+
+    test("is case-insensitive", () => {
+      assert.equal(isClaudeCodeClient("Claude-Code/0.1.0"), true);
+      assert.equal(isClaudeCodeClient("CLAUDE-CODE/0.1.0"), true);
+    });
+  });
+
+  describe("providerSupportsCaching", () => {
+    test("detects caching providers", () => {
+      assert.equal(providerSupportsCaching("claude"), true);
+      assert.equal(providerSupportsCaching("anthropic"), true);
+      assert.equal(providerSupportsCaching("zai"), true);
+      assert.equal(providerSupportsCaching("qwen"), true);
+    });
+
+    test("rejects non-caching providers", () => {
+      assert.equal(providerSupportsCaching("openai"), false);
+      assert.equal(providerSupportsCaching("gemini"), false);
+      assert.equal(providerSupportsCaching("unknown"), false);
+      assert.equal(providerSupportsCaching(null), false);
+      assert.equal(providerSupportsCaching(undefined), false);
+    });
+
+    test("is case-insensitive", () => {
+      assert.equal(providerSupportsCaching("Claude"), true);
+      assert.equal(providerSupportsCaching("ANTHROPIC"), true);
+    });
+  });
+
+  describe("isDeterministicStrategy", () => {
+    test("identifies deterministic strategies", () => {
+      assert.equal(isDeterministicStrategy("priority"), true);
+      assert.equal(isDeterministicStrategy("cost-optimized"), true);
+    });
+
+    test("identifies non-deterministic strategies", () => {
+      assert.equal(isDeterministicStrategy("weighted"), false);
+      assert.equal(isDeterministicStrategy("round-robin"), false);
+      assert.equal(isDeterministicStrategy("random"), false);
+      assert.equal(isDeterministicStrategy("fill-first"), false);
+      assert.equal(isDeterministicStrategy("p2c"), false);
+      assert.equal(isDeterministicStrategy("least-used"), false);
+      assert.equal(isDeterministicStrategy("strict-random"), false);
+    });
+
+    test("handles null/undefined", () => {
+      assert.equal(isDeterministicStrategy(null), false);
+      assert.equal(isDeterministicStrategy(undefined), false);
+    });
+  });
+
+  describe("shouldPreserveCacheControl", () => {
+    test("preserves for single model + Claude client + caching provider", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: false,
+          targetProvider: "claude",
+        }),
+        true
+      );
+    });
+
+    test("preserves for combo with priority strategy + Claude client + caching provider", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "priority",
+          targetProvider: "claude",
+        }),
+        true
+      );
+    });
+
+    test("preserves for combo with cost-optimized strategy + Claude client + caching provider", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "cost-optimized",
+          targetProvider: "anthropic",
+        }),
+        true
+      );
+    });
+
+    test("rejects non-Claude clients", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "curl/7.68.0",
+          isCombo: false,
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects non-caching providers", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: false,
+          targetProvider: "openai",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with non-deterministic strategy (weighted)", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "weighted",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with non-deterministic strategy (round-robin)", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "round-robin",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with non-deterministic strategy (random)", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "random",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with fill-first strategy", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "fill-first",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with p2c strategy", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "p2c",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with least-used strategy", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "least-used",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with strict-random strategy", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: "strict-random",
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects combo with null strategy", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: true,
+          comboStrategy: null,
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects when userAgent is null", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: null,
+          isCombo: false,
+          targetProvider: "claude",
+        }),
+        false
+      );
+    });
+
+    test("rejects when targetProvider is null", () => {
+      assert.equal(
+        shouldPreserveCacheControl({
+          userAgent: "claude-code/0.1.0",
+          isCombo: false,
+          targetProvider: null,
+        }),
+        false
+      );
+    });
+
+    describe("settings override", () => {
+      test("alwaysPreserveClientCache=always overrides auto detection", () => {
+        assert.equal(
+          shouldPreserveCacheControl({
+            userAgent: "curl/7.68.0", // non-Claude client
+            isCombo: false,
+            targetProvider: "claude",
+            settings: { alwaysPreserveClientCache: "always" },
+          }),
+          true
+        );
+      });
+
+      test("alwaysPreserveClientCache=never overrides auto detection", () => {
+        assert.equal(
+          shouldPreserveCacheControl({
+            userAgent: "claude-code/0.1.0", // Claude client
+            isCombo: false,
+            targetProvider: "claude",
+            settings: { alwaysPreserveClientCache: "never" },
+          }),
+          false
+        );
+      });
+
+      test("alwaysPreserveClientCache=auto uses automatic detection", () => {
+        // Should preserve for Claude client + caching provider
+        assert.equal(
+          shouldPreserveCacheControl({
+            userAgent: "claude-code/0.1.0",
+            isCombo: false,
+            targetProvider: "claude",
+            settings: { alwaysPreserveClientCache: "auto" },
+          }),
+          true
+        );
+
+        // Should NOT preserve for non-Claude client
+        assert.equal(
+          shouldPreserveCacheControl({
+            userAgent: "curl/7.68.0",
+            isCombo: false,
+            targetProvider: "claude",
+            settings: { alwaysPreserveClientCache: "auto" },
+          }),
+          false
+        );
+      });
+
+      test("undefined settings uses automatic detection", () => {
+        assert.equal(
+          shouldPreserveCacheControl({
+            userAgent: "claude-code/0.1.0",
+            isCombo: false,
+            targetProvider: "claude",
+            settings: undefined,
+          }),
+          true
+        );
+      });
+    });
+  });
+
+  describe("trackCacheMetrics", () => {
+    test("initializes empty metrics", () => {
+      const result = trackCacheMetrics({
+        preserved: true,
+        provider: "claude",
+        strategy: "priority",
+        metrics: undefined,
+        inputTokens: 1000,
+        cachedTokens: 500,
+        cacheCreationTokens: 200,
+      });
+
+      assert.equal(result.totalRequests, 1);
+      assert.equal(result.requestsWithCacheControl, 1);
+      assert.equal(result.totalInputTokens, 1000);
+      assert.equal(result.totalCachedTokens, 500);
+      assert.equal(result.totalCacheCreationTokens, 200);
+      assert.equal(result.tokensSaved, 500);
+    });
+
+    test("increments total requests without cache control", () => {
+      const metrics = {
+        totalRequests: 10,
+        requestsWithCacheControl: 5,
+        totalInputTokens: 5000,
+        totalCachedTokens: 2000,
+        totalCacheCreationTokens: 1000,
+        tokensSaved: 2000,
+        estimatedCostSaved: 0.5,
+        byProvider: {},
+        byStrategy: {},
+        lastUpdated: new Date().toISOString(),
+      };
+
+      const result = trackCacheMetrics({
+        preserved: false,
+        provider: "claude",
+        strategy: null,
+        metrics,
+        inputTokens: 500,
+        cachedTokens: 0,
+        cacheCreationTokens: 0,
+      });
+
+      assert.equal(result.totalRequests, 11);
+      assert.equal(result.requestsWithCacheControl, 5); // unchanged
+      assert.equal(result.totalInputTokens, 5500);
+    });
+
+    test("tracks requests with cache control preserved", () => {
+      const metrics = {
+        totalRequests: 0,
+        requestsWithCacheControl: 0,
+        totalInputTokens: 0,
+        totalCachedTokens: 0,
+        totalCacheCreationTokens: 0,
+        tokensSaved: 0,
+        estimatedCostSaved: 0,
+        byProvider: {},
+        byStrategy: {},
+        lastUpdated: new Date().toISOString(),
+      };
+
+      const result = trackCacheMetrics({
+        preserved: true,
+        provider: "claude",
+        strategy: "priority",
+        metrics,
+        inputTokens: 1000,
+        cachedTokens: 400,
+        cacheCreationTokens: 100,
+      });
+
+      assert.equal(result.totalRequests, 1);
+      assert.equal(result.requestsWithCacheControl, 1);
+      assert.equal(result.byProvider.claude.requests, 1);
+      assert.equal(result.byProvider.claude.inputTokens, 1000);
+      assert.equal(result.byProvider.claude.cachedTokens, 400);
+      assert.equal(result.byProvider.claude.cacheCreationTokens, 100);
+      assert.equal(result.byStrategy.priority.requests, 1);
+    });
+
+    test("tracks by provider", () => {
+      const metrics = {
+        totalRequests: 0,
+        requestsWithCacheControl: 0,
+        totalInputTokens: 0,
+        totalCachedTokens: 0,
+        totalCacheCreationTokens: 0,
+        tokensSaved: 0,
+        estimatedCostSaved: 0,
+        byProvider: {},
+        byStrategy: {},
+        lastUpdated: new Date().toISOString(),
+      };
+
+      let result = trackCacheMetrics({
+        preserved: true,
+        provider: "claude",
+        strategy: null,
+        metrics,
+        inputTokens: 1000,
+        cachedTokens: 300,
+        cacheCreationTokens: 100,
+      });
+
+      result = trackCacheMetrics({
+        preserved: true,
+        provider: "zai",
+        strategy: null,
+        metrics: result,
+        inputTokens: 800,
+        cachedTokens: 200,
+        cacheCreationTokens: 50,
+      });
+
+      assert.equal(result.byProvider.claude.requests, 1);
+      assert.equal(result.byProvider.claude.inputTokens, 1000);
+      assert.equal(result.byProvider.claude.cachedTokens, 300);
+      assert.equal(result.byProvider.zai.requests, 1);
+      assert.equal(result.byProvider.zai.inputTokens, 800);
+      assert.equal(result.byProvider.zai.cachedTokens, 200);
+    });
+
+    test("tracks by strategy", () => {
+      const metrics = {
+        totalRequests: 0,
+        requestsWithCacheControl: 0,
+        totalInputTokens: 0,
+        totalCachedTokens: 0,
+        totalCacheCreationTokens: 0,
+        tokensSaved: 0,
+        estimatedCostSaved: 0,
+        byProvider: {},
+        byStrategy: {},
+        lastUpdated: new Date().toISOString(),
+      };
+
+      let result = trackCacheMetrics({
+        preserved: true,
+        provider: "claude",
+        strategy: "priority",
+        metrics,
+        inputTokens: 1000,
+        cachedTokens: 300,
+        cacheCreationTokens: 100,
+      });
+
+      result = trackCacheMetrics({
+        preserved: true,
+        provider: "claude",
+        strategy: "cost-optimized",
+        metrics: result,
+        inputTokens: 800,
+        cachedTokens: 200,
+        cacheCreationTokens: 50,
+      });
+
+      assert.equal(result.byStrategy.priority.requests, 1);
+      assert.equal(result.byStrategy.priority.cachedTokens, 300);
+      assert.equal(result.byStrategy["cost-optimized"].requests, 1);
+      assert.equal(result.byStrategy["cost-optimized"].cachedTokens, 200);
+    });
+  });
+
+  describe("updateCacheTokenMetrics", () => {
+    test("updates token counts", () => {
+      const metrics = {
+        totalRequests: 10,
+        requestsWithCacheControl: 5,
+        totalInputTokens: 5000,
+        totalCachedTokens: 2000,
+        totalCacheCreationTokens: 1000,
+        tokensSaved: 2000,
+        estimatedCostSaved: 0.5,
+        byProvider: {
+          claude: {
+            requests: 3,
+            inputTokens: 3000,
+            cachedTokens: 1200,
+            cacheCreationTokens: 600,
+          },
+        },
+        byStrategy: {
+          priority: {
+            requests: 4,
+            inputTokens: 4000,
+            cachedTokens: 1600,
+            cacheCreationTokens: 800,
+          },
+        },
+        lastUpdated: new Date().toISOString(),
+      };
+
+      const result = updateCacheTokenMetrics({
+        metrics,
+        provider: "claude",
+        strategy: "priority",
+        inputTokens: 1000,
+        cachedTokens: 400,
+        cacheCreationTokens: 200,
+        costSaved: 0.02,
+      });
+
+      assert.equal(result.totalInputTokens, 6000);
+      assert.equal(result.totalCachedTokens, 2400);
+      assert.equal(result.totalCacheCreationTokens, 1200);
+      assert.equal(result.tokensSaved, 2400);
+      assert.equal(result.estimatedCostSaved, 0.52);
+    });
+
+    test("updates provider breakdown", () => {
+      const metrics = {
+        totalRequests: 10,
+        requestsWithCacheControl: 5,
+        totalInputTokens: 5000,
+        totalCachedTokens: 2000,
+        totalCacheCreationTokens: 1000,
+        tokensSaved: 2000,
+        estimatedCostSaved: 0.5,
+        byProvider: {
+          claude: {
+            requests: 3,
+            inputTokens: 3000,
+            cachedTokens: 1200,
+            cacheCreationTokens: 600,
+          },
+        },
+        byStrategy: {},
+        lastUpdated: new Date().toISOString(),
+      };
+
+      const result = updateCacheTokenMetrics({
+        metrics,
+        provider: "claude",
+        strategy: null,
+        inputTokens: 500,
+        cachedTokens: 200,
+        cacheCreationTokens: 100,
+      });
+
+      assert.equal(result.byProvider.claude.inputTokens, 3500);
+      assert.equal(result.byProvider.claude.cachedTokens, 1400);
+      assert.equal(result.byProvider.claude.cacheCreationTokens, 700);
+    });
+
+    test("updates strategy breakdown", () => {
+      const metrics = {
+        totalRequests: 10,
+        requestsWithCacheControl: 5,
+        totalInputTokens: 5000,
+        totalCachedTokens: 2000,
+        totalCacheCreationTokens: 1000,
+        tokensSaved: 2000,
+        estimatedCostSaved: 0.5,
+        byProvider: {},
+        byStrategy: {
+          priority: {
+            requests: 4,
+            inputTokens: 4000,
+            cachedTokens: 1600,
+            cacheCreationTokens: 800,
+          },
+        },
+        lastUpdated: new Date().toISOString(),
+      };
+
+      const result = updateCacheTokenMetrics({
+        metrics,
+        provider: "claude",
+        strategy: "priority",
+        inputTokens: 500,
+        cachedTokens: 200,
+        cacheCreationTokens: 100,
+      });
+
+      assert.equal(result.byStrategy.priority.inputTokens, 4500);
+      assert.equal(result.byStrategy.priority.cachedTokens, 1800);
+      assert.equal(result.byStrategy.priority.cacheCreationTokens, 900);
+    });
+  });
+});
@@ -0,0 +1,134 @@
+import { describe, test, before, after } from "node:test";
+import assert from "node:assert/strict";
+import { getCacheMetrics } from "../../src/lib/db/settings.ts";
+import { getDbInstance } from "../../src/lib/db/core.ts";
+
+describe("Cache Metrics Database", () => {
+  let db;
+
+  before(() => {
+    db = getDbInstance();
+    // Create usage_history table if it doesn't exist (mimicking production schema)
+    db.prepare(
+      `
+      CREATE TABLE IF NOT EXISTS usage_history (
+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        provider TEXT,
+        model TEXT,
+        connection_id TEXT,
+        api_key_id TEXT,
+        api_key_name TEXT,
+        tokens_input INTEGER DEFAULT 0,
+        tokens_output INTEGER DEFAULT 0,
+        tokens_cache_read INTEGER DEFAULT 0,
+        tokens_cache_creation INTEGER DEFAULT 0,
+        tokens_reasoning INTEGER DEFAULT 0,
+        status TEXT,
+        timestamp TEXT,
+        success INTEGER,
+        latency_ms INTEGER DEFAULT 0,
+        ttft_ms INTEGER DEFAULT 0,
+        error_code TEXT
+      )
+    `
+    ).run();
+  });
+
+  after(async () => {
+    // Clean up test data
+    db.prepare("DELETE FROM usage_history WHERE provider = 'test-provider'").run();
+  });
+
+  describe("getCacheMetrics", () => {
+    test("returns metrics even with no cache activity", async () => {
+      // Verify the function works even if usage_history has data but no cache activity
+      const metrics = await getCacheMetrics();
+
+      assert.ok(metrics.totalRequests >= 0);
+      assert.ok(metrics.totalInputTokens >= 0);
+      assert.ok(metrics.totalCachedTokens >= 0);
+      assert.ok(metrics.totalCacheCreationTokens >= 0);
+      assert.ok(metrics.tokensSaved >= 0);
+      assert.ok(metrics.lastUpdated);
+    });
+
+    test("returns aggregated metrics from usage_history", async () => {
+      // Clean up any existing test data first
+      db.prepare("DELETE FROM usage_history WHERE provider = 'test-provider'").run();
+
+      const now = new Date().toISOString();
+
+      db.prepare(
+        `
+        INSERT INTO usage_history (provider, model, connection_id, api_key_id, api_key_name,
+          tokens_input, tokens_output, tokens_cache_read, tokens_cache_creation, tokens_reasoning,
+          status, success, latency_ms, ttft_ms, error_code, timestamp)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+      `
+      ).run(
+        "test-provider",
+        "test-model",
+        "test-connection",
+        "test-key-id",
+        "test-key",
+        1000, // tokens_input
+        500, // tokens_output
+        400, // tokens_cache_read
+        200, // tokens_cache_creation
+        0, // tokens_reasoning
+        "200", // status
+        1, // success
+        100, // latency_ms
+        50, // ttft_ms
+        null, // error_code
+        now // timestamp
+      );
+
+      // Insert another row
+      db.prepare(
+        `
+        INSERT INTO usage_history (provider, model, connection_id, api_key_id, api_key_name,
+          tokens_input, tokens_output, tokens_cache_read, tokens_cache_creation, tokens_reasoning,
+          status, success, latency_ms, ttft_ms, error_code, timestamp)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+      `
+      ).run(
+        "test-provider",
+        "test-model",
+        "test-connection",
+        "test-key-id",
+        "test-key",
+        500, // tokens_input
+        300, // tokens_output
+        200, // tokens_cache_read
+        100, // tokens_cache_creation
+        0, // tokens_reasoning
+        "200", // status
+        1, // success
+        80, // latency_ms
+        40, // ttft_ms
+        null, // error_code
+        now // timestamp
+      );
+
+      const metrics = await getCacheMetrics();
+
+      // Should have at least the 2 test requests with cache activity
+      assert.ok(metrics.requestsWithCacheControl >= 2);
+      assert.ok(metrics.totalInputTokens >= 1500);
+      assert.ok(metrics.totalCachedTokens >= 600);
+      assert.ok(metrics.totalCacheCreationTokens >= 300);
+      assert.ok(metrics.tokensSaved >= 600);
+
+      // Check provider breakdown
+      assert.ok(metrics.byProvider["test-provider"]);
+      assert.ok(metrics.byProvider["test-provider"].requests >= 2);
+      assert.ok(metrics.byProvider["test-provider"].inputTokens >= 1500);
+      assert.ok(metrics.byProvider["test-provider"].cachedTokens >= 600);
+      assert.ok(metrics.byProvider["test-provider"].cacheCreationTokens >= 300);
+
+      // Clean up
+      db.prepare("DELETE FROM usage_history WHERE provider = 'test-provider'").run();
+    });
+  });
+});
@@ -0,0 +1,128 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+
+const TEST_DATA_DIR = fs.mkdtempSync(path.join(os.tmpdir(), "omniroute-chat-combo-live-"));
+process.env.DATA_DIR = TEST_DATA_DIR;
+
+const core = await import("../../src/lib/db/core.ts");
+const providersDb = await import("../../src/lib/db/providers.ts");
+const chatRoute = await import("../../src/app/api/v1/chat/completions/route.ts");
+const {
+  clearModelUnavailability,
+  resetAllAvailability,
+  setModelUnavailable,
+} = await import("../../src/domain/modelAvailability.ts");
+const {
+  getCircuitBreaker,
+  resetAllCircuitBreakers,
+  STATE,
+} = await import("../../src/shared/utils/circuitBreaker.ts");
+
+const originalFetch = globalThis.fetch;
+
+async function resetStorage() {
+  core.resetDbInstance();
+  fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+  fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
+  resetAllAvailability();
+  resetAllCircuitBreakers();
+}
+
+async function seedSuppressedConnection() {
+  return providersDb.createProviderConnection({
+    provider: "openai",
+    authType: "apikey",
+    name: "openai-live-test",
+    apiKey: "sk-live-test",
+    isActive: true,
+    testStatus: "credits_exhausted",
+    rateLimitedUntil: new Date(Date.now() + 60_000).toISOString(),
+  });
+}
+
+function makeRequest(extraHeaders = {}) {
+  return new Request("http://localhost/v1/chat/completions", {
+    method: "POST",
+    headers: {
+      "Content-Type": "application/json",
+      ...extraHeaders,
+    },
+    body: JSON.stringify({
+      model: "openai/gpt-4o-mini",
+      messages: [{ role: "user", content: "Reply with OK only." }],
+      max_tokens: 16,
+      stream: false,
+    }),
+  });
+}
+
+test.beforeEach(async () => {
+  globalThis.fetch = originalFetch;
+  await resetStorage();
+});
+
+test.afterEach(() => {
+  globalThis.fetch = originalFetch;
+  resetAllAvailability();
+  resetAllCircuitBreakers();
+});
+
+test.after(() => {
+  globalThis.fetch = originalFetch;
+  resetAllAvailability();
+  resetAllCircuitBreakers();
+  core.resetDbInstance();
+  fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+});
+
+test("combo live test bypasses local cooldown and breaker state to perform a real upstream request", async () => {
+  const created = await seedSuppressedConnection();
+
+  setModelUnavailable("openai", "gpt-4o-mini", 60_000, "test cooldown");
+  const breaker = getCircuitBreaker("openai");
+  breaker.state = STATE.OPEN;
+  breaker.lastFailureTime = Date.now();
+
+  const fetchCalls = [];
+  globalThis.fetch = async (url, init = {}) => {
+    fetchCalls.push({ url: String(url), init });
+    return Response.json({
+      id: "chatcmpl-live-test",
+      choices: [
+        {
+          message: {
+            role: "assistant",
+            content: "OK",
+          },
+        },
+      ],
+    });
+  };
+
+  const blockedByCooldown = await chatRoute.POST(makeRequest());
+  assert.equal(blockedByCooldown.status, 503);
+  assert.equal(fetchCalls.length, 0);
+
+  clearModelUnavailability("openai", "gpt-4o-mini");
+
+  const blockedByBreaker = await chatRoute.POST(makeRequest());
+  assert.equal(blockedByBreaker.status, 503);
+  assert.equal(fetchCalls.length, 0);
+
+  const liveResponse = await chatRoute.POST(
+    makeRequest({ "X-Internal-Test": "combo-health-check" })
+  );
+  const liveBody = await liveResponse.json();
+
+  assert.equal(liveResponse.status, 200);
+  assert.equal(fetchCalls.length, 1);
+  assert.match(fetchCalls[0].url, /\/chat\/completions$/);
+  assert.equal(fetchCalls[0].init.headers.Authorization, "Bearer sk-live-test");
+  assert.equal(liveBody.choices[0].message.content, "OK");
+
+  const updated = await providersDb.getProviderConnectionById(created.id);
+  assert.equal(updated.testStatus, "active");
+});
@@ -26,7 +26,7 @@ function mockLog() {
 function mockHandler(statusSequence) {
  let callIndex = 0;
  return async (body, modelStr) => {
-    const status = statusSequence[callIndex] ?? 200;
+    const status = statusSequence[callIndex] ?? statusSequence[statusSequence.length - 1] ?? 200;
    callIndex++;
    if (status === 200) {
      return new Response(JSON.stringify({ ok: true }), { status: 200 });
@@ -55,6 +55,7 @@ test("handleComboChat: circuit breaker opens after repeated 502 errors", async (
    name: "test-combo",
    models: [{ model: "groq/llama-3.3-70b", weight: 0 }],
    strategy: "priority",
+    config: { maxRetries: 0 },
  };

  const log = mockLog();
@@ -74,6 +75,7 @@ test("handleComboChat: circuit breaker opens after repeated 502 errors", async (

  // Breaker should now be OPEN
  const status = breaker.getStatus();
+  console.log("=== BREAKER STATUS AFTER 3 CALLS ===", status);
  assert.equal(status.state, STATE.OPEN, "Breaker should be OPEN after 3 failures");
  assert.equal(status.failureCount, 3, "Failure count should be 3");
 });
@@ -239,7 +239,7 @@ test("CodexExecutor preserves native responses payloads for Codex passthrough",
  assert.equal(transformed.stream, true);
  assert.equal(transformed.service_tier, "priority");
  assert.equal(transformed.instructions, "custom system prompt");
-  assert.equal(transformed.store, true);
+  assert.equal(transformed.store, false);
  assert.deepEqual(transformed.metadata, { source: "codex-client" });
  assert.equal(transformed.reasoning_effort, "high");
  assert.ok(!("_nativeCodexPassthrough" in transformed));
@@ -503,3 +503,29 @@ test("parseSSEToOpenAIResponse merges split tool call chunks by id without dupli
  assert.equal(parsed.choices[0].message.tool_calls[0].function.name, "sum");
  assert.equal(parsed.choices[0].message.tool_calls[0].function.arguments, '{"a":1}');
 });
+
+test("parseSSEToOpenAIResponse normalizes delta.reasoning alias to reasoning_content", () => {
+  const rawSSE = [
+    `data: ${JSON.stringify({
+      id: "chatcmpl_2",
+      object: "chat.completion.chunk",
+      choices: [{ index: 0, delta: { reasoning: "Let me think..." } }],
+    })}`,
+    `data: ${JSON.stringify({
+      id: "chatcmpl_2",
+      object: "chat.completion.chunk",
+      choices: [{ index: 0, delta: { reasoning: " The answer is 4." } }],
+    })}`,
+    `data: ${JSON.stringify({
+      id: "chatcmpl_2",
+      object: "chat.completion.chunk",
+      choices: [{ index: 0, delta: { content: "2+2=4" }, finish_reason: "stop" }],
+    })}`,
+    "data: [DONE]",
+  ].join("\n");
+
+  const parsed = parseSSEToOpenAIResponse(rawSSE, "moonshotai/kimi-k2.5");
+  assert.ok(parsed);
+  assert.equal(parsed.choices[0].message.reasoning_content, "Let me think... The answer is 4.");
+  assert.equal(parsed.choices[0].message.content, "2+2=4");
+});
@@ -155,3 +155,51 @@ test("builds compact Claude stream summary for detailed logs", () => {
  assert.equal(compact.usage.output_tokens, 7);
  assert.equal(compact._omniroute_stream.eventCount, 4);
 });
+
+test("builds compact OpenAI summary with reasoning alias (delta.reasoning)", () => {
+  const collector = createStructuredSSECollector({ stage: "provider_response" });
+
+  collector.push({
+    id: "chatcmpl_r1",
+    object: "chat.completion.chunk",
+    created: 100,
+    model: "moonshotai/kimi-k2.5",
+    choices: [{ index: 0, delta: { role: "assistant" } }],
+  });
+  collector.push({
+    id: "chatcmpl_r1",
+    object: "chat.completion.chunk",
+    created: 100,
+    model: "moonshotai/kimi-k2.5",
+    choices: [{ index: 0, delta: { reasoning: "Let me think..." } }],
+  });
+  collector.push({
+    id: "chatcmpl_r1",
+    object: "chat.completion.chunk",
+    created: 100,
+    model: "moonshotai/kimi-k2.5",
+    choices: [{ index: 0, delta: { content: "The answer is 4." } }],
+  });
+  collector.push({
+    id: "chatcmpl_r1",
+    object: "chat.completion.chunk",
+    created: 100,
+    model: "moonshotai/kimi-k2.5",
+    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
+    usage: { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30 },
+  });
+
+  const summary = buildStreamSummaryFromEvents(
+    collector.getEvents(),
+    FORMATS.OPENAI,
+    "moonshotai/kimi-k2.5"
+  );
+  const compact = compactStructuredStreamPayload(
+    collector.build(summary, { includeEvents: false })
+  );
+
+  assert.equal(compact.object, "chat.completion");
+  assert.equal(compact.choices[0].message.content, "The answer is 4.");
+  assert.equal(compact.choices[0].message.reasoning_content, "Let me think...");
+  assert.equal(compact.choices[0].finish_reason, "stop");
+});
@@ -0,0 +1,422 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+
+const { convertResponsesApiFormat } = await import(
+  "../../open-sse/translator/helpers/responsesApiHelper.ts"
+);
+const { openaiResponsesToOpenAIRequest, openaiToOpenAIResponsesRequest } = await import(
+  "../../open-sse/translator/request/openai-responses.ts"
+);
+
+test("convertResponsesApiFormat filters orphaned function_call_output items", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      {
+        type: "function_call_output",
+        call_id: "orphaned_call",
+        output: "result",
+      },
+    ],
+  };
+  const result = convertResponsesApiFormat(body);
+  const toolMsgs = result.messages.filter((m) => m.role === "tool");
+  assert.equal(toolMsgs.length, 0);
+});
+
+test("convertResponsesApiFormat skips function_call items with empty names", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      { type: "function_call", call_id: "c1", name: "", arguments: "{}" },
+      { type: "function_call", call_id: "c2", name: "  ", arguments: "{}" },
+    ],
+  };
+  const result = convertResponsesApiFormat(body);
+  const assistantMsgs = result.messages.filter((m) => m.role === "assistant");
+  assert.equal(assistantMsgs.length, 0);
+});
+
+test("Responses→Chat: input_image converted to image_url with detail", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      {
+        type: "message",
+        role: "user",
+        content: [
+          { type: "input_text", text: "What is this?" },
+          { type: "input_image", image_url: "https://example.com/img.png", detail: "high" },
+        ],
+      },
+    ],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  const userMsg = result.messages.find((m) => m.role === "user");
+  const imgPart = userMsg.content.find((c) => c.type === "image_url");
+  assert.ok(imgPart, "should have image_url content part");
+  assert.equal(imgPart.image_url.url, "https://example.com/img.png");
+  assert.equal(imgPart.image_url.detail, "high");
+});
+
+test("Responses→Chat: input_image without detail omits detail field", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      {
+        type: "message",
+        role: "user",
+        content: [{ type: "input_image", image_url: "https://example.com/img.png" }],
+      },
+    ],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  const userMsg = result.messages.find((m) => m.role === "user");
+  const imgPart = userMsg.content.find((c) => c.type === "image_url");
+  assert.ok(imgPart);
+  assert.equal(imgPart.image_url.url, "https://example.com/img.png");
+  assert.equal(imgPart.image_url.detail, undefined);
+});
+
+test("Chat→Responses: image_url detail preserved as input_image", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "text", text: "Describe" },
+          { type: "image_url", image_url: { url: "https://example.com/img.png", detail: "low" } },
+        ],
+      },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const userItem = result.input.find((i) => i.type === "message" && i.role === "user");
+  const imgPart = userItem.content.find((c) => c.type === "input_image");
+  assert.ok(imgPart, "should have input_image content part");
+  assert.equal(imgPart.image_url, "https://example.com/img.png");
+  assert.equal(imgPart.detail, "low");
+});
+
+test("Chat→Responses: image_url without detail omits detail", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "image_url", image_url: { url: "https://example.com/img.png" } },
+        ],
+      },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const userItem = result.input.find((i) => i.type === "message" && i.role === "user");
+  const imgPart = userItem.content.find((c) => c.type === "input_image");
+  assert.ok(imgPart);
+  assert.equal(imgPart.detail, undefined);
+});
+
+test("Responses→Chat: input_file converted to file content part", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      {
+        type: "message",
+        role: "user",
+        content: [
+          { type: "input_file", file_id: "file-abc", filename: "data.csv" },
+        ],
+      },
+    ],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  const userMsg = result.messages.find((m) => m.role === "user");
+  const filePart = userMsg.content.find((c) => c.type === "file");
+  assert.ok(filePart, "should have file content part");
+  assert.equal(filePart.file.file_id, "file-abc");
+  assert.equal(filePart.file.filename, "data.csv");
+});
+
+test("Chat→Responses: file content part converted to input_file", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "file", file: { file_id: "file-abc", filename: "data.csv" } },
+        ],
+      },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const userItem = result.input.find((i) => i.type === "message" && i.role === "user");
+  const filePart = userItem.content.find((c) => c.type === "input_file");
+  assert.ok(filePart, "should have input_file content part");
+  assert.equal(filePart.file_id, "file-abc");
+  assert.equal(filePart.filename, "data.csv");
+});
+
+test("Responses→Chat: tool_choice {type:'function', name} wrapped to {type:'function', function:{name}}", () => {
+  const body = {
+    model: "gpt-4",
+    input: "hello",
+    tool_choice: { type: "function", name: "get_weather" },
+    tools: [{ type: "function", name: "get_weather", parameters: {} }],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  assert.deepEqual(result.tool_choice, {
+    type: "function",
+    function: { name: "get_weather" },
+  });
+});
+
+test("Chat→Responses: tool_choice {type:'function', function:{name}} unwrapped to {type:'function', name}", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [{ role: "user", content: "hello" }],
+    tool_choice: { type: "function", function: { name: "get_weather" } },
+    tools: [{ type: "function", function: { name: "get_weather", parameters: {} } }],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  assert.deepEqual(result.tool_choice, {
+    type: "function",
+    name: "get_weather",
+  });
+});
+
+test("Responses→Chat: string tool_choice passes through unchanged", () => {
+  const body = { model: "gpt-4", input: "hello", tool_choice: "auto" };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  assert.equal(result.tool_choice, "auto");
+});
+
+test("Chat→Responses: string tool_choice passes through unchanged", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [{ role: "user", content: "hello" }],
+    tool_choice: "required",
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  assert.equal(result.tool_choice, "required");
+});
+
+test("Responses→Chat: built-in tool_choice type throws unsupported error", () => {
+  const body = {
+    model: "gpt-4",
+    input: "hello",
+    tool_choice: { type: "web_search_preview" },
+  };
+  assert.throws(
+    () => openaiResponsesToOpenAIRequest(null, body, null, null),
+    (err) => err.message.includes("web_search_preview")
+  );
+});
+
+test("Responses→Chat: web_search tool type throws unsupported error", () => {
+  const body = {
+    model: "gpt-4",
+    input: "search for cats",
+    tools: [{ type: "web_search", search_context_size: "medium" }],
+  };
+  assert.throws(
+    () => openaiResponsesToOpenAIRequest(null, body, null, null),
+    (err) => err.message.includes("web_search")
+  );
+});
+
+test("Responses→Chat: computer tool type throws unsupported error", () => {
+  const body = {
+    model: "gpt-4",
+    input: "click button",
+    tools: [{ type: "computer" }],
+  };
+  assert.throws(
+    () => openaiResponsesToOpenAIRequest(null, body, null, null),
+    (err) => err.message.includes("computer")
+  );
+});
+
+test("Responses→Chat: mcp tool type throws unsupported error", () => {
+  const body = {
+    model: "gpt-4",
+    input: "hello",
+    tools: [{ type: "mcp", server_label: "test", server_url: "https://example.com" }],
+  };
+  assert.throws(
+    () => openaiResponsesToOpenAIRequest(null, body, null, null),
+    (err) => err.message.includes("mcp")
+  );
+});
+
+test("Responses→Chat: non-string arguments are JSON-stringified", () => {
+  const body = {
+    model: "gpt-4",
+    input: [
+      { type: "function_call", call_id: "c1", name: "fn", arguments: { key: "val" } },
+      { type: "function_call_output", call_id: "c1", output: "ok" },
+    ],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  const assistantMsg = result.messages.find((m) => m.role === "assistant");
+  assert.equal(typeof assistantMsg.tool_calls[0].function.arguments, "string");
+  assert.equal(assistantMsg.tool_calls[0].function.arguments, '{"key":"val"}');
+});
+
+test("Chat→Responses: array tool content converts text→input_text types", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      { role: "user", content: "hello" },
+      {
+        role: "assistant",
+        content: null,
+        tool_calls: [{ id: "c1", type: "function", function: { name: "fn", arguments: "{}" } }],
+      },
+      {
+        role: "tool",
+        tool_call_id: "c1",
+        content: [{ type: "text", text: "result data" }],
+      },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const outputItem = result.input.find((i) => i.type === "function_call_output");
+  assert.ok(Array.isArray(outputItem.output), "output should be array");
+  assert.equal(outputItem.output[0].type, "input_text");
+  assert.equal(outputItem.output[0].text, "result data");
+});
+
+test("Responses→Chat: function tool type passes through", () => {
+  const body = {
+    model: "gpt-4",
+    input: "hello",
+    tools: [{ type: "function", name: "greet", parameters: {} }],
+  };
+  const result = openaiResponsesToOpenAIRequest(null, body, null, null);
+  assert.equal(result.tools.length, 1);
+  assert.equal(result.tools[0].type, "function");
+});
+
+test("Chat→Responses: deprecated function_call field on assistant converted to function_call item", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      { role: "user", content: "weather?" },
+      {
+        role: "assistant",
+        content: null,
+        function_call: { name: "get_weather", arguments: '{"city":"NYC"}' },
+      },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const fcItem = result.input.find((i) => i.type === "function_call");
+  assert.ok(fcItem, "should have function_call input item");
+  assert.equal(fcItem.name, "get_weather");
+  assert.equal(fcItem.arguments, '{"city":"NYC"}');
+  assert.ok(fcItem.call_id, "should have a call_id");
+});
+
+test("Chat→Responses: deprecated function role message converted to function_call_output", () => {
+  const body = {
+    model: "gpt-4",
+    messages: [
+      { role: "user", content: "weather?" },
+      {
+        role: "assistant",
+        content: null,
+        function_call: { name: "get_weather", arguments: '{"city":"NYC"}' },
+      },
+      { role: "function", name: "get_weather", content: '{"temp":72}' },
+    ],
+  };
+  const result = openaiToOpenAIResponsesRequest("gpt-4", body, true, null);
+  const fcOutput = result.input.find((i) => i.type === "function_call_output");
+  assert.ok(fcOutput, "should have function_call_output item");
+  assert.equal(fcOutput.output, '{"temp":72}');
+  // The call_ids should match between function_call and function_call_output
+  const fcItem = result.input.find((i) => i.type === "function_call");
+  assert.equal(fcOutput.call_id, fcItem.call_id);
+});
+
+const { openaiToOpenAIResponsesResponse, openaiResponsesToOpenAIResponse } = await import(
+  "../../open-sse/translator/response/openai-responses.ts"
+);
+const { initState } = await import("../../open-sse/translator/index.ts");
+const { FORMATS } = await import("../../open-sse/translator/formats.ts");
+
+test("Chat→Responses streaming: usage-only chunk is captured (not dropped)", () => {
+  const state = initState(FORMATS.OPENAI_RESPONSES);
+
+  // First chunk with content
+  const chunk1 = { choices: [{ index: 0, delta: { content: "hello" }, finish_reason: null }], id: "c1" };
+  openaiToOpenAIResponsesResponse(chunk1, state);
+
+  // Usage-only chunk (empty choices, has usage)
+  const usageChunk = {
+    choices: [],
+    usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
+  };
+  const usageEvents = openaiToOpenAIResponsesResponse(usageChunk, state);
+  assert.ok(Array.isArray(usageEvents));
+
+  // Finish chunk
+  const finishChunk = { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] };
+  const finishEvents = openaiToOpenAIResponsesResponse(finishChunk, state);
+  const completedEvent = finishEvents.find((e) => e.event === "response.completed");
+  assert.ok(completedEvent, "should have completed event");
+  assert.ok(completedEvent.data.response.usage, "completed event should include usage");
+  assert.equal(completedEvent.data.response.usage.prompt_tokens, 10);
+});
+
+test("Chat→Responses streaming: completed event includes accumulated output", () => {
+  const state = initState(FORMATS.OPENAI_RESPONSES);
+
+  // Text content
+  const chunk = { choices: [{ index: 0, delta: { content: "hello world" }, finish_reason: null }], id: "c1" };
+  openaiToOpenAIResponsesResponse(chunk, state);
+
+  // Finish
+  const finishChunk = { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] };
+  const events = openaiToOpenAIResponsesResponse(finishChunk, state);
+  const completedEvent = events.find((e) => e.event === "response.completed");
+  assert.ok(completedEvent.data.response.output, "completed should have output");
+  assert.ok(completedEvent.data.response.output.length > 0, "output should not be empty");
+  const msgOutput = completedEvent.data.response.output.find((o) => o.type === "message");
+  assert.ok(msgOutput, "should have message output item");
+});
+
+test("Responses→Chat streaming: reasoning delta emits reasoning_content in Chat chunk", () => {
+  const state = { started: false, chatId: null, created: null, toolCallIndex: 0, finishReasonSent: false };
+
+  const chunk = {
+    type: "response.reasoning_summary_text.delta",
+    delta: "thinking step...",
+    item_id: "rs_1",
+    output_index: 0,
+    summary_index: 0,
+  };
+  const result = openaiResponsesToOpenAIResponse(chunk, state);
+  assert.ok(result, "should return a chunk");
+  assert.equal(result.choices[0].delta.reasoning_content, "thinking step...");
+});
+
+test("Chat→Responses streaming: multiple <think> tags in one chunk handled", () => {
+  const state = initState(FORMATS.OPENAI_RESPONSES);
+
+  // Chunk with multiple think tags
+  const chunk = {
+    choices: [{ index: 0, delta: { content: "<think>first</think>middle<think>second</think>end" }, finish_reason: null }],
+    id: "c1",
+  };
+  const events = openaiToOpenAIResponsesResponse(chunk, state);
+  // Should not have literal <think> in any text delta
+  const textDeltas = events
+    .filter((e) => e.event === "response.output_text.delta")
+    .map((e) => e.data.delta);
+  const combined = textDeltas.join("");
+  assert.ok(!combined.includes("<think>"), `text should not contain <think> tag, got: ${combined}`);
+});
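The last test above checks that no literal `<think>` tag survives when a single chunk contains several of them. A minimal sketch of why that case is easy to get wrong, assuming the tags are removed with plain string replacement: `stripThinkTags` is a hypothetical helper used only for illustration, and the project's translator may instead route the text inside the tags to reasoning events rather than keep it in the output.

```javascript
// Illustrative only: stripThinkTags is a hypothetical helper, not the project's code.
function stripThinkTags(text) {
  // A string pattern passed to replace() matches only the first occurrence,
  // so replaceAll() is used to remove every opening and closing tag.
  return text.replaceAll("<think>", "").replaceAll("</think>", "");
}

const sample = "<think>first</think>middle<think>second</think>end";

// replace() removes only the first pair of tags; the second <think> remains.
const firstOnly = sample.replace("<think>", "").replace("</think>", "");
console.log(firstOnly.includes("<think>")); // true

// replaceAll() removes every tag and keeps the text between them.
console.log(stripThinkTags(sample)); // "firstmiddlesecondend"
```

`String.prototype.replaceAll` needs Node.js 15 or later; on older runtimes a global regular expression such as `/<\/?think>/g` with `replace()` gives the same result.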
Author	SHA1	Message	Date
mikhailsal	370070f489	fix(stream): normalize delta.reasoning alias and separate reasoning in client response (#771) * fix(stream): normalize delta.reasoning to reasoning_content in SSE streaming NVIDIA kimi-k2.5 (and potentially other providers) send reasoning tokens as `delta.reasoning` in SSE streaming chunks instead of the standard OpenAI `delta.reasoning_content` field. This caused reasoning content to be silently dropped during stream passthrough, so clients received only the final answer with no reasoning separation. The non-streaming sanitizer (responseSanitizer.ts) already handled this alias, but the streaming pipeline did not. 
Fix applied in 4 locations: - stream.ts passthrough: normalize + force re-serialize sanitized chunk - stream.ts translate: accumulate reasoning from delta.reasoning - sseParser.ts: collect delta.reasoning in parseSSEToOpenAIResponse - streamPayloadCollector.ts: collect delta.reasoning in buildOpenAISummary * fix: eliminate injectedUsage reuse bug and add reasoning alias tests - Detect delta.reasoning alias before sanitizeStreamingChunk() which already normalizes it, removing dead post-sanitization normalization - Replace injectedUsage reuse with separate needsReserialization flag so reasoning re-serialization cannot block finish_reason/usage mutations on the same SSE chunk (fixes CRITICAL review finding) - Add unit test for parseSSEToOpenAIResponse reasoning alias - Add unit test for buildStreamSummaryFromEvents reasoning alias * fix(stream): separate reasoning from content in passthrough response body The passthroughAccumulatedContent variable was mixing delta.content and delta.reasoning_content into one string, causing the client_response log and responseBody to lose reasoning separation. - Add passthroughAccumulatedReasoning accumulator for reasoning deltas - Set message.reasoning_content in responseBody when reasoning exists - Only accumulate delta.content into passthroughAccumulatedContent * fix: trim leading whitespace from assembled content in log summaries NVIDIA and other providers emit token deltas with leading spaces (e.g. ' The', ' user'). When joined, these produce a leading space in the provider_response and parsed non-streaming response logs. Trim the joined content and reasoning_content in both buildOpenAISummary and parseSSEToOpenAIResponse for consistent log output. * fix(stream): split combined reasoning+content deltas into separate SSE events Some providers (e.g. NVIDIA NIM) send transition chunks with both `delta.reasoning` and `delta.content` in the same SSE event. 
After sanitization this becomes `reasoning_content` + `content`, which violates the standard OpenAI streaming contract where these fields are never mixed. Clients using if/else logic (LobeChat, etc.) skip content when reasoning_content is present, losing the first content token. Split such combined chunks into two separate SSE events: 1. Reasoning-only event (finish_reason=null, no usage) 2. Content-only event (carries finish_reason and usage)	2026-03-29 16:12:22 -03:00
Paijo	7168f4014d	fix: strip reasoning/thinking params for models that don't support them (#766) Models like antigravity/claude-sonnet-4-6 route through Google's internal Cloud Code API, which returns HTTP 400 when thinking/reasoning parameters are included in the request body. Changes: - open-sse/services/modelCapabilities.ts: add supportsReasoning() function with a denylist of known-unsupported patterns (antigravity/claude-sonnet-*) and a registry-based lookup hook (supportsReasoning flag per model) - open-sse/services/thinkingBudget.ts: in applyThinkingBudget(), add early exit before the mode switch: if the model string is present and supportsReasoning() returns false, call stripThinkingConfig() immediately regardless of the configured ThinkingMode This is fully backward-compatible: models not in the denylist are unaffected, and the supportsReasoning registry flag defaults to null (pass-through). Fixes: HTTP 400 errors on antigravity provider when client sends requests with thinking/reasoning budget parameters (e.g. claude-sonnet-4-6 via AG). Co-authored-by: oyi77 <oyi77@github.com> Co-authored-by: oyi77 <oyi77@users.noreply.github.com>	2026-03-29 16:12:19 -03:00
Paijo	f0912feefb	feat: auto-disable permanently banned provider accounts (with Settings toggle) (#765 ) * feat: auto-disable banned accounts setting with UI toggle Add a configurable setting to automatically disable provider accounts that return permanent/terminal errors (403 banned, ToS violation, etc.) Changes: - open-sse/services/accountFallback.ts: extend ACCOUNT_DEACTIVATED_SIGNALS with AG-specific ban messages ('verify your account', 'service disabled for violation') - src/app/api/settings/auto-disable-accounts/route.ts: new GET/PUT endpoint for the setting (enabled bool + threshold int) - src/shared/validation/schemas.ts: updateAutoDisableAccountsSchema - src/sse/services/auth.ts: in markAccountUnavailable(), capture result.permanent from checkFallbackError() and — when autoDisableBannedAccounts is enabled and backoffLevel >= threshold — set isActive=false on the connection Default: disabled (backward-compatible). Enable via Settings UI or PUT /api/settings/auto-disable-accounts { "enabled": true, "threshold": 3 } Fixes: antigravity accounts with 403/Verify-your-account errors being retried indefinitely in the rotation pool. Co-authored-by: oyi77 <oyi77@users.noreply.github.com> * fix: address reviewer comments for auto-disable (use getCachedSettings, immediate disable on permanent bans) --------- Co-authored-by: oyi77 <oyi77@github.com> Co-authored-by: oyi77 <oyi77@users.noreply.github.com>	2026-03-29 16:12:17 -03:00
Diego Rodrigues de Sa e Souza	af338d447b	Merge pull request #768 from diegosouzapw/release/v3.3.0 chore(release): v3.3.0 - test stability, release consolidation	2026-03-29 14:30:59 -03:00
diegosouzapw	6fad06f659	chore(release): v3.3.0 - test stability, release consolidation	2026-03-29 14:22:25 -03:00
diegosouzapw	1d51d8ff27	chore(release): v3.2.9 - combo diagnostics, quality gates, Gemini tool fix	2026-03-29 14:16:37 -03:00
Randi	8af9bd1ac3	Force real upstream combo live tests (#759)	2026-03-29 13:21:53 -03:00
LASTHXH	9fc3845d92	Fix Gemini API error with integer enum in tool parameters (#760) The Gemini API returns a 400 error when tools have enum constraints on integer/number types: "enum: only allowed for STRING type". This fix removes enum constraints for integer and number types in JSON schemas before sending them to the Gemini API, while keeping enum for string types. Fixes tools like mcp__pointer__get-pointed-element that use integer enums for the cssLevel and textDetail parameters. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 13:21:51 -03:00
Gorchakov-Pressure	93bbe8e7a8	feat(combo): response quality validation, circuit breaker fix, Cursor 4.6 models (#762 ) - Add `validateResponseQuality()` to detect empty/invalid 200 responses from upstream providers in combo routing. Non-streaming responses with empty body, invalid JSON, or missing content/tool_calls now trigger circuit breaker failure and fallback to the next model instead of being returned to the client. - Add missing `breaker._onSuccess()` calls in both priority and round-robin combo paths. Previously failures accumulated without reset, causing premature circuit breaker trips on healthy models. - Update Cursor provider registry with Claude 4.6 model IDs (opus-high, sonnet-high, haiku, opus + thinking variants). Keep 4.5 IDs for backward compatibility. - Update free-stack preset: replace duplicate qw/qwen3-coder-plus with if/deepseek-v3.2 for better model diversity. - Add paid-premium combo template for round-robin load distribution across paid subscription providers (Cursor, Antigravity). Made-with: Cursor	2026-03-29 13:21:48 -03:00
Diego Rodrigues de Sa e Souza	46acd16999	chore(release): v3.2.8 — Docker Auto-Update & Analytics Fixes (#755 ) * chore(release): v3.2.8 — Docker auto-update UI and cache analytics fixes * fix(sse): remove race condition in cache metrics tracking (#758) - Remove in-memory metrics tracking (currentMetrics, trackCacheMetrics, updateCacheMetrics) - Cache metrics now computed on-the-fly from usage_history table (single source of truth) - Fixes CRITICAL issue from code review: concurrent requests overwriting metrics - Fixes WARNING: duplicate metric tracking logic in streaming/non-streaming paths Ref: PR #752 (merged before this fix was included) * fix: handle allRateLimited credentials & forward extra body keys in embeddings/images routes (#757) * fix: handle allRateLimited credentials in embeddings and images routes When getProviderCredentials() returns an allRateLimited object (truthy, but without apiKey/accessToken), the embeddings and images routes incorrectly passed it to handlers as valid credentials. The handlers then sent upstream requests without Authorization headers, causing 401 errors from providers (e.g. NVIDIA NIM). This only manifested under concurrent requests: a chat/completions call could trigger rate limiting on a provider account, and a simultaneous embeddings request would receive the allRateLimited sentinel — but treat it as valid credentials. The chat pipeline already handled this case correctly. This commit adds the same allRateLimited guard to all affected routes: - POST /v1/embeddings - POST /v1/providers/{provider}/embeddings - POST /v1/images/generations - POST /v1/providers/{provider}/images/generations Also adds a defense-in-depth guard in the embeddings handler itself: if no auth token is available for a non-local provider, return 401 immediately instead of sending an unauthenticated request upstream. 
Made-with: Cursor * fix(embeddings): forward extra body keys to upstream providers The embeddings handler only forwarded model, input, dimensions, and encoding_format to upstream providers, silently dropping any additional fields. This broke asymmetric embedding APIs (e.g. NVIDIA NIM nv-embedqa-e5-v5) that require input_type, and other providers expecting user or truncate parameters. Add a KNOWN_FIELDS exclusion set and forward all unrecognized body keys to the upstream request, matching the passthrough pattern used by the chat pipeline's DefaultExecutor.transformRequest(). Made-with: Cursor * fix(auth): redirect and unconditional 401 on disabled requireLogin + fix test cases * fix(build): remove legacy proxy.ts causing Next.js build collision * fix(build): revert middleware.ts rename to proxy.ts because of Next.js Edge constraints --------- Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com> Co-authored-by: tombii <tombii@users.noreply.github.com> Co-authored-by: Gorchakov-Pressure <117600961+Gorchakov-Pressure@users.noreply.github.com>	2026-03-29 13:09:38 -03:00
diegosouzapw	5ad2c6abf6	Fix merge conflicts	2026-03-29 11:26:17 -03:00
Diego Rodrigues de Sa e Souza	d5781d60bd	Merge pull request #752 from tombii/feat/preserve-client-cache-control feat: preserve client cache_control with deterministic routing + metrics dashboard	2026-03-29 11:23:22 -03:00
Diego Rodrigues de Sa e Souza	e464a95c5a	Merge pull request #747 from AveryanAlex/fix/responses-chat-translation-bugs Improve responses<->chat translation	2026-03-29 11:23:19 -03:00
Diego Rodrigues de Sa e Souza	a50ea4bb9e	Merge pull request #746 from AveryanAlex/fix/codex-passthrough-store-instructions fix: ensure Codex passthrough path sets instructions and store=false	2026-03-29 11:23:05 -03:00
Diego Rodrigues de Sa e Souza	aa11bb6d93	Merge pull request #753 from LASTHXH/fix/cli-tools-status-undefined Fix CLI tools status endpoint crash and add droid detection support	2026-03-29 11:22:45 -03:00
tombii	319018f055	test: fix cache metrics tests with usage_history table - Add usage_history table creation in test setup - Simplify byStrategy query to avoid non-existent combo_strategy column - Update test assertions to work with existing test data	2026-03-29 16:05:32 +02:00
LASTHXH	394b986ccb	Fix CLI tools status endpoint crash and add droid detection support 1. Fixed crash in /api/cli-tools/status when statuses[toolId] is undefined - Added null check before accessing statuses[toolId] properties - Prevents "Cannot set property of undefined" error 2. Added support for droid.exe detection in ~/bin directory - Added ~/bin and ~/.local/bin to EXPECTED_PARENT_PATHS - Added droid.exe variant to toolBins for Windows - Added specific path check for droid in ~/bin/droid.exe These fixes resolve issues where CLI tools (Claude Code, Codex, Droid) were showing as "not installed" even when properly installed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 17:44:37 +05:00
tombii	26f7b36ce4	feat: add cache control settings and token-based metrics Settings: - Add `alwaysPreserveClientCache` setting with modes: auto/always/never - UI toggle in Dashboard > Settings > Routing tab - Auto mode preserves cache_control for Claude Code clients with deterministic routing Metrics: - Track prompt cache token usage (input, cached, creation) - Display cache reuse ratio (cached/input tokens) - Breakdown by provider and routing strategy - Shows tokens saved and estimated cost savings API Endpoints: - GET /api/settings/cache-metrics - retrieve metrics - DELETE /api/settings/cache-metrics - reset metrics Files: - open-sse/utils/cacheControlPolicy.ts: CacheControlMetrics interface, trackCacheMetrics, updateCacheTokenMetrics - open-sse/handlers/chatCore.ts: Track cache tokens from provider responses - src/lib/db/settings.ts: Database functions for metrics persistence - src/lib/cacheControlSettings.ts: Cached settings accessor - src/app/(dashboard)/dashboard/settings/components/CacheStatsCard.tsx: Metrics dashboard UI - tests/unit/*.test.mjs: Unit tests (41 tests pass) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-29 14:37:55 +02:00
cai kerui	f0daad10ce	Add Docker-aware dashboard auto-update flow	2026-03-29 20:25:14 +09:00
tombii	0bc557fb8b	feat(sse): preserve client cache_control for Claude Code with deterministic routing Adds intelligent cache control preservation for Claude Code clients: - New cacheControlPolicy.ts module with detection logic: - isClaudeCodeClient(): Detects Claude Code via User-Agent - providerSupportsCaching(): Checks provider (claude, anthropic, zai, qwen) - isDeterministicStrategy(): Identifies priority/cost-optimized strategies - shouldPreserveCacheControl(): Main policy decision - Cache control is preserved when: 1. Client is Claude Code (detected via User-Agent) 2. Provider supports prompt caching 3. Request routing is deterministic: - Single model requests (always) - Combo with priority or cost-optimized strategy only - Updated translator to accept preserveCacheControl option - Updated chatCore and chat handler to propagate combo strategy - Added comprehensive unit tests (24 tests) Non-deterministic combo strategies (weighted, round-robin, random, etc.) continue to use OmniRoute's managed caching strategy. Refs: #cache-control-preservation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-29 12:24:44 +02:00
tombii	3571421a0e	fix(ci): push sync to correct fork repo	2026-03-29 10:45:21 +02:00
tombii	aed80f3e4f	ci: add upstream sync workflow	2026-03-29 10:44:45 +02:00
AveryanAlex	fdaeccf1e5	fix: use replaceAll for think tags to handle multiple occurrences	2026-03-29 00:26:24 +03:00
AveryanAlex	7723e46c26	fix: emit reasoning_content in Responses→Chat streaming translation	2026-03-29 00:23:54 +03:00
AveryanAlex	dce355cce6	fix: capture usage and accumulate output in response.completed event	2026-03-29 00:23:06 +03:00
AveryanAlex	213e7b7093	fix: handle deprecated function_call field and function role in Chat→Responses	2026-03-29 00:18:50 +03:00
AveryanAlex	fe7d8f93a1	fix: stringify arguments and convert tool output content types	2026-03-29 00:17:49 +03:00
AveryanAlex	9e2f4216f9	fix: reject all non-function tool types in Responses→Chat translation	2026-03-29 00:16:59 +03:00
AveryanAlex	a48f7b2222	fix: translate tool_choice object format between Responses and Chat APIs	2026-03-29 00:14:26 +03:00
AveryanAlex	0b85d8a9bc	fix: translate input_file↔file content parts	2026-03-29 00:12:18 +03:00
AveryanAlex	58d6938065	fix: translate input_image↔image_url with detail preservation	2026-03-29 00:11:25 +03:00
AveryanAlex	a536a2b822	refactor: consolidate responsesApiHelper to delegate to main translator	2026-03-29 00:09:54 +03:00
AveryanAlex	769be46bf9	fix: ensure Codex passthrough path sets instructions and store=false The native Codex passthrough path returned early before injecting default instructions and enforcing store=false. Clients sending Responses API requests without instructions (e.g. opencode) got 400 "Instructions are required", and requests missing store=false got 400 "Store must be set to false" from the Codex upstream. Move both assignments before the passthrough return so they apply to all code paths.	2026-03-27 19:40:55 +03:00
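
The message of commit 370070f489 in the log above describes splitting a chunk that carries both `reasoning_content` and `content` into a reasoning-only event (finish_reason=null, no usage) followed by a content-only event (carrying finish_reason and usage). A rough sketch of that rule follows. It is not the project's implementation: `splitCombinedDelta` is a hypothetical name, and the sketch assumes an already-sanitized chunk with a single choice.

```javascript
// Hypothetical sketch of the split rule from the commit message.
// Assumes one choice per chunk and that delta.reasoning has already been
// normalized to delta.reasoning_content.
function splitCombinedDelta(chunk) {
  const choice = chunk.choices?.[0];
  const delta = choice?.delta;
  const hasReasoning =
    typeof delta?.reasoning_content === "string" && delta.reasoning_content.length > 0;
  const hasContent = typeof delta?.content === "string" && delta.content.length > 0;

  // Nothing to split: pass the chunk through untouched.
  if (!hasReasoning || !hasContent) return [chunk];

  // Build one delta without content and one without reasoning_content.
  const { content, ...reasoningDelta } = delta;
  const { reasoning_content, ...contentDelta } = delta;
  // The reasoning-only event must not carry usage.
  const { usage, ...chunkWithoutUsage } = chunk;

  const reasoningChunk = {
    ...chunkWithoutUsage,
    choices: [{ ...choice, delta: reasoningDelta, finish_reason: null }],
  };
  // The content-only event keeps the original finish_reason and usage.
  const contentChunk = {
    ...chunk,
    choices: [{ ...choice, delta: contentDelta }],
  };
  return [reasoningChunk, contentChunk];
}

// Each resulting chunk would be written as its own SSE event.
const toSSE = (c) => `data: ${JSON.stringify(c)}\n\n`;
```

Emitting the reasoning event first preserves the order a client would expect, and keeping finish_reason and usage on the second event means a client that stops reading at finish_reason still sees the first content token.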