chore(release): v3.2.2 — Four-Stage Request Logs & Bugfixes

Merge branch 'codex/request-log-pipeline-json'
test: align pipeline integration assertions
2026-03-28 22:11:22 -03:00 · 2026-03-28 22:09:34 -03:00 · 2026-03-28 22:09:27 -03:00 · 2026-03-28 22:09:27 -03:00 · 2026-03-28 22:09:27 -03:00 · 2026-03-28 22:07:20 -03:00
86 changed files with 3450 additions and 407 deletions
@@ -37,6 +37,13 @@ jobs:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

+      - name: Login to GitHub Container Registry
+        uses: docker/login-action@v4
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
      - name: Extract version from release tag or input
        id: version
        run: |
@@ -59,6 +66,8 @@ jobs:
          tags: |
            ${{ env.IMAGE_NAME }}:${{ steps.version.outputs.version }}
            ${{ env.IMAGE_NAME }}:latest
+            ghcr.io/diegosouzapw/omniroute:${{ steps.version.outputs.version }}
+            ghcr.io/diegosouzapw/omniroute:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
          no-cache: false
@@ -105,3 +105,21 @@ jobs:
          echo "✅ Published omniroute@$VERSION (tag: $TAG)"
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+
+      - name: Publish to GitHub Packages
+        run: |
+          VERSION="${{ steps.resolve.outputs.version }}"
+          TAG="${{ steps.resolve.outputs.tag }}"
+          
+          echo "Configuring for GitHub Packages..."
+          echo "//npm.pkg.github.com/:_authToken=${{ secrets.GITHUB_TOKEN }}" > .npmrc
+          npm pkg set name="@diegosouzapw/omniroute"
+          
+          if [ "$TAG" = "latest" ]; then
+            npm publish --registry=https://npm.pkg.github.com || echo "⚠️ Version ${VERSION} might already be published on GitHub."
+          else
+            npm publish --registry=https://npm.pkg.github.com --tag "$TAG" || echo "⚠️ Version ${VERSION} might already be published on GitHub."
+          fi
+          echo "✅ Action finished for GitHub Packages"
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -4,6 +4,53 @@

 ---

+## [3.2.2] — 2026-03-29
+
+### ✨ New Features
+
+- **Four-Stage Request Log Pipeline (#705)** — Refactored log persistence to save comprehensive payloads at four distinct pipeline stages: Client Request, Translated Provider Request, Provider Response, and Translated Client Response. Introduced `streamPayloadCollector` for robust SSE stream truncation and payload serialization.
+
+### 🐛 Bug Fixes
+
+- **Mobile UI Fixes (#659)** — Prevented table components on the dashboard from breaking the layout on narrow viewports by adding proper horizontal scrolling and overflow containment to `DashboardLayout`.
+- **Claude Prompt Cache Fixes (#708)** — Ensured `cache_control` blocks in Claude-to-Claude fallback loops are faithfully preserved and passed safely back to Anthropic models.
+- **Gemini Tool Definitions (#725)** — Fixed schema translation errors when declaring simple `object` parameter types for Gemini function calling.
+
+## [3.2.1] — 2026-03-29
+
+### ✨ New Features
+
+- **Global Fallback Provider (#689)** — When all combo models are exhausted (502/503), OmniRoute now attempts a configurable global fallback model before returning the error. Set `globalFallbackModel` in settings to enable.
+
+### 🐛 Bug Fixes
+
+- **Fix #721** — Fixed context pinning bypass during tool-call responses. Non-streaming tagging used wrong JSON path (`json.messages` → `json.choices[0].message`). Streaming injection now triggers on `finish_reason` chunks for tool-call-only streams. `injectModelTag()` now appends synthetic pin messages for non-string content.
+- **Fix #709** — Confirmed already fixed (v3.1.9) — `system-info.mjs` creates directories recursively. Closed.
+- **Fix #707** — Confirmed already fixed (v3.1.9) — empty tool name sanitization in `chatCore.ts`. Closed.
+
+### 🧪 Tests
+
+- Added 6 unit tests for context pinning with tool-call responses (null content, array content, roundtrip, re-injection)
+
+## [3.2.0] — 2026-03-28
+
+### ✨ New Features
+
+- **Cache Management UI** — Added a dedicated semantic caching dashboard at \`/dashboard/cache\` with targeted API invalidation and 31-language i18n support (PR #701 by @oyi77)
+- **GLM Quota Tracking** — Added real-time usage and session quota tracking for the GLM Coding (Z.AI) provider (PR #698 by @christopher-s)
+- **Detailed Log Payloads** — Wired full four-stage pipeline payload capturing (original, translated, provider-response, streamed-deltas) directly into the UI (PR #705 by @rdself)
+
+### 🐛 Bug Fixes
+
+- **Fix #708** — Prevented token bleeding for Claude Code users routing through OmniRoute by correctly preserving native \`cache_control\` headers during Claude-to-Claude passthrough (PR #708 by @tombii)
+- **Fix #719** — Setup internal auth boundaries for \`ModelSyncScheduler\` to prevent unauthenticated daemon failures on startup (PR #719 by @rdself)
+- **Fix #718** — Rebuilt badge rendering in Provider Limits UI preventing bad quota boundaries overlap (PR #718 by @rdself)
+- **Fix #704** — Fixed Combo Fallbacks breaking on HTTP 400 content-policy errors preventing model-rotation dead-routing (PR #704 by @rdself)
+
+### 🔒 Security & Dependencies
+
+- Bumped \`path-to-regexp\` to \`8.4.0\` resolving dependabot vulnerabilities (PR #715)
+
 ## [3.1.10] — 2026-03-28

 ### 🐛 Bug Fixes
@@ -47,6 +94,10 @@
 | `tests/unit/t40-opencode-cli-tools-integration.test.mjs` | CLI tool integration tests                                  |
 | `COVERAGE_PLAN.md`                                       | Test coverage planning document                             |

+### 🐛 Bug Fixes
+
+- **Claude Prompt Caching Passthrough** — Fixed cache_control markers being stripped in Claude passthrough mode (Claude → OmniRoute → Claude), which caused Claude Code users to deplete their Anthropic API quota 5-10x faster than direct connections. OmniRoute now preserves client's cache_control markers when sourceFormat and targetFormat are both Claude, ensuring prompt caching works correctly and dramatically reducing token consumption.
+
 ## [3.1.8] - 2026-03-27

 ### 🐛 Bug Fixes & Features
@@ -2,7 +2,7 @@

 🌐 **Languages:** 🇺🇸 [English](ARCHITECTURE.md) | 🇧🇷 [Português (Brasil)](i18n/pt-BR/ARCHITECTURE.md) | 🇪🇸 [Español](i18n/es/ARCHITECTURE.md) | 🇫🇷 [Français](i18n/fr/ARCHITECTURE.md) | 🇮🇹 [Italiano](i18n/it/ARCHITECTURE.md) | 🇷🇺 [Русский](i18n/ru/ARCHITECTURE.md) | 🇨🇳 [中文 (简体)](i18n/zh-CN/ARCHITECTURE.md) | 🇩🇪 [Deutsch](i18n/de/ARCHITECTURE.md) | 🇮🇳 [हिन्दी](i18n/in/ARCHITECTURE.md) | 🇹🇭 [ไทย](i18n/th/ARCHITECTURE.md) | 🇺🇦 [Українська](i18n/uk-UA/ARCHITECTURE.md) | 🇸🇦 [العربية](i18n/ar/ARCHITECTURE.md) | 🇯🇵 [日本語](i18n/ja/ARCHITECTURE.md) | 🇻🇳 [Tiếng Việt](i18n/vi/ARCHITECTURE.md) | 🇧🇬 [Български](i18n/bg/ARCHITECTURE.md) | 🇩🇰 [Dansk](i18n/da/ARCHITECTURE.md) | 🇫🇮 [Suomi](i18n/fi/ARCHITECTURE.md) | 🇮🇱 [עברית](i18n/he/ARCHITECTURE.md) | 🇭🇺 [Magyar](i18n/hu/ARCHITECTURE.md) | 🇮🇩 [Bahasa Indonesia](i18n/id/ARCHITECTURE.md) | 🇰🇷 [한국어](i18n/ko/ARCHITECTURE.md) | 🇲🇾 [Bahasa Melayu](i18n/ms/ARCHITECTURE.md) | 🇳🇱 [Nederlands](i18n/nl/ARCHITECTURE.md) | 🇳🇴 [Norsk](i18n/no/ARCHITECTURE.md) | 🇵🇹 [Português (Portugal)](i18n/pt/ARCHITECTURE.md) | 🇷🇴 [Română](i18n/ro/ARCHITECTURE.md) | 🇵🇱 [Polski](i18n/pl/ARCHITECTURE.md) | 🇸🇰 [Slovenčina](i18n/sk/ARCHITECTURE.md) | 🇸🇪 [Svenska](i18n/sv/ARCHITECTURE.md) | 🇵🇭 [Filipino](i18n/phi/ARCHITECTURE.md) | 🇨🇿 [Čeština](i18n/cs/ARCHITECTURE.md)

-_Last updated: 2026-03-24_
+_Last updated: 2026-03-28_

 ## Executive Summary

@@ -274,8 +274,9 @@ Domain State DB (SQLite):

 ## 5) Cloud Sync

- Scheduler init: `src/lib/initCloudSync.ts`, `src/shared/services/initializeCloudSync.ts`
+- Scheduler init: `src/lib/initCloudSync.ts`, `src/shared/services/initializeCloudSync.ts`, `src/shared/services/modelSyncScheduler.ts`
 - Periodic task: `src/shared/services/cloudSyncScheduler.ts`
+- Periodic task: `src/shared/services/modelSyncScheduler.ts`
 - Control route: `src/app/api/sync/cloud/route.ts`

 ## Request Lifecycle (`/v1/chat/completions`)
@@ -355,7 +356,7 @@ flowchart TD
    Q -- No --> R[Return all unavailable]
 ```

-Fallback decisions are driven by `open-sse/services/accountFallback.ts` using status codes and error-message heuristics.
+Fallback decisions are driven by `open-sse/services/accountFallback.ts` using status codes and error-message heuristics. Combo routing adds one extra guard: provider-scoped 400s such as upstream content-block and role-validation failures are treated as model-local failures so later combo targets can still run.

 ## OAuth Onboarding and Token Refresh Lifecycle

@@ -755,10 +756,18 @@ Runtime visibility sources:

 - console logs from `src/sse/utils/logger.ts`
 - per-request usage aggregates in SQLite (`usage_history`, `call_logs`, `proxy_logs`)
+- four-stage detailed payload captures in SQLite (`request_detail_logs`) when `settings.detailed_logs_enabled=true`
 - textual request status log in `log.txt` (optional/compat)
 - optional deep request/translation logs under `logs/` when `ENABLE_REQUEST_LOGS=true`
 - dashboard usage endpoints (`/api/usage/*`) for UI consumption

+Detailed request payload capture stores up to four JSON payload stages per routed call:
+
+- raw request received from the client
+- translated request actually sent upstream
+- provider response reconstructed as JSON (including streamed event sequences when applicable)
+- final client response returned by OmniRoute
+
 ## Security-Sensitive Boundaries

 - JWT secret (`JWT_SECRET`) secures dashboard session cookie verification/signing
@@ -1,7 +1,7 @@
 openapi: 3.1.0
 info:
  title: OmniRoute API
-  version: 3.1.10
+  version: 3.2.2
  description: |
    OmniRoute is a local-first AI API proxy router. It provides an OpenAI-compatible
    endpoint that routes requests to multiple AI providers with load balancing,
@@ -14,10 +14,16 @@ import { createRequestLogger } from "../utils/requestLogger.ts";
 import { getModelTargetFormat, PROVIDER_ID_TO_ALIAS } from "../config/providerModels.ts";
 import { resolveModelAlias } from "../services/modelDeprecation.ts";
 import { getUnsupportedParams } from "../config/providerRegistry.ts";
-import { createErrorResult, parseUpstreamError, formatProviderError } from "../utils/error.ts";
+import {
+  buildErrorBody,
+  createErrorResult,
+  parseUpstreamError,
+  formatProviderError,
+} from "../utils/error.ts";
 import { HTTP_STATUS, PROVIDER_MAX_TOKENS } from "../config/constants.ts";
 import { classifyProviderError, PROVIDER_ERROR_TYPES } from "../services/errorClassifier.ts";
 import { updateProviderConnection } from "@/lib/db/providers";
+import { isDetailedLoggingEnabled, saveRequestDetailLog } from "@/lib/db/detailedLogs";
 import { logAuditEvent } from "@/lib/compliance";
 import { handleBypassRequest } from "../utils/bypassHandler.ts";
 import {
@@ -72,6 +78,8 @@ import {
  EMERGENCY_FALLBACK_CONFIG,
 } from "../services/emergencyFallback.ts";
 import { resolveStreamFlag, stripMarkdownCodeFence } from "../utils/aiSdkCompat.ts";
+import { generateRequestId } from "@/shared/utils/requestId";
+import { normalizePayloadForLog } from "@/lib/logPayloads";

 export function shouldUseNativeCodexPassthrough({
  provider,
@@ -391,7 +399,8 @@ export async function handleChatCore({

      credentials.providerSpecificData = nextProviderData;
    } catch (err) {
-      log?.debug?.("CODEX", `Failed to persist codex quota state: ${err?.message || err}`);
+      const errMessage = err instanceof Error ? err.message : String(err);
+      log?.debug?.("CODEX", `Failed to persist codex quota state: ${errMessage}`);
    }
  };

@@ -486,6 +495,88 @@ export async function handleChatCore({
  const alias = PROVIDER_ID_TO_ALIAS[provider] || provider;
  const modelTargetFormat = getModelTargetFormat(alias, resolvedModel);
  const targetFormat = modelTargetFormat || getTargetFormat(provider);
+  const noLogEnabled = apiKeyInfo?.noLog === true;
+  const detailedLoggingEnabled = !noLogEnabled && (await isDetailedLoggingEnabled());
+  const persistAttemptLogs = ({
+    status,
+    tokens,
+    responseBody,
+    error,
+    providerRequest,
+    providerResponse,
+    clientResponse,
+    claudeCacheMeta,
+    claudeCacheUsageMeta,
+  }: {
+    status: number;
+    tokens?: unknown;
+    responseBody?: unknown;
+    error?: string | null;
+    providerRequest?: unknown;
+    providerResponse?: unknown;
+    clientResponse?: unknown;
+    claudeCacheMeta?: any;
+    claudeCacheUsageMeta?: any;
+  }) => {
+    const callLogId = generateRequestId();
+
+    saveCallLog({
+      id: callLogId,
+      method: "POST",
+      path: clientRawRequest?.endpoint || "/v1/chat/completions",
+      status,
+      model,
+      requestedModel,
+      provider,
+      connectionId,
+      duration: Date.now() - startTime,
+      tokens: tokens || {},
+      requestBody: attachLogMeta(body, {
+        claudePromptCache: claudeCacheMeta,
+      }),
+      responseBody: attachLogMeta(responseBody ?? undefined, {
+        claudePromptCache: claudeCacheMeta
+          ? {
+              applied: claudeCacheMeta.applied,
+              totalBreakpoints: claudeCacheMeta.totalBreakpoints,
+              anthropicBeta: claudeCacheMeta.anthropicBeta,
+            }
+          : null,
+        claudePromptCacheUsage: claudeCacheUsageMeta,
+      }),
+      error: error || null,
+      sourceFormat,
+      targetFormat,
+      comboName,
+      apiKeyId: apiKeyInfo?.id || null,
+      apiKeyName: apiKeyInfo?.name || null,
+      noLog: noLogEnabled,
+    }).catch(() => {});
+
+    if (!detailedLoggingEnabled) {
+      return;
+    }
+
+    try {
+      saveRequestDetailLog({
+        call_log_id: callLogId,
+        client_request: clientRawRequest?.body ?? body,
+        translated_request: providerRequest ?? null,
+        provider_response: providerResponse ?? null,
+        client_response: clientResponse ?? null,
+        provider,
+        model,
+        source_format: sourceFormat,
+        target_format: targetFormat,
+        duration_ms: Date.now() - startTime,
+        api_key_id: apiKeyInfo?.id || null,
+        no_log: noLogEnabled,
+      });
+    } catch (err) {
+      const errMessage = err instanceof Error ? err.message : String(err);
+      log?.debug?.("DETAIL_LOG", `Failed to save detailed log: ${errMessage}`);
+    }
+  };

  // Primary path: merge client model id + alias target so config on either key applies; resolved
  // id wins on same header name. T5 family fallback uses only (nextModel, resolveModelAlias(next))
@@ -919,40 +1010,34 @@ export async function handleChatCore({
    );
  } catch (error) {
    trackPendingRequest(model, provider, connectionId, false);
+    const failureStatus = error.name === "AbortError" ? 499 : HTTP_STATUS.BAD_GATEWAY;
+    const failureMessage =
+      error.name === "AbortError"
+        ? "Request aborted"
+        : formatProviderError(error, provider, model, HTTP_STATUS.BAD_GATEWAY);
    appendRequestLog({
      model,
      provider,
      connectionId,
-      status: `FAILED ${error.name === "AbortError" ? 499 : HTTP_STATUS.BAD_GATEWAY}`,
-    }).catch(() => {});
-    saveCallLog({
-      method: "POST",
-      path: clientRawRequest?.endpoint || "/v1/chat/completions",
-      status: error.name === "AbortError" ? 499 : HTTP_STATUS.BAD_GATEWAY,
-      model,
-      requestedModel,
-      provider,
-      connectionId,
-      duration: Date.now() - startTime,
-      requestBody: attachLogMeta(body, {
-        claudePromptCache: claudePromptCacheLogMeta,
-      }),
-      error: error.message,
-      sourceFormat,
-      targetFormat,
-      comboName,
-      apiKeyId: apiKeyInfo?.id || null,
-      apiKeyName: apiKeyInfo?.name || null,
-      noLog: apiKeyInfo?.noLog === true,
+      status: `FAILED ${failureStatus}`,
    }).catch(() => {});
+    persistAttemptLogs({
+      status: failureStatus,
+      error: failureMessage,
+      providerRequest: finalBody || translatedBody,
+      clientResponse: buildErrorBody(failureStatus, failureMessage),
+      claudeCacheMeta: claudePromptCacheLogMeta,
+    });
    if (error.name === "AbortError") {
      streamController.handleError(error);
      return createErrorResult(499, "Request aborted");
    }
-    persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, error?.name || "upstream_error");
-    const errMsg = formatProviderError(error, provider, model, HTTP_STATUS.BAD_GATEWAY);
-    console.log(`${COLORS.red}[ERROR] ${errMsg}${COLORS.reset}`);
-    return createErrorResult(HTTP_STATUS.BAD_GATEWAY, errMsg);
+    persistFailureUsage(
+      HTTP_STATUS.BAD_GATEWAY,
+      error instanceof Error && error.name ? error.name : "upstream_error"
+    );
+    console.log(`${COLORS.red}[ERROR] ${failureMessage}${COLORS.reset}`);
+    return createErrorResult(HTTP_STATUS.BAD_GATEWAY, failureMessage);
  }

  // Handle 401/403 - try token refresh using executor
@@ -998,8 +1083,11 @@ export async function handleChatCore({
        if (retryResult.response.ok) {
          providerResponse = retryResult.response;
          providerUrl = retryResult.url;
+          providerHeaders = retryResult.headers;
+          finalBody = retryResult.transformedBody;
+          reqLogger.logTargetRequest(providerUrl, providerHeaders, finalBody);
        }
-      } catch (retryError) {
+      } catch {
        log?.warn?.("TOKEN", `${provider.toUpperCase()} | retry after refresh failed`);
      }
    } else {
@@ -1012,10 +1100,12 @@ export async function handleChatCore({
  // Check provider response - return error info for fallback handling
  if (!providerResponse.ok) {
    trackPendingRequest(model, provider, connectionId, false);
-    const { statusCode, message, retryAfterMs } = await parseUpstreamError(
-      providerResponse,
-      provider
-    );
+    const {
+      statusCode,
+      message,
+      retryAfterMs,
+      responseBody: upstreamErrorBody,
+    } = await parseUpstreamError(providerResponse, provider);

    // T06/T10/T36: classify provider errors and persist terminal account states.
    const errorType = classifyProviderError(statusCode, message);
@@ -1067,26 +1157,7 @@ export async function handleChatCore({
    appendRequestLog({ model, provider, connectionId, status: `FAILED ${statusCode}` }).catch(
      () => {}
    );
-    saveCallLog({
-      method: "POST",
-      path: clientRawRequest?.endpoint || "/v1/chat/completions",
-      status: statusCode,
-      model,
-      requestedModel,
-      provider,
-      connectionId,
-      duration: Date.now() - startTime,
-      requestBody: attachLogMeta(body, {
-        claudePromptCache: claudePromptCacheLogMeta,
-      }),
-      error: message,
-      sourceFormat,
-      targetFormat,
-      comboName,
-      apiKeyId: apiKeyInfo?.id || null,
-      apiKeyName: apiKeyInfo?.name || null,
-      noLog: apiKeyInfo?.noLog === true,
-    }).catch(() => {});
+
    const errMsg = formatProviderError(new Error(message), provider, model, statusCode);
    console.log(`${COLORS.red}[ERROR] ${errMsg}${COLORS.reset}`);

@@ -1098,6 +1169,12 @@ export async function handleChatCore({

    // Log error with full request body for debugging
    reqLogger.logError(new Error(message), finalBody || translatedBody);
+    reqLogger.logProviderResponse(
+      providerResponse.status,
+      providerResponse.statusText,
+      providerResponse.headers,
+      upstreamErrorBody
+    );

    // Update rate limiter from error response headers
    updateFromHeaders(provider, connectionId, providerResponse.headers, statusCode, model);
@@ -1121,24 +1198,53 @@ export async function handleChatCore({
            providerUrl = fallbackResult.url;
            providerHeaders = fallbackResult.headers;
            finalBody = fallbackResult.transformedBody;
+            reqLogger.logTargetRequest(providerUrl, providerHeaders, finalBody);
            // Continue processing with the fallback response — skip error return
            log?.info?.("MODEL_FALLBACK", `Serving ${nextModel} as fallback for ${model}`);
            // Jump to streaming/non-streaming handling below
            // We fall through by NOT returning here
          } else {
            // Fallback also failed — return original error
+            persistAttemptLogs({
+              status: statusCode,
+              error: errMsg,
+              providerRequest: finalBody || translatedBody,
+              providerResponse: upstreamErrorBody,
+              clientResponse: buildErrorBody(statusCode, errMsg),
+            });
            persistFailureUsage(statusCode, "model_unavailable");
            return createErrorResult(statusCode, errMsg, retryAfterMs);
          }
        } catch {
+          persistAttemptLogs({
+            status: statusCode,
+            error: errMsg,
+            providerRequest: finalBody || translatedBody,
+            providerResponse: upstreamErrorBody,
+            clientResponse: buildErrorBody(statusCode, errMsg),
+          });
          persistFailureUsage(statusCode, "model_unavailable");
          return createErrorResult(statusCode, errMsg, retryAfterMs);
        }
      } else {
+        persistAttemptLogs({
+          status: statusCode,
+          error: errMsg,
+          providerRequest: finalBody || translatedBody,
+          providerResponse: upstreamErrorBody,
+          clientResponse: buildErrorBody(statusCode, errMsg),
+        });
        persistFailureUsage(statusCode, "model_unavailable");
        return createErrorResult(statusCode, errMsg, retryAfterMs);
      }
    } else {
+      persistAttemptLogs({
+        status: statusCode,
+        error: errMsg,
+        providerRequest: finalBody || translatedBody,
+        providerResponse: upstreamErrorBody,
+        clientResponse: buildErrorBody(statusCode, errMsg),
+      });
      persistFailureUsage(statusCode, `upstream_${statusCode}`);
      return createErrorResult(statusCode, errMsg, retryAfterMs);
    }
@@ -1183,6 +1289,10 @@ export async function handleChatCore({
          });
          if (fbResult.response.ok) {
            providerResponse = fbResult.response;
+            providerUrl = fbResult.url;
+            providerHeaders = fbResult.headers;
+            finalBody = fbResult.transformedBody;
+            reqLogger.logTargetRequest(providerUrl, providerHeaders, finalBody);
            log?.info?.(
              "EMERGENCY_FALLBACK",
              `Serving ${fbDecision.provider}/${fbDecision.model} as budget fallback for ${provider}/${model}`
@@ -1195,7 +1305,8 @@ export async function handleChatCore({
            );
          }
        } catch (fbErr) {
-          log?.warn?.("EMERGENCY_FALLBACK", `Emergency fallback error: ${fbErr?.message}`);
+          const errMessage = fbErr instanceof Error ? fbErr.message : String(fbErr);
+          log?.warn?.("EMERGENCY_FALLBACK", `Emergency fallback error: ${errMessage}`);
        }
      }
    }
@@ -1208,6 +1319,7 @@ export async function handleChatCore({
    const contentType = (providerResponse.headers.get("content-type") || "").toLowerCase();
    let responseBody;
    const rawBody = await providerResponse.text();
+    const normalizedProviderPayload = normalizePayloadForLog(rawBody);
    const looksLikeSSE =
      contentType.includes("text/event-stream") || /(^|\n)\s*(event|data):/m.test(rawBody);

@@ -1225,11 +1337,16 @@ export async function handleChatCore({
          connectionId,
          status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
        }).catch(() => {});
+        const invalidSseMessage = "Invalid SSE response for non-streaming request";
+        persistAttemptLogs({
+          status: HTTP_STATUS.BAD_GATEWAY,
+          error: invalidSseMessage,
+          providerRequest: finalBody || translatedBody,
+          providerResponse: normalizedProviderPayload,
+          clientResponse: buildErrorBody(HTTP_STATUS.BAD_GATEWAY, invalidSseMessage),
+        });
        persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_sse_payload");
-        return createErrorResult(
-          HTTP_STATUS.BAD_GATEWAY,
-          "Invalid SSE response for non-streaming request"
-        );
+        return createErrorResult(HTTP_STATUS.BAD_GATEWAY, invalidSseMessage);
      }

      responseBody = parsedFromSSE;
@@ -1243,14 +1360,34 @@ export async function handleChatCore({
          connectionId,
          status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
        }).catch(() => {});
+        const invalidJsonMessage = "Invalid JSON response from provider";
+        persistAttemptLogs({
+          status: HTTP_STATUS.BAD_GATEWAY,
+          error: invalidJsonMessage,
+          providerRequest: finalBody || translatedBody,
+          providerResponse: normalizedProviderPayload,
+          clientResponse: buildErrorBody(HTTP_STATUS.BAD_GATEWAY, invalidJsonMessage),
+        });
        persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_json_payload");
-        return createErrorResult(HTTP_STATUS.BAD_GATEWAY, "Invalid JSON response from provider");
+        return createErrorResult(HTTP_STATUS.BAD_GATEWAY, invalidJsonMessage);
      }
    }

    if (sourceFormat === FORMATS.CLAUDE && targetFormat === FORMATS.CLAUDE) {
      responseBody = restoreClaudePassthroughToolNames(responseBody, toolNameMap);
    }
+    reqLogger.logProviderResponse(
+      providerResponse.status,
+      providerResponse.statusText,
+      providerResponse.headers,
+      looksLikeSSE
+        ? {
+            _streamed: true,
+            _format: "sse-json",
+            summary: responseBody,
+          }
+        : responseBody
+    );

    // Notify success - caller can clear error status if needed
    if (onRequestSuccess) {
@@ -1265,36 +1402,6 @@ export async function handleChatCore({

    // Save structured call log with full payloads
    const cacheUsageLogMeta = buildCacheUsageLogMeta(usage);
-    saveCallLog({
-      method: "POST",
-      path: clientRawRequest?.endpoint || "/v1/chat/completions",
-      status: 200,
-      model,
-      requestedModel,
-      provider,
-      connectionId,
-      duration: Date.now() - startTime,
-      tokens: usage,
-      requestBody: attachLogMeta(body, {
-        claudePromptCache: claudePromptCacheLogMeta,
-      }),
-      responseBody: attachLogMeta(responseBody, {
-        claudePromptCache: claudePromptCacheLogMeta
-          ? {
-              applied: claudePromptCacheLogMeta.applied,
-              totalBreakpoints: claudePromptCacheLogMeta.totalBreakpoints,
-              anthropicBeta: claudePromptCacheLogMeta.anthropicBeta,
-            }
-          : null,
-        claudePromptCacheUsage: cacheUsageLogMeta,
-      }),
-      sourceFormat,
-      targetFormat,
-      comboName,
-      apiKeyId: apiKeyInfo?.id || null,
-      apiKeyName: apiKeyInfo?.name || null,
-      noLog: apiKeyInfo?.noLog === true,
-    }).catch(() => {});
    if (usage && typeof usage === "object") {
      const msg = `[${new Date().toLocaleTimeString("en-US", { hour12: false, hour: "2-digit", minute: "2-digit" })}] 📊 [USAGE] ${provider.toUpperCase()} | in=${getLoggedInputTokens(usage)} | out=${getLoggedOutputTokens(usage)}${connectionId ? ` | account=${connectionId.slice(0, 8)}...` : ""}`;
      console.log(`${COLORS.green}${msg}${COLORS.reset}`);
@@ -1387,6 +1494,23 @@ export async function handleChatCore({

    // ── Phase 9.2: Save for idempotency ──
    saveIdempotency(idempotencyKey, translatedResponse, 200);
+    reqLogger.logConvertedResponse(translatedResponse);
+    persistAttemptLogs({
+      status: 200,
+      tokens: usage,
+      responseBody,
+      providerRequest: finalBody || translatedBody,
+      providerResponse: looksLikeSSE
+        ? {
+            _streamed: true,
+            _format: "sse-json",
+            summary: responseBody,
+          }
+        : responseBody,
+      clientResponse: translatedResponse,
+      claudeCacheMeta: claudePromptCacheLogMeta,
+      claudeCacheUsageMeta: cacheUsageLogMeta,
+    });

    return {
      success: true,
@@ -1422,38 +1546,20 @@ export async function handleChatCore({
    status: streamStatus,
    usage: streamUsage,
    responseBody: streamResponseBody,
+    providerPayload,
+    clientPayload,
  }) => {
    const cacheUsageLogMeta = buildCacheUsageLogMeta(streamUsage);
-    saveCallLog({
-      method: "POST",
-      path: clientRawRequest?.endpoint || "/v1/chat/completions",
+    persistAttemptLogs({
      status: streamStatus || 200,
-      model,
-      requestedModel,
-      provider,
-      connectionId,
-      duration: Date.now() - startTime,
      tokens: streamUsage || {},
-      requestBody: attachLogMeta(body, {
-        claudePromptCache: claudePromptCacheLogMeta,
-      }),
-      responseBody: attachLogMeta(streamResponseBody ?? undefined, {
-        claudePromptCache: claudePromptCacheLogMeta
-          ? {
-              applied: claudePromptCacheLogMeta.applied,
-              totalBreakpoints: claudePromptCacheLogMeta.totalBreakpoints,
-              anthropicBeta: claudePromptCacheLogMeta.anthropicBeta,
-            }
-          : null,
-        claudePromptCacheUsage: cacheUsageLogMeta,
-      }),
-      sourceFormat,
-      targetFormat,
-      comboName,
-      apiKeyId: apiKeyInfo?.id || null,
-      apiKeyName: apiKeyInfo?.name || null,
-      noLog: apiKeyInfo?.noLog === true,
-    }).catch(() => {});
+      responseBody: streamResponseBody ?? undefined,
+      providerRequest: finalBody || translatedBody,
+      providerResponse: providerPayload,
+      clientResponse: clientPayload ?? streamResponseBody ?? undefined,
+      claudeCacheMeta: claudePromptCacheLogMeta,
+      claudeCacheUsageMeta: cacheUsageLogMeta,
+    });

    if (apiKeyInfo?.id && streamUsage) {
      calculateCost(provider, model, streamUsage)
@@ -20,6 +20,15 @@ import { supportsToolCalling } from "./modelCapabilities.ts";

 // Status codes that should mark semaphore + record circuit breaker failures
 const TRANSIENT_FOR_BREAKER = [429, 502, 503, 504];
+const COMBO_BAD_REQUEST_FALLBACK_PATTERNS = [
+  /\bprohibited_content\b/i,
+  /request blocked by .*api/i,
+  /provided message roles? is not valid/i,
+  /unsupported .*message role/i,
+  /no such tool available/i,
+  /unsupported content part type/i,
+  /tool(?:_call|_use)? .* not (?:available|found)/i,
+];

 const MAX_COMBO_DEPTH = 3;

@@ -258,6 +267,12 @@ function extractPromptForIntent(body) {
  return "";
 }

+export function shouldFallbackComboBadRequest(status, errorText) {
+  if (status !== 400 || !errorText) return false;
+  const message = String(errorText);
+  return COMBO_BAD_REQUEST_FALLBACK_PATTERNS.some((pattern) => pattern.test(message));
+}
+
 function mapIntentToTaskType(intent) {
  switch (intent) {
    case "code":
@@ -449,14 +464,23 @@ export async function handleComboChat({
        const res = await handleSingleModel(b, modelStr);
        if (!res.ok) return res;

-        // Non-streaming: inject tag into JSON response (existing logic)
+        // Non-streaming: inject tag into JSON response
+        // Fix #721: Use OpenAI choices format (json.choices[0].message) not json.messages
        if (!b.stream) {
          try {
            const json = await res.clone().json();
-            const msgs = Array.isArray(json?.messages) ? json.messages : [];
-            if (msgs.length > 0) {
-              const tagged = injectModelTag(msgs, modelStr);
-              return new Response(JSON.stringify({ ...json, messages: tagged }), {
+            const choice = json?.choices?.[0];
+            if (choice?.message) {
+              // Wrap single message in array for injectModelTag, then unwrap
+              const tagged = injectModelTag([choice.message], modelStr);
+              // If the message had tool_calls but no string content, injectModelTag
+              // appends a synthetic assistant message — use the last one
+              const taggedMsg = tagged[tagged.length - 1];
+              const updatedJson = {
+                ...json,
+                choices: [{ ...choice, message: taggedMsg }, ...(json.choices?.slice(1) || [])],
+              };
+              return new Response(JSON.stringify(updatedJson), {
                status: res.status,
                headers: res.headers,
              });
@@ -487,8 +511,9 @@ export async function handleComboChat({

            const text = decoder.decode(chunk, { stream: true });

-            // Look for the first SSE data line with non-empty content
-            // Pattern: "content":"<non-empty>" — we inject tag at the start
+            // Fix #721: Look for either non-empty content OR tool_calls in the
+            // SSE data. Tool-call-only responses have content:null, so we inject
+            // the tag when we see a finish_reason approaching, or on first content.
            const contentMatch = text.match(/"content":"([^"]+)/);
            if (contentMatch) {
              // Inject tag at the beginning of the first content value
@@ -501,6 +526,27 @@ export async function handleComboChat({
              return;
            }

+            // Fix #721: For tool-call-only streams, inject the tag when we see
+            // the finish_reason chunk (before it reaches the client SDK which
+            // would close the connection). This ensures the tag roundtrips
+            // through the conversation history even when there's no text content.
+            if (text.includes('"finish_reason"') && !text.includes('"finish_reason":null')) {
+              // Inject a content chunk with the tag just before this finish chunk
+              const tagChunk = `data: ${JSON.stringify({
+                choices: [
+                  {
+                    delta: { content: tagContent },
+                    index: 0,
+                    finish_reason: null,
+                  },
+                ],
+              })}\n\n`;
+              tagInjected = true;
+              controller.enqueue(encoder.encode(tagChunk));
+              controller.enqueue(chunk);
+              return;
+            }
+
            // No content yet — passthrough
            controller.enqueue(chunk);
          },
@@ -890,17 +936,25 @@ export async function handleComboChat({
        provider,
        result.headers
      );
+      const comboBadRequestFallback = shouldFallbackComboBadRequest(result.status, errorText);

      // Record failure in circuit breaker for transient errors
      if (TRANSIENT_FOR_BREAKER.includes(result.status)) {
        breaker._onFailure();
      }

-      if (!shouldFallback) {
+      if (!shouldFallback && !comboBadRequestFallback) {
        log.warn("COMBO", `Model ${modelStr} failed (no fallback)`, { status: result.status });
        return result;
      }

+      if (comboBadRequestFallback) {
+        log.info(
+          "COMBO",
+          `Treating provider-scoped 400 from ${modelStr} as model-local failure; trying next combo target`
+        );
+      }
+
      // Check if this is a transient error worth retrying on same model
      const isTransient = [408, 429, 500, 502, 503, 504].includes(result.status);
      if (retry < maxRetries && isTransient) {
@@ -1146,6 +1200,7 @@ async function handleRoundRobinCombo({
          provider,
          result.headers
        );
+        const comboBadRequestFallback = shouldFallbackComboBadRequest(result.status, errorText);

        // Transient errors → mark in semaphore AND record circuit breaker failure
        if (TRANSIENT_FOR_BREAKER.includes(result.status) && cooldownMs > 0) {
@@ -1157,11 +1212,18 @@ async function handleRoundRobinCombo({
          );
        }

-        if (!shouldFallback) {
+        if (!shouldFallback && !comboBadRequestFallback) {
          log.warn("COMBO-RR", `${modelStr} failed (no fallback)`, { status: result.status });
          return result;
        }

+        if (comboBadRequestFallback) {
+          log.info(
+            "COMBO-RR",
+            `Treating provider-scoped 400 from ${modelStr} as model-local failure; trying next model`
+          );
+        }
+
        // Transient error → retry same model
        const isTransient = [408, 429, 500, 502, 503, 504].includes(result.status);
        if (retry < maxRetries && isTransient) {
@@ -67,7 +67,17 @@ export function injectModelTag(messages: Message[], providerModel: string): Mess
  }

  const msg = cleaned[lastAssistantIdx];
-  if (typeof msg.content !== "string") return cleaned;
+  // Fix #721: Handle messages where content is not a string (tool_calls responses).
+  // In this case, append a synthetic assistant message with the tag so the pin
+  // roundtrips through the conversation history.
+  if (typeof msg.content !== "string") {
+    // If the message has tool_calls but no string content, append a new assistant
+    // message with the tag rather than silently failing.
+    return [
+      ...cleaned,
+      { role: "assistant", content: `\n<omniModel>${providerModel}</omniModel>` },
+    ];
+  }

  const tagged = [...cleaned];
  tagged[lastAssistantIdx] = {
@@ -100,13 +100,66 @@ function shouldDisplayGitHubQuota(quota: UsageQuota | null): quota is UsageQuota
  return quota.total > 0 || quota.remainingPercentage !== undefined;
 }

+// GLM (Z.AI) quota API config
+const GLM_QUOTA_URLS: Record<string, string> = {
+  international: "https://api.z.ai/api/monitor/usage/quota/limit",
+  china: "https://open.bigmodel.cn/api/monitor/usage/quota/limit",
+};
+
+async function getGlmUsage(apiKey: string, providerSpecificData?: Record<string, unknown>) {
+  const region = providerSpecificData?.apiRegion || "international";
+  const quotaUrl = GLM_QUOTA_URLS[region] || GLM_QUOTA_URLS.international;
+
+  const res = await fetch(quotaUrl, {
+    headers: {
+      Authorization: `Bearer ${apiKey}`,
+      Accept: "application/json",
+    },
+  });
+
+  if (!res.ok) {
+    if (res.status === 401) throw new Error("Invalid API key");
+    throw new Error(`GLM quota API error (${res.status})`);
+  }
+
+  const json = await res.json();
+  const data = toRecord(json.data);
+  const limits: unknown[] = Array.isArray(data.limits) ? data.limits : [];
+  const quotas: Record<string, UsageQuota> = {};
+
+  for (const limit of limits) {
+    const src = toRecord(limit);
+    if (src.type !== "TOKENS_LIMIT") continue;
+
+    const usedPercent = toNumber(src.percentage, 0);
+    const resetMs = toNumber(src.nextResetTime, 0);
+    const remaining = Math.max(0, 100 - usedPercent);
+
+    quotas["session"] = {
+      used: usedPercent,
+      total: 100,
+      remaining,
+      remainingPercentage: remaining,
+      resetAt: resetMs > 0 ? new Date(resetMs).toISOString() : null,
+      unlimited: false,
+    };
+  }
+
+  const levelRaw = typeof data.level === "string" ? data.level : "";
+  const plan = levelRaw
+    ? levelRaw.charAt(0).toUpperCase() + levelRaw.slice(1).toLowerCase()
+    : "Unknown";
+
+  return { plan, quotas };
+}
+
 /**
 * Get usage data for a provider connection
 * @param {Object} connection - Provider connection with accessToken
 * @returns {Promise<unknown>} Usage data with quotas
 */
 export async function getUsageForProvider(connection) {
-  const { provider, accessToken, providerSpecificData } = connection;
+  const { provider, accessToken, apiKey, providerSpecificData } = connection;

  switch (provider) {
    case "github":
@@ -127,6 +180,8 @@ export async function getUsageForProvider(connection) {
      return await getQwenUsage(accessToken, providerSpecificData);
    case "iflow":
      return await getIflowUsage(accessToken);
+    case "glm":
+      return await getGlmUsage(apiKey, providerSpecificData);
    default:
      return { message: `Usage API not implemented for ${provider}` };
  }
@@ -105,13 +105,14 @@ function markMessageCacheControl(msg, ttl) {
 }

 // Prepare request for Claude format endpoints
-// - Cleanup cache_control
+// - Cleanup cache_control (unless preserveCacheControl=true for passthrough)
 // - Filter empty messages
 // - Add thinking block for Anthropic endpoint (provider === "claude")
 // - Fix tool_use/tool_result ordering
-export function prepareClaudeRequest(body, provider = null) {
+export function prepareClaudeRequest(body, provider = null, preserveCacheControl = false) {
  // 1. System: remove all cache_control, add only to last block with ttl 1h
-  if (body.system && Array.isArray(body.system)) {
+  // In passthrough mode, preserve existing cache_control markers
+  if (body.system && Array.isArray(body.system) && !preserveCacheControl) {
    body.system = body.system.map((block, i) => {
      const { cache_control, ...rest } = block;
      if (i === body.system.length - 1) {
@@ -127,11 +128,12 @@ export function prepareClaudeRequest(body, provider = null) {
    let filtered = [];

    // Pass 1: remove cache_control + filter empty messages
+    // In passthrough mode, preserve existing cache_control markers
    for (let i = 0; i < len; i++) {
      const msg = body.messages[i];

-      // Remove cache_control from content blocks
-      if (Array.isArray(msg.content)) {
+      // Remove cache_control from content blocks (skip in passthrough mode)
+      if (Array.isArray(msg.content) && !preserveCacheControl) {
        for (const block of msg.content) {
          delete block.cache_control;
        }
@@ -177,14 +179,17 @@ export function prepareClaudeRequest(body, provider = null) {
    // Claude Code-style prompt caching:
    // - cache the second-to-last user turn for conversation reuse
    // - cache the last assistant turn so the next user turn can reuse it
-    const userMessageIndexes = filtered.reduce((indexes, msg, index) => {
-      if (msg?.role === "user") indexes.push(index);
-      return indexes;
-    }, []);
-    const secondToLastUserIndex =
-      userMessageIndexes.length >= 2 ? userMessageIndexes[userMessageIndexes.length - 2] : -1;
-    if (secondToLastUserIndex >= 0) {
-      markMessageCacheControl(filtered[secondToLastUserIndex]);
+    // Skip in passthrough mode to preserve client's cache_control markers
+    if (!preserveCacheControl) {
+      const userMessageIndexes = filtered.reduce((indexes, msg, index) => {
+        if (msg?.role === "user") indexes.push(index);
+        return indexes;
+      }, []);
+      const secondToLastUserIndex =
+        userMessageIndexes.length >= 2 ? userMessageIndexes[userMessageIndexes.length - 2] : -1;
+      if (secondToLastUserIndex >= 0) {
+        markMessageCacheControl(filtered[secondToLastUserIndex]);
+      }
    }

    // Pass 2 (reverse): add cache_control to last assistant + handle thinking for Anthropic
@@ -194,7 +199,8 @@ export function prepareClaudeRequest(body, provider = null) {

      if (msg.role === "assistant" && Array.isArray(ensureMessageContentArray(msg))) {
        // Add cache_control to last block of first (from end) assistant with content
-        if (!lastAssistantProcessed && markMessageCacheControl(msg)) {
+        // Skip in passthrough mode to preserve client's cache_control markers
+        if (!preserveCacheControl && !lastAssistantProcessed && markMessageCacheControl(msg)) {
          lastAssistantProcessed = true;
        }

@@ -227,7 +233,8 @@ export function prepareClaudeRequest(body, provider = null) {

  // 3. Tools: remove all cache_control, add only to last non-deferred tool with ttl 1h
  // Tools with defer_loading=true cannot have cache_control (API rejects it)
-  if (body.tools && Array.isArray(body.tools)) {
+  // In passthrough mode, preserve existing cache_control markers
+  if (body.tools && Array.isArray(body.tools) && !preserveCacheControl) {
    body.tools = body.tools.map((tool) => {
      const { cache_control, ...rest } = tool;
      return rest;
@@ -149,8 +149,10 @@ export function translateRequest(
  }

  // Final step: prepare request for Claude format endpoints
+  // In Claude passthrough mode (Claude → Claude), preserve cache_control markers
  if (targetFormat === FORMATS.CLAUDE) {
-    result = prepareClaudeRequest(result, provider);
+    const isClaudePassthrough = sourceFormat === FORMATS.CLAUDE;
+    result = prepareClaudeRequest(result, provider, isClaudePassthrough);
  }

  // Normalize openai-responses input shape for providers that require list input.
@@ -167,8 +167,10 @@ function openaiToGeminiBase(model, body, stream) {
            if (tc.type !== "function") continue;

            const args = tryParseJSON(tc.function?.arguments || "{}");
+            // Do NOT include thoughtSignature on functionCall parts — it is only valid
+            // on thinking/reasoning parts and causes HTTP 400 "invalid argument" from the
+            // Gemini API when present on a functionCall part (#725).
            parts.push({
-              thoughtSignature: DEFAULT_THINKING_GEMINI_SIGNATURE,
              functionCall: {
                id: tc.id,
                name: tc.function.name,
@@ -1,5 +1,6 @@
 import { getCorsOrigin } from "./cors.ts";
 import { ERROR_TYPES, DEFAULT_ERROR_MESSAGES } from "../config/constants.ts";
+import { normalizePayloadForLog } from "@/lib/logPayloads";

 /**
 * Build OpenAI-compatible error response body
@@ -91,14 +92,16 @@ export function parseAntigravityRetryTime(message) {
 * Parse upstream provider error response
 * @param {Response} response - Fetch response from provider
 * @param {string} provider - Provider name (for Antigravity-specific parsing)
- * @returns {Promise<{statusCode: number, message: string, retryAfterMs: number|null}>}
+ * @returns {Promise<{statusCode: number, message: string, retryAfterMs: number|null, responseBody: unknown}>}
 */
 export async function parseUpstreamError(response, provider = null) {
  let message = "";
  let retryAfterMs = null;
+  let responseBody = null;

  try {
    const text = await response.text();
+    responseBody = normalizePayloadForLog(text);

    // Try parse as JSON
    try {
@@ -109,6 +112,7 @@ export async function parseUpstreamError(response, provider = null) {
    }
  } catch {
    message = `Upstream error: ${response.status}`;
+    responseBody = { _rawText: message };
  }

  const messageStr = typeof message === "string" ? message : JSON.stringify(message);
@@ -122,6 +126,7 @@ export async function parseUpstreamError(response, provider = null) {
    statusCode: response.status,
    message: messageStr,
    retryAfterMs,
+    responseBody,
  };
 }

@@ -11,6 +11,7 @@ import {
  COLORS,
 } from "./usageTracking.ts";
 import { parseSSELine, hasValuableContent, fixInvalidId, formatSSE } from "./streamHelpers.ts";
+import { createStructuredSSECollector } from "./streamPayloadCollector.ts";
 import { STREAM_IDLE_TIMEOUT_MS, HTTP_STATUS } from "../config/constants.ts";
 import {
  sanitizeStreamingChunk,
@@ -32,6 +33,8 @@ type StreamCompletePayload = {
  usage: unknown;
  /** Minimal response body for call log (streaming: usage + note; non-streaming not used) */
  responseBody?: unknown;
+  providerPayload?: unknown;
+  clientPayload?: unknown;
 };

 type StreamOptions = {
@@ -158,6 +161,12 @@ export function createSSEStream(options: StreamOptions = {}) {

  // Guard against duplicate [DONE] events — ensures exactly one per stream
  let doneSent = false;
+  const providerPayloadCollector = createStructuredSSECollector({
+    stage: "provider_response",
+  });
+  const clientPayloadCollector = createStructuredSSECollector({
+    stage: "client_response",
+  });

  // Per-stream instances to avoid shared state with concurrent streams
  const decoder = new TextDecoder();
@@ -212,6 +221,17 @@ export function createSSEStream(options: StreamOptions = {}) {
          if (mode === STREAM_MODE.PASSTHROUGH) {
            let output;
            let injectedUsage = false;
+            let clientPayload: unknown = null;
+
+            if (trimmed.startsWith("data:")) {
+              const providerPayload = parseSSELine(trimmed);
+              if (providerPayload) {
+                providerPayloadCollector.push(providerPayload);
+                if ((providerPayload as { done?: unknown }).done === true) {
+                  clientPayloadCollector.push(providerPayload);
+                }
+              }
+            }

            if (trimmed.startsWith("data:") && trimmed.slice(5).trim() !== "[DONE]") {
              try {
@@ -380,6 +400,8 @@ export function createSSEStream(options: StreamOptions = {}) {
                    injectedUsage = true;
                  }
                }
+
+                clientPayload = parsed;
              } catch {}
            }

@@ -391,6 +413,10 @@ export function createSSEStream(options: StreamOptions = {}) {
              }
            }

+            if (clientPayload) {
+              clientPayloadCollector.push(clientPayload);
+            }
+
            reqLogger?.appendConvertedChunk?.(output);
            controller.enqueue(encoder.encode(output));
            continue;
@@ -401,10 +427,12 @@ export function createSSEStream(options: StreamOptions = {}) {

          const parsed = parseSSELine(trimmed);
          if (!parsed) continue;
+          providerPayloadCollector.push(parsed);

          if (parsed && parsed.done) {
            if (!doneSent) {
              doneSent = true;
+              clientPayloadCollector.push({ done: true });
              const output = "data: [DONE]\n\n";
              reqLogger?.appendConvertedChunk?.(output);
              controller.enqueue(encoder.encode(output));
@@ -524,6 +552,7 @@ export function createSSEStream(options: StreamOptions = {}) {
              }

              const output = formatSSE(item, sourceFormat);
+              clientPayloadCollector.push(item);
              reqLogger?.appendConvertedChunk?.(output);
              controller.enqueue(encoder.encode(output));
            }
@@ -551,6 +580,11 @@ export function createSSEStream(options: StreamOptions = {}) {
              if (buffer.startsWith("data:") && !buffer.startsWith("data: ")) {
                output = "data: " + buffer.slice(5);
              }
+              const bufferedPayload = parseSSELine(buffer.trim());
+              if (bufferedPayload) {
+                providerPayloadCollector.push(bufferedPayload);
+                clientPayloadCollector.push(bufferedPayload);
+              }
              reqLogger?.appendConvertedChunk?.(output);
              controller.enqueue(encoder.encode(output));
            }
@@ -601,7 +635,13 @@ export function createSSEStream(options: StreamOptions = {}) {
                  },
                  _streamed: true,
                };
-                onComplete({ status: 200, usage, responseBody });
+                onComplete({
+                  status: 200,
+                  usage,
+                  responseBody,
+                  providerPayload: providerPayloadCollector.build(),
+                  clientPayload: clientPayloadCollector.build(responseBody),
+                });
              } catch {}
            }
            return;
@@ -611,6 +651,7 @@ export function createSSEStream(options: StreamOptions = {}) {
          if (buffer.trim()) {
            const parsed = parseSSELine(buffer.trim());
            if (parsed && !parsed.done) {
+              providerPayloadCollector.push(parsed);
              // Extract usage from remaining buffer — if the usage-bearing event
              // (e.g. response.completed) is the last SSE line, it ends up here
              // in the flush handler where extractUsage was not called.
@@ -647,6 +688,7 @@ export function createSSEStream(options: StreamOptions = {}) {
              if (translated?.length > 0) {
                for (const item of translated) {
                  const output = formatSSE(item, sourceFormat);
+                  clientPayloadCollector.push(item);
                  reqLogger?.appendConvertedChunk?.(output);
                  controller.enqueue(encoder.encode(output));
                }
@@ -666,6 +708,7 @@ export function createSSEStream(options: StreamOptions = {}) {
          if (flushed?.length > 0) {
            for (const item of flushed) {
              const output = formatSSE(item, sourceFormat);
+              clientPayloadCollector.push(item);
              reqLogger?.appendConvertedChunk?.(output);
              controller.enqueue(encoder.encode(output));
            }
@@ -684,6 +727,7 @@ export function createSSEStream(options: StreamOptions = {}) {
          // Send [DONE] (only if not already sent during transform)
          if (!doneSent) {
            doneSent = true;
+            clientPayloadCollector.push({ done: true });
            const doneOutput = "data: [DONE]\n\n";
            reqLogger?.appendConvertedChunk?.(doneOutput);
            controller.enqueue(encoder.encode(doneOutput));
@@ -747,7 +791,13 @@ export function createSSEStream(options: StreamOptions = {}) {
                },
                _streamed: true,
              };
-              onComplete({ status: 200, usage: state?.usage, responseBody });
+              onComplete({
+                status: 200,
+                usage: state?.usage,
+                responseBody,
+                providerPayload: providerPayloadCollector.build(),
+                clientPayload: clientPayloadCollector.build(responseBody),
+              });
            } catch {}
          }
        } catch (error) {
@@ -0,0 +1,72 @@
+import { cloneLogPayload } from "@/lib/logPayloads";
+
+type StructuredSSEEvent = {
+  index: number;
+  event?: string;
+  data: unknown;
+};
+
+type CollectorOptions = {
+  maxEvents?: number;
+  maxBytes?: number;
+  stage?: string;
+};
+
+function getEventName(payload: unknown): string | undefined {
+  if (!payload || typeof payload !== "object" || Array.isArray(payload)) return undefined;
+
+  if (typeof (payload as { event?: unknown }).event === "string") {
+    return (payload as { event: string }).event;
+  }
+  if (typeof (payload as { type?: unknown }).type === "string") {
+    return (payload as { type: string }).type;
+  }
+  if ((payload as { done?: unknown }).done === true) {
+    return "[DONE]";
+  }
+  return undefined;
+}
+
+export function createStructuredSSECollector(options: CollectorOptions = {}) {
+  const { maxEvents = 200, maxBytes = 49152, stage } = options;
+  const events: StructuredSSEEvent[] = [];
+  let usedBytes = 0;
+  let droppedEvents = 0;
+
+  return {
+    push(payload: unknown, explicitEvent?: string) {
+      if (payload === null || payload === undefined) return;
+
+      const event: StructuredSSEEvent = {
+        index: events.length + droppedEvents,
+        data: cloneLogPayload(payload),
+      };
+
+      const eventName = explicitEvent || getEventName(payload);
+      if (eventName) {
+        event.event = eventName;
+      }
+
+      const serializedSize = JSON.stringify(event).length;
+      if (events.length >= maxEvents || usedBytes + serializedSize > maxBytes) {
+        droppedEvents += 1;
+        return;
+      }
+
+      usedBytes += serializedSize;
+      events.push(event);
+    },
+
+    build(summary?: unknown) {
+      return {
+        _streamed: true,
+        _format: "sse-json",
+        ...(stage ? { _stage: stage } : {}),
+        _eventCount: events.length + droppedEvents,
+        ...(droppedEvents > 0 ? { _truncated: true, _droppedEvents: droppedEvents } : {}),
+        events,
+        ...(summary === undefined ? {} : { summary: cloneLogPayload(summary) }),
+      };
+    },
+  };
+}
@@ -1,12 +1,12 @@
 {
  "name": "omniroute",
-  "version": "3.1.10",
+  "version": "3.2.2",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "omniroute",
-      "version": "3.1.10",
+      "version": "3.2.2",
      "hasInstallScript": true,
      "license": "MIT",
      "workspaces": [
@@ -6346,9 +6346,9 @@
      }
    },
    "node_modules/@typescript-eslint/typescript-estree/node_modules/brace-expansion": {
-      "version": "5.0.4",
-      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.4.tgz",
-      "integrity": "sha512-h+DEnpVvxmfVefa4jFbCf5HdH5YMDXRsmKflpf1pILZWRFlTbJpxeU55nJl4Smt5HQaGzg1o6RHFPJaOqnmBDg==",
+      "version": "5.0.5",
+      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz",
+      "integrity": "sha512-VZznLgtwhn+Mact9tfiwx64fA9erHH/MCXEUfB/0bX/6Fz6ny5EGTXYltMocqg4xFAQZtnO3DHWWXi8RiuN7cQ==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
@@ -7574,9 +7574,9 @@
      "license": "MIT"
    },
    "node_modules/brace-expansion": {
-      "version": "1.1.12",
-      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.12.tgz",
-      "integrity": "sha512-9T9UjW3r0UW5c1Q7GTwllptXwhvYmEzFhzMfZ9H7FQWt+uZePjZPjBP/W1ZEyZ1twGWom5/56TF4lPcqjnDHcg==",
+      "version": "1.1.13",
+      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.13.tgz",
+      "integrity": "sha512-9ZLprWS6EENmhEOpjCYW2c8VkmOvckIJZfkr7rBW6dObmfgJ/L1GpSYW5Hpo9lDz4D1+n0Ckz8rU7FwHDQiG/w==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
@@ -15527,9 +15527,9 @@
      }
    },
    "node_modules/path-to-regexp": {
-      "version": "8.3.0",
-      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.3.0.tgz",
-      "integrity": "sha512-7jdwVIRtsP8MYpdXSwOS0YdD0Du+qOoF/AEPIt88PcCFrZCzx41oxku1jD88hZBwbNUIEfpqvuhjFaMAqMTWnA==",
+      "version": "8.4.0",
+      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.4.0.tgz",
+      "integrity": "sha512-PuseHIvAnz3bjrM2rGJtSgo1zjgxapTLZ7x2pjhzWwlp4SJQgK3f3iZIQwkpEnBaKz6seKBADpM4B4ySkuYypg==",
      "license": "MIT",
      "funding": {
        "type": "opencollective",
@@ -1,6 +1,6 @@
 {
  "name": "omniroute",
-  "version": "3.1.10",
+  "version": "3.2.2",
  "description": "Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.",
  "type": "module",
  "bin": {
@@ -161,6 +161,7 @@
    ]
  },
  "overrides": {
-    "dompurify": "^3.3.2"
+    "dompurify": "^3.3.2",
+    "path-to-regexp": "^8.4.0"
  }
 }
@@ -0,0 +1,340 @@
+"use client";
+
+import { useState, useEffect, useCallback } from "react";
+import { Card, Button, EmptyState } from "@/shared/components";
+import { useNotificationStore } from "@/store/notificationStore";
+import { useTranslations } from "next-intl";
+
+// ─── Types ───────────────────────────────────────────────────────────────────
+
+interface SemanticCacheStats {
+  memoryEntries: number;
+  dbEntries: number;
+  hits: number;
+  misses: number;
+  hitRate: string;
+  tokensSaved: number;
+}
+
+interface IdempotencyStats {
+  activeKeys: number;
+  windowMs: number;
+}
+
+interface CacheStats {
+  semanticCache: SemanticCacheStats;
+  idempotency: IdempotencyStats;
+}
+
+// ─── Sub-components ──────────────────────────────────────────────────────────
+
+function StatCard({
+  icon,
+  label,
+  value,
+  sub,
+  valueClass = "text-text",
+}: {
+  icon: string;
+  label: string;
+  value: string | number;
+  sub?: string;
+  valueClass?: string;
+}) {
+  return (
+    <div className="flex flex-col gap-1 p-4 rounded-xl bg-surface-raised border border-border/40">
+      <div className="flex items-center gap-1.5 text-text-muted text-xs">
+        <span className="material-symbols-outlined text-base leading-none" aria-hidden="true">
+          {icon}
+        </span>
+        {label}
+      </div>
+      <div className={`text-2xl font-semibold tabular-nums ${valueClass}`}>{value}</div>
+      {sub && <div className="text-xs text-text-muted">{sub}</div>}
+    </div>
+  );
+}
+
+function HitRateBar({ hitRate, label }: { hitRate: number; label: string }) {
+  const colorClass = hitRate >= 70 ? "bg-green-500" : hitRate >= 40 ? "bg-amber-400" : "bg-red-500";
+  const textClass =
+    hitRate >= 70 ? "text-green-500" : hitRate >= 40 ? "text-amber-400" : "text-red-500";
+
+  return (
+    <div
+      className="w-full"
+      role="progressbar"
+      aria-label={label}
+      aria-valuenow={hitRate}
+      aria-valuemin={0}
+      aria-valuemax={100}
+    >
+      <div className="flex justify-between text-xs mb-1.5">
+        <span className="text-text-muted">{label}</span>
+        <span className={`font-semibold tabular-nums ${textClass}`}>{hitRate.toFixed(1)}%</span>
+      </div>
+      <div className="w-full h-2 rounded-full bg-surface/50 overflow-hidden">
+        <div
+          className={`h-full rounded-full transition-all duration-500 ${colorClass}`}
+          style={{ width: `${Math.min(hitRate, 100)}%` }}
+        />
+      </div>
+    </div>
+  );
+}
+
+function InfoRow({ icon, children }: { icon: string; children: React.ReactNode }) {
+  return (
+    <div className="flex gap-2 text-sm text-text-muted">
+      <span
+        className="material-symbols-outlined text-base leading-5 text-blue-400 shrink-0"
+        aria-hidden="true"
+      >
+        {icon}
+      </span>
+      <span>{children}</span>
+    </div>
+  );
+}
+
+// ─── Page ────────────────────────────────────────────────────────────────────
+
+const REFRESH_INTERVAL_MS = 10_000;
+const REFRESH_INTERVAL_SECONDS = REFRESH_INTERVAL_MS / 1000;
+
+export default function CachePage() {
+  const t = useTranslations("cache");
+  const [stats, setStats] = useState<CacheStats | null>(null);
+  const [loading, setLoading] = useState(true);
+  const [clearing, setClearing] = useState(false);
+  const notify = useNotificationStore();
+
+  const fetchStats = useCallback(async () => {
+    try {
+      const res = await fetch("/api/cache");
+      if (res.ok) {
+        const data: CacheStats = await res.json();
+        setStats(data);
+      }
+    } catch (error) {
+      // Network error — keep stale stats rather than clearing the UI
+      console.error("[CachePage] Failed to fetch cache stats:", error);
+    } finally {
+      setLoading(false);
+    }
+  }, []);
+
+  useEffect(() => {
+    void fetchStats();
+    const id = setInterval(() => void fetchStats(), REFRESH_INTERVAL_MS);
+    return () => clearInterval(id);
+  }, [fetchStats]);
+
+  const handleClearAll = async () => {
+    setClearing(true);
+    try {
+      const res = await fetch("/api/cache", { method: "DELETE" });
+      if (res.ok) {
+        const data = await res.json();
+        notify.add({
+          type: "success",
+          message: t("clearSuccess", { count: data.expiredRemoved ?? 0 }),
+        });
+        await fetchStats();
+      } else {
+        notify.add({ type: "error", message: t("clearError") });
+      }
+    } catch (error) {
+      console.error("[CachePage] Failed to clear cache:", error);
+      notify.add({ type: "error", message: t("clearError") });
+    } finally {
+      setClearing(false);
+    }
+  };
+
+  const sc = stats?.semanticCache;
+  const idp = stats?.idempotency;
+  const hitRate = sc ? parseFloat(sc.hitRate) : 0;
+  const totalRequests = sc ? sc.hits + sc.misses : 0;
+
+  return (
+    <div className="flex flex-col gap-6">
+      {/* Header */}
+      <div className="flex items-start justify-between gap-4">
+        <div>
+          <h1 className="text-xl font-semibold">{t("title")}</h1>
+          <p className="text-sm text-text-muted mt-0.5">{t("description")}</p>
+        </div>
+        <div className="flex gap-2 shrink-0">
+          <Button
+            variant="secondary"
+            icon="refresh"
+            size="sm"
+            onClick={() => void fetchStats()}
+            disabled={loading}
+            aria-label={t("refresh")}
+          >
+            {t("refresh")}
+          </Button>
+          <Button
+            variant="danger"
+            icon="delete_sweep"
+            size="sm"
+            onClick={() => void handleClearAll()}
+            disabled={clearing || loading}
+            loading={clearing}
+            aria-label={t("clearAll")}
+          >
+            {t("clearAll")}
+          </Button>
+        </div>
+      </div>
+
+      {/* Loading skeleton */}
+      {loading && (
+        <div
+          className="grid grid-cols-2 md:grid-cols-4 gap-4"
+          aria-busy="true"
+          aria-label="Loading cache statistics"
+        >
+          {Array.from({ length: 4 }).map((_, i) => (
+            <div key={i} className="h-24 rounded-xl bg-surface-raised animate-pulse" />
+          ))}
+        </div>
+      )}
+
+      {/* Error / empty state */}
+      {!loading && !stats && (
+        <EmptyState
+          icon="cached"
+          title={t("unavailable")}
+          description={t("unavailableDesc")}
+          actionLabel={t("refresh")}
+          onAction={() => void fetchStats()}
+        />
+      )}
+
+      {/* Main content */}
+      {!loading && stats && (
+        <>
+          {/* Stats grid */}
+          <div className="grid grid-cols-2 md:grid-cols-4 gap-4">
+            <StatCard
+              icon="memory"
+              label={t("memoryEntries")}
+              value={sc?.memoryEntries ?? 0}
+              sub={t("memoryEntriesSub")}
+            />
+            <StatCard
+              icon="storage"
+              label={t("dbEntries")}
+              value={sc?.dbEntries ?? 0}
+              sub={t("dbEntriesSub")}
+            />
+            <StatCard
+              icon="trending_up"
+              label={t("cacheHits")}
+              value={sc?.hits ?? 0}
+              sub={t("cacheHitsSub", { total: totalRequests })}
+              valueClass="text-green-500"
+            />
+            <StatCard
+              icon="token"
+              label={t("tokensSaved")}
+              value={(sc?.tokensSaved ?? 0).toLocaleString()}
+              sub={t("tokensSavedSub")}
+              valueClass="text-blue-400"
+            />
+          </div>
+
+          {/* Hit rate + breakdown */}
+          <Card>
+            <div className="p-5 flex flex-col gap-4">
+              <div className="flex items-center justify-between">
+                <h2 className="font-medium text-sm">{t("performance")}</h2>
+                <span className="text-xs text-text-muted">
+                  {t("autoRefresh", { seconds: REFRESH_INTERVAL_SECONDS })}
+                </span>
+              </div>
+              <HitRateBar hitRate={hitRate} label={t("hitRate")} />
+              <div className="grid grid-cols-3 gap-4 pt-3 border-t border-border/30 text-center">
+                <div>
+                  <div className="text-lg font-semibold tabular-nums text-green-500">
+                    {sc?.hits ?? 0}
+                  </div>
+                  <div className="text-xs text-text-muted mt-0.5">{t("hits")}</div>
+                </div>
+                <div>
+                  <div className="text-lg font-semibold tabular-nums text-red-400">
+                    {sc?.misses ?? 0}
+                  </div>
+                  <div className="text-xs text-text-muted mt-0.5">{t("misses")}</div>
+                </div>
+                <div>
+                  <div className="text-lg font-semibold tabular-nums">{totalRequests}</div>
+                  <div className="text-xs text-text-muted mt-0.5">{t("total")}</div>
+                </div>
+              </div>
+            </div>
+          </Card>
+
+          {/* Cache behavior */}
+          <Card>
+            <div className="p-5 flex flex-col gap-3">
+              <h2 className="font-medium text-sm">{t("behavior")}</h2>
+              <div className="grid grid-cols-1 md:grid-cols-2 gap-3">
+                <InfoRow icon="info">{t("behaviorDeterministic")}</InfoRow>
+                <InfoRow icon="info">
+                  {t.rich("behaviorBypass", {
+                    header: () => (
+                      <code className="bg-surface px-1 py-0.5 rounded text-xs font-mono">
+                        X-OmniRoute-No-Cache: true
+                      </code>
+                    ),
+                  })}
+                </InfoRow>
+                <InfoRow icon="info">{t("behaviorTwoTier")}</InfoRow>
+                <InfoRow icon="info">
+                  {t.rich("behaviorTtl", {
+                    envVar: () => (
+                      <code className="bg-surface px-1 py-0.5 rounded text-xs font-mono">
+                        SEMANTIC_CACHE_TTL_MS
+                      </code>
+                    ),
+                  })}
+                </InfoRow>
+              </div>
+            </div>
+          </Card>
+
+          {/* Idempotency */}
+          <Card>
+            <div className="p-5 flex flex-col gap-3">
+              <div className="flex items-center gap-2">
+                <span
+                  className="material-symbols-outlined text-base text-text-muted"
+                  aria-hidden="true"
+                >
+                  fingerprint
+                </span>
+                <h2 className="font-medium text-sm">{t("idempotency")}</h2>
+              </div>
+              <div className="grid grid-cols-2 gap-4">
+                <div className="p-3 rounded-lg bg-surface/50">
+                  <div className="text-lg font-semibold tabular-nums">{idp?.activeKeys ?? 0}</div>
+                  <div className="text-xs text-text-muted mt-0.5">{t("activeDedupKeys")}</div>
+                </div>
+                <div className="p-3 rounded-lg bg-surface/50">
+                  <div className="text-lg font-semibold tabular-nums">
+                    {idp ? `${(idp.windowMs / 1000).toFixed(0)}s` : "—"}
+                  </div>
+                  <div className="text-xs text-text-muted mt-0.5">{t("dedupWindow")}</div>
+                </div>
+              </div>
+            </div>
+          </Card>
+        </>
+      )}
+    </div>
+  );
+}
@@ -1122,18 +1122,27 @@ function TestResultsView({ results }) {
      {results.results?.map((r, i) => (
        <div
          key={i}
+          title={r.error || undefined}
          className="flex items-center gap-2 text-xs px-2 py-1.5 rounded bg-black/[0.02] dark:bg-white/[0.02]"
        >
          <span
            className={`material-symbols-outlined text-[14px] ${
              r.status === "ok"
                ? "text-emerald-500"
-                : r.status === "skipped"
-                  ? "text-text-muted"
-                  : "text-red-500"
+                : r.status === "reachable"
+                  ? "text-amber-500"
+                  : r.status === "skipped"
+                    ? "text-text-muted"
+                    : "text-red-500"
            }`}
          >
-            {r.status === "ok" ? "check_circle" : r.status === "skipped" ? "skip_next" : "error"}
+            {r.status === "ok"
+              ? "check_circle"
+              : r.status === "reachable"
+                ? "network_check"
+                : r.status === "skipped"
+                  ? "skip_next"
+                  : "error"}
          </span>
          <code className="font-mono flex-1">{r.model}</code>
          {r.latencyMs !== undefined && <span className="text-text-muted">{r.latencyMs}ms</span>}
@@ -1141,9 +1150,11 @@ function TestResultsView({ results }) {
            className={`text-[10px] uppercase font-medium ${
              r.status === "ok"
                ? "text-emerald-500"
-                : r.status === "skipped"
-                  ? "text-text-muted"
-                  : "text-red-500"
+                : r.status === "reachable"
+                  ? "text-amber-500"
+                  : r.status === "skipped"
+                    ? "text-text-muted"
+                    : "text-red-500"
            }`}
          >
            {r.status}
@@ -4289,6 +4289,7 @@ function AddApiKeyModal({
  const defaultBailianUrl = "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1";
  const isVertex = provider === "vertex";
  const defaultRegion = "us-central1";
+  const isGlm = provider === "glm";

  const [formData, setFormData] = useState({
    name: "",
@@ -4296,6 +4297,7 @@ function AddApiKeyModal({
    priority: 1,
    baseUrl: isBailian ? defaultBailianUrl : "",
    region: isVertex ? defaultRegion : "",
+    apiRegion: "international",
    validationModelId: "",
  });
  const [validating, setValidating] = useState(false);
@@ -4385,6 +4387,10 @@ function AddApiKeyModal({
        payload.providerSpecificData = {
          region: formData.region,
        };
+      } else if (isGlm) {
+        payload.providerSpecificData = {
+          apiRegion: formData.apiRegion,
+        };
      }

      const error = await onSave(payload);
@@ -4484,6 +4490,22 @@ function AddApiKeyModal({
            hint="ex: us-central1 ou europe-west4. Partner models usam a região global automaticamente."
          />
        )}
+        {isGlm && (
+          <div>
+            <label className="text-sm font-medium text-text-main mb-1 block">API Region</label>
+            <select
+              value={formData.apiRegion}
+              onChange={(e) => setFormData({ ...formData, apiRegion: e.target.value })}
+              className="w-full px-3 py-2 text-sm border border-border rounded-lg bg-background focus:outline-none focus:border-primary"
+            >
+              <option value="international">International (api.z.ai)</option>
+              <option value="china">China Mainland (open.bigmodel.cn)</option>
+            </select>
+            <p className="text-xs text-text-muted mt-1">
+              Select the endpoint region for API access and quota tracking.
+            </p>
+          </div>
+        )}
        <div className="flex gap-2">
          <Button
            onClick={handleSubmit}
@@ -4533,6 +4555,7 @@ function EditConnectionModal({ isOpen, connection, onSave, onClose }: EditConnec
    healthCheckInterval: 60,
    baseUrl: "",
    region: "",
+    apiRegion: "international",
    validationModelId: "",
    tag: "",
  });
@@ -4548,6 +4571,7 @@ function EditConnectionModal({ isOpen, connection, onSave, onClose }: EditConnec
  const isBailian = connection?.provider === "bailian-coding-plan";
  const defaultBailianUrl = "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1";
  const isVertex = connection?.provider === "vertex";
+  const isGlm = connection?.provider === "glm";
  const defaultRegion = "us-central1";

  useEffect(() => {
@@ -4563,6 +4587,7 @@ function EditConnectionModal({ isOpen, connection, onSave, onClose }: EditConnec
        healthCheckInterval: connection.healthCheckInterval ?? 60,
        baseUrl: existingBaseUrl || (isBailian ? defaultBailianUrl : ""),
        region: existingRegion || (isVertex ? defaultRegion : ""),
+        apiRegion: (connection.providerSpecificData?.apiRegion as string) || "international",
        validationModelId: (connection.providerSpecificData?.validationModelId as string) || "",
        tag: (connection.providerSpecificData?.tag as string) || "",
      });
@@ -4698,6 +4723,8 @@ function EditConnectionModal({ isOpen, connection, onSave, onClose }: EditConnec
          updates.providerSpecificData.baseUrl = validatedBailianBaseUrl;
        } else if (isVertex) {
          updates.providerSpecificData.region = formData.region;
+        } else if (isGlm) {
+          updates.providerSpecificData.apiRegion = formData.apiRegion;
        }
      } else {
        // Also persist tag for OAuth accounts
@@ -4832,6 +4859,23 @@ function EditConnectionModal({ isOpen, connection, onSave, onClose }: EditConnec
          />
        )}

+        {isGlm && (
+          <div>
+            <label className="text-sm font-medium text-text-main mb-1 block">API Region</label>
+            <select
+              value={formData.apiRegion}
+              onChange={(e) => setFormData({ ...formData, apiRegion: e.target.value })}
+              className="w-full px-3 py-2 text-sm border border-border rounded-lg bg-background focus:outline-none focus:border-primary"
+            >
+              <option value="international">International (api.z.ai)</option>
+              <option value="china">China Mainland (open.bigmodel.cn)</option>
+            </select>
+            <p className="text-xs text-text-muted mt-1">
+              Select the endpoint region for API access and quota tracking.
+            </p>
+          </div>
+        )}
+
        {/* T07: Extra API Keys for round-robin rotation */}
        {!isOAuth && (
          <div className="flex flex-col gap-2">
@@ -4,7 +4,7 @@ import { useTranslations } from "next-intl";

 import { useState, useEffect, useCallback, useMemo, useRef } from "react";
 import Image from "next/image";
-import { parseQuotaData, calculatePercentage, normalizePlanTier } from "./utils";
+import { parseQuotaData, calculatePercentage, normalizePlanTier, resolvePlanValue } from "./utils";
 import Card from "@/shared/components/Card";
 import Badge from "@/shared/components/Badge";
 import { CardSkeleton } from "@/shared/components/Loading";
@@ -16,6 +16,8 @@ const LS_EXPANDED_GROUPS = "omniroute:limits:expandedGroups";

 const REFRESH_INTERVAL_MS = 120000;
 const MIN_FETCH_INTERVAL_MS = 30000; // Debounce per-connection fetches
+const QUOTA_BAR_GREEN_THRESHOLD = 50;
+const QUOTA_BAR_YELLOW_THRESHOLD = 20;

 // Provider display config
 const PROVIDER_CONFIG = {
@@ -24,7 +26,7 @@ const PROVIDER_CONFIG = {
  kiro: { label: "Kiro AI", color: "#FF6B35" },
  codex: { label: "OpenAI Codex", color: "#10A37F" },
  claude: { label: "Claude Code", color: "#D97757" },
-  glm: { label: "GLM (Z.AI)", color: "#4A90D9" },
+  glm: { label: "GLM Coding", color: "#4A90D9" },
  "kimi-coding": { label: "Kimi Coding", color: "#1E3A8A" },
 };

@@ -64,9 +66,13 @@ function getShortModelName(name) {
 }

 // Get bar color based on remaining percentage
-function getBarColor(remaining) {
-  if (remaining > 70) return { bar: "#22c55e", text: "#22c55e", bg: "rgba(34,197,94,0.12)" };
-  if (remaining >= 30) return { bar: "#eab308", text: "#eab308", bg: "rgba(234,179,8,0.12)" };
+function getBarColor(remainingPercentage) {
+  if (remainingPercentage > QUOTA_BAR_GREEN_THRESHOLD) {
+    return { bar: "#22c55e", text: "#22c55e", bg: "rgba(34,197,94,0.12)" };
+  }
+  if (remainingPercentage > QUOTA_BAR_YELLOW_THRESHOLD) {
+    return { bar: "#eab308", text: "#eab308", bg: "rgba(234,179,8,0.12)" };
+  }
  return { bar: "#ef4444", text: "#ef4444", bg: "rgba(239,68,68,0.12)" };
 }

@@ -297,14 +303,22 @@ export default function ProviderLimits() {
    );
  }, [filteredConnections]);

-  const tierByConnection = useMemo(() => {
+  const resolvedPlanByConnection = useMemo(() => {
    const out = {};
    for (const conn of sortedConnections) {
-      out[conn.id] = normalizePlanTier(quotaData[conn.id]?.plan);
+      out[conn.id] = resolvePlanValue(quotaData[conn.id]?.plan, conn.providerSpecificData);
    }
    return out;
  }, [sortedConnections, quotaData]);

+  const tierByConnection = useMemo(() => {
+    const out = {};
+    for (const conn of sortedConnections) {
+      out[conn.id] = normalizePlanTier(resolvedPlanByConnection[conn.id]);
+    }
+    return out;
+  }, [sortedConnections, resolvedPlanByConnection]);
+
  const tierCounts = useMemo(() => {
    const counts = {
      all: sortedConnections.length,
@@ -313,6 +327,7 @@ export default function ProviderLimits() {
      business: 0,
      ultra: 0,
      pro: 0,
+      plus: 0,
      free: 0,
      unknown: 0,
    };
@@ -533,6 +548,7 @@ export default function ProviderLimits() {
              color: "#666",
            };
            const tierMeta = tierByConnection[conn.id] || normalizePlanTier(null);
+            const resolvedPlan = resolvedPlanByConnection[conn.id];

            return (
              <div
@@ -558,21 +574,29 @@ export default function ProviderLimits() {
                  </div>
                  <div className="min-w-0">
                    <div className="text-[13px] font-semibold text-text-main truncate">
-                      {conn.name || config.label}
+                      {conn.name || conn.displayName || conn.email || config.label}
                    </div>
-                    <div className="flex items-center gap-1.5 mt-0.5">
+                    <div className="flex items-center gap-1.5 mt-1 min-h-5">
                      <span
                        title={
-                          quota?.plan
-                            ? t("rawPlanWithValue", { plan: quota.plan })
+                          resolvedPlan
+                            ? t("rawPlanWithValue", { plan: resolvedPlan })
                            : t("noPlanFromProvider")
                        }
+                        className="inline-flex items-center shrink-0"
                      >
-                        <Badge variant={tierMeta.variant} size="sm" dot>
+                        <Badge
+                          variant={tierMeta.variant}
+                          size="sm"
+                          dot
+                          className="h-5 leading-none"
+                        >
                          {tierMeta.label}
                        </Badge>
                      </span>
-                      <span className="text-[11px] text-text-muted">{config.label}</span>
+                      <span className="text-[11px] leading-none text-text-muted">
+                        {config.label}
+                      </span>
                    </div>
                  </div>
                </div>
@@ -597,17 +621,19 @@ export default function ProviderLimits() {
                    <div className="text-xs text-text-muted italic">{quota.message}</div>
                  ) : quota?.quotas?.length > 0 ? (
                    quota.quotas.map((q, i) => {
-                      const remaining =
-                        q.remainingPercentage !== undefined
-                          ? Math.round(q.remainingPercentage)
-                          : calculatePercentage(q.used, q.total);
-                      const colors = getBarColor(remaining);
+                      const remainingPercentage = calculatePercentage(q.used, q.total);
+                      const colors = getBarColor(remainingPercentage);
                      const cd = formatCountdown(q.resetAt);
                      const shortName = getShortModelName(q.name);
                      const staleAfterReset = q.staleAfterReset === true;

                      return (
-                        <div key={i} className="flex items-center gap-1.5 min-w-[200px] shrink-0">
+                        <div
+                          key={i}
+                          className={`flex items-center gap-1.5 min-w-[200px] shrink-0 ${
+                            i > 0 ? "border-l border-border/80 pl-3 ml-1" : ""
+                          }`}
+                        >
                          {/* Model label */}
                          <span
                            className="text-[11px] font-semibold py-0.5 px-2 rounded whitespace-nowrap min-w-[60px] text-center"
@@ -632,7 +658,7 @@ export default function ProviderLimits() {
                            <div
                              className="h-full rounded-sm transition-[width] duration-300 ease-out"
                              style={{
-                                width: `${Math.min(remaining, 100)}%`,
+                                width: `${Math.min(remainingPercentage, 100)}%`,
                                background: colors.bar,
                              }}
                            />
@@ -643,7 +669,7 @@ export default function ProviderLimits() {
                            className="text-[11px] font-semibold min-w-[32px] text-right"
                            style={{ color: colors.text }}
                          >
-                            {remaining}%
+                            {remainingPercentage}%
                          </span>
                        </div>
                      );
@@ -1,6 +1,30 @@
 import { getModelsByProviderId } from "@omniroute/open-sse/config/providerModels.ts";
 import { safePercentage } from "@/shared/utils/formatting";

+const PROVIDER_PLAN_FALLBACKS = new Set([
+  "claude code",
+  "kimi coding",
+  "kiro",
+  "openai codex",
+  "codex",
+  "github copilot",
+]);
+
+function toRecord(value: unknown): Record<string, unknown> {
+  return value && typeof value === "object" && !Array.isArray(value)
+    ? (value as Record<string, unknown>)
+    : {};
+}
+
+function normalizePlanCandidate(value: unknown) {
+  if (typeof value !== "string") return null;
+  const trimmed = value.trim();
+  if (!trimmed) return null;
+  if (trimmed.toLowerCase() === "unknown") return null;
+  if (PROVIDER_PLAN_FALLBACKS.has(trimmed.toLowerCase())) return null;
+  return trimmed;
+}
+
 /**
 * Format ISO date string to countdown format (inspired by vscode-antigravity-cockpit)
 * @param {string|Date} date - ISO date string or Date object
@@ -180,7 +204,6 @@ export function parseQuotaData(provider, data) {
        break;

      default:
-        // Generic fallback for unknown providers
        if (data.quotas) {
          Object.entries(data.quotas).forEach(([name, quota]: [string, any]) => {
            normalizedQuotas.push(normalizeQuotaEntry(name, quota));
@@ -210,9 +233,32 @@ export function parseQuotaData(provider, data) {
  return normalizedQuotas;
 }

+/**
+ * Resolve the best available plan label using live usage first, then persisted
+ * provider-specific connection metadata.
+ */
+export function resolvePlanValue(plan, providerSpecificData) {
+  const psd = toRecord(providerSpecificData);
+  const candidates = [
+    plan,
+    psd.workspacePlanType,
+    psd.plan,
+    psd.subscription,
+    psd.tier,
+    psd.accountTier,
+  ];
+
+  for (const candidate of candidates) {
+    const normalized = normalizePlanCandidate(candidate);
+    if (normalized) return normalized;
+  }
+
+  return null;
+}
+
 /**
 * Normalize provider-specific plan labels into a shared tier taxonomy.
- * Supported tiers: enterprise, business, team, ultra, pro, free, unknown.
+ * Supported tiers: enterprise, business, team, ultra, pro, plus, free, unknown.
 */
 export function normalizePlanTier(plan) {
  const raw = typeof plan === "string" ? plan.trim() : "";
@@ -223,12 +269,12 @@ export function normalizePlanTier(plan) {
  const upper = raw.toUpperCase();

  // Provider names that are not real plan tiers — treat as unknown
-  if (upper === "CLAUDE CODE" || upper === "KIMI CODING" || upper === "KIRO") {
-    return { key: "unknown", label: raw, variant: "default", rank: 0, raw };
+  if (PROVIDER_PLAN_FALLBACKS.has(raw.toLowerCase())) {
+    return { key: "unknown", label: "Unknown", variant: "default", rank: 0, raw };
  }

  if (upper.includes("PRO+") || upper.includes("PRO PLUS") || upper.includes("PROPLUS")) {
-    return { key: "plus", label: "Pro+", variant: "secondary", rank: 4, raw };
+    return { key: "plus", label: "Pro+", variant: "success", rank: 4, raw };
  }

  if (upper.includes("ENTERPRISE") || upper.includes("CORP") || upper.includes("ORG")) {
@@ -245,7 +291,7 @@ export function normalizePlanTier(plan) {
  }

  if (upper.includes("STUDENT")) {
-    return { key: "pro", label: "Student", variant: "primary", rank: 3, raw };
+    return { key: "pro", label: "Student", variant: "success", rank: 3, raw };
  }

  if (upper.includes("ULTRA")) {
@@ -253,11 +299,11 @@ export function normalizePlanTier(plan) {
  }

  if (upper.includes("PRO") || upper.includes("PREMIUM")) {
-    return { key: "pro", label: "Pro", variant: "primary", rank: 3, raw };
+    return { key: "pro", label: "Pro", variant: "success", rank: 3, raw };
  }

  if (upper.includes("PLUS") || upper.includes("PAID")) {
-    return { key: "plus", label: "Plus", variant: "secondary", rank: 2, raw };
+    return { key: "plus", label: "Plus", variant: "success", rank: 2, raw };
  }

  if (
@@ -1,7 +1,18 @@
-import { NextResponse } from "next/server";
-import { getCacheStats, clearCache, cleanExpiredEntries } from "@/lib/semanticCache";
+import { NextRequest, NextResponse } from "next/server";
+import {
+  getCacheStats,
+  clearCache,
+  cleanExpiredEntries,
+  invalidateByModel,
+  invalidateBySignature,
+  invalidateStale,
+} from "@/lib/semanticCache";
 import { getIdempotencyStats } from "@/lib/idempotencyLayer";

+function errorMessage(error: unknown): string {
+  return error instanceof Error ? error.message : String(error);
+}
+
 /**
 * GET /api/cache — Cache statistics
 */
@@ -15,19 +26,67 @@ export async function GET() {
      idempotency: idempotencyStats,
    });
  } catch (error) {
-    return NextResponse.json({ error: error.message }, { status: 500 });
+    return NextResponse.json({ error: errorMessage(error) }, { status: 500 });
  }
 }

 /**
- * DELETE /api/cache — Clear all caches
+ * DELETE /api/cache — Clear all caches or targeted invalidation.
+ *
+ * Exactly one optional query parameter may be provided:
+ *   ?model=<name>      — invalidate all entries for a specific model
+ *   ?signature=<hex>   — invalidate a single entry by its SHA-256 signature
+ *   ?staleMs=<number>  — invalidate entries older than N milliseconds
+ *   (no params)        — clear all cache entries
+ *
+ * Providing more than one parameter returns 400 Bad Request.
 */
-export async function DELETE() {
+export async function DELETE(req: NextRequest) {
  try {
+    const { searchParams } = new URL(req.url);
+    const model = searchParams.get("model");
+    const signature = searchParams.get("signature");
+    const staleMsParam = searchParams.get("staleMs");
+
+    // Enforce mutual exclusivity — only one invalidation mode per request
+    const paramCount = [model, signature, staleMsParam].filter(Boolean).length;
+    if (paramCount > 1) {
+      return NextResponse.json(
+        {
+          error:
+            "Only one invalidation parameter (model, signature, or staleMs) may be provided per request.",
+        },
+        { status: 400 }
+      );
+    }
+
+    if (model) {
+      const removed = invalidateByModel(model);
+      return NextResponse.json({ ok: true, invalidated: removed, scope: "model", model });
+    }
+
+    if (signature) {
+      const removed = invalidateBySignature(signature);
+      return NextResponse.json({ ok: true, invalidated: removed ? 1 : 0, scope: "signature" });
+    }
+
+    if (staleMsParam) {
+      const maxAgeMs = parseInt(staleMsParam, 10);
+      if (Number.isNaN(maxAgeMs) || maxAgeMs <= 0) {
+        return NextResponse.json(
+          { error: "staleMs must be a positive integer (milliseconds)." },
+          { status: 400 }
+        );
+      }
+      const removed = invalidateStale(maxAgeMs);
+      return NextResponse.json({ ok: true, invalidated: removed, scope: "stale", maxAgeMs });
+    }
+
+    // Full clear
    clearCache();
-    const cleaned = cleanExpiredEntries();
-    return NextResponse.json({ ok: true, expiredRemoved: cleaned });
+    const expiredRemoved = cleanExpiredEntries();
+    return NextResponse.json({ ok: true, expiredRemoved, scope: "all" });
  } catch (error) {
-    return NextResponse.json({ error: error.message }, { status: 500 });
+    return NextResponse.json({ error: errorMessage(error) }, { status: 500 });
  }
 }
@@ -1,4 +1,9 @@
 import { NextResponse } from "next/server";
+import {
+  buildComboTestRequestBody,
+  probeComboModelReachability,
+  shouldProbeComboTestReachability,
+} from "@/lib/combos/testHealth";
 import { getComboByName } from "@/lib/localDb";
 import { testComboSchema } from "@/shared/validation/schemas";
 import { isValidationFailure, validateBody } from "@/shared/validation/helpers";
@@ -49,13 +54,9 @@ export async function POST(request) {
      const startTime = Date.now();
      try {
        // Send a minimal chat request to the internal SSE handler
-        // Use OpenAI-compatible format — universally accepted by all providers via the translator
-        const testBody = {
-          model: modelStr,
-          messages: [{ role: "user", content: "Hi" }],
-          max_tokens: 5,
-          stream: false,
-        };
+        // Use a tiny but realistic request body so gateway-routed models do not
+        // get flagged as dead just because the probe payload is too synthetic.
+        const testBody = buildComboTestRequestBody(modelStr);

        const internalUrl = `${getBaseUrl(request)}/v1/chat/completions`;
        const controller = new AbortController();
@@ -88,6 +89,29 @@ export async function POST(request) {
          } catch {
            errorMsg = res.statusText;
          }
+
+          let reachability = null;
+          if (shouldProbeComboTestReachability(res.status)) {
+            try {
+              reachability = await probeComboModelReachability(modelStr);
+            } catch {
+              reachability = null;
+            }
+          }
+
+          if (reachability?.reachable) {
+            results.push({
+              model: modelStr,
+              status: "reachable",
+              statusCode: res.status,
+              error: errorMsg,
+              latencyMs,
+              provider: reachability.provider,
+              probeMethod: reachability.method,
+            });
+            continue;
+          }
+
          results.push({
            model: modelStr,
            status: "error",
@@ -1,10 +1,9 @@
 /**
- * GET  /api/logs/detail         — List detailed request logs
- * GET  /api/logs/detail/:id     — Get specific detailed log
- * POST /api/logs/detail/toggle  — Enable/disable detailed logging
+ * GET  /api/logs/detail  — List detailed request logs + current enabled flag
+ * POST /api/logs/detail — Enable/disable detailed logging
 */
 import { NextRequest, NextResponse } from "next/server";
-import { isAuthenticated } from "@/shared/utils/apiAuth";
+import { requireManagementAuth } from "@/lib/api/requireManagementAuth";
 import {
  getRequestDetailLogs,
  getRequestDetailLogCount,
@@ -15,9 +14,8 @@ import { updateSettings } from "@/lib/db/settings";
 export const dynamic = "force-dynamic";

 export async function GET(req: NextRequest) {
-  if (!isAuthenticated(req)) {
-    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
-  }
+  const authError = await requireManagementAuth(req);
+  if (authError) return authError;

  const url = new URL(req.url);
  const limit = Math.min(Number(url.searchParams.get("limit") ?? 50), 200);
@@ -31,9 +29,8 @@ export async function GET(req: NextRequest) {
 }

 export async function POST(req: NextRequest) {
-  if (!isAuthenticated(req)) {
-    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
-  }
+  const authError = await requireManagementAuth(req);
+  if (authError) return authError;

  const body = await req.json();
  const enabled = body.enabled === true || body.enabled === "1";
@@ -3,6 +3,10 @@ import { getProviderConnectionById } from "@/models";
 import { replaceCustomModels } from "@/lib/db/models";
 import { saveCallLog } from "@/lib/usage/callLogs";
 import { isAuthenticated } from "@/shared/utils/apiAuth";
+import {
+  buildModelSyncInternalHeaders,
+  isModelSyncInternalRequest,
+} from "@/shared/services/modelSyncScheduler";

 /**
 * POST /api/providers/[id]/sync-models
@@ -19,7 +23,7 @@ export async function POST(request: Request, { params }: { params: Promise<{ id:
  const { id } = await params;

  try {
-    if (!(await isAuthenticated(request))) {
+    if (!(await isAuthenticated(request)) && !isModelSyncInternalRequest(request)) {
      return NextResponse.json(
        { error: { message: "Authentication required", type: "invalid_api_key" } },
        { status: 401 }
@@ -41,7 +45,7 @@ export async function POST(request: Request, { params }: { params: Promise<{ id:
      method: "GET",
      headers: {
        cookie: request.headers.get("cookie") || "",
-        "x-internal": "model-sync",
+        ...buildModelSyncInternalHeaders(),
      },
    });

@@ -126,11 +126,23 @@ export async function GET(
      return Response.json({ error: "Connection not found" }, { status: 404 });
    }

-    // Only OAuth connections have usage APIs
-    if (connection.authType !== "oauth") {
+    // Only OAuth connections and specific API key providers have usage APIs
+    const apikeyUsageProviders = ["glm"];
+    if (connection.authType !== "oauth" && !apikeyUsageProviders.includes(connection.provider)) {
      return Response.json({ message: "Usage not available for API key connections" });
    }

+    // API key providers skip OAuth refresh — call usage fetcher directly
+    if (connection.authType !== "oauth") {
+      try {
+        const usageData = await getUsageForProvider(connection);
+        return Response.json(usageData);
+      } catch (error) {
+        console.error("[Usage API] Error fetching usage:", error);
+        return Response.json({ error: (error as any).message }, { status: 500 });
+      }
+    }
+
    // Resolve proxy for this connection FIRST (key → combo → provider → global → direct)
    // so that both credential refresh AND usage fetch go through the proxy.
    const proxyInfo = await resolveProxyForConnection(connectionId);
@@ -1,8 +1,12 @@
 import { NextResponse } from "next/server";
+import { requireManagementAuth } from "@/lib/api/requireManagementAuth";
 import { getCallLogById } from "@/lib/usageDb";

 export async function GET(request, { params }) {
  try {
+    const authError = await requireManagementAuth(request);
+    if (authError) return authError;
+
    const { id } = await params;
    const log = await getCallLogById(id);

@@ -1,8 +1,12 @@
 import { NextResponse } from "next/server";
+import { requireManagementAuth } from "@/lib/api/requireManagementAuth";
 import { getCallLogs } from "@/lib/usageDb";

 export async function GET(request: Request) {
  try {
+    const authError = await requireManagementAuth(request);
+    if (authError) return authError;
+
    const { searchParams } = new URL(request.url);

    const filter: Record<string, any> = {};
@@ -1,5 +1,5 @@
 import { CORS_ORIGIN, CORS_HEADERS } from "@/shared/utils/cors";
-import { handleChat } from "@/sse/handlers/chat";
+import { buildClientRawRequest, handleChat } from "@/sse/handlers/chat";
 import { initTranslators } from "@omniroute/open-sse/translator/index.ts";
 import { createInjectionGuard } from "@/middleware/promptInjectionGuard";

@@ -75,7 +75,7 @@ export async function POST(request: Request) {
          headers: request.headers,
          body: JSON.stringify(normalized),
        });
-        return await handleChat(newRequest);
+        return await handleChat(newRequest, buildClientRawRequest(request, body));
      }
    }
  } catch (error) {
@@ -1,5 +1,5 @@
 import { CORS_ORIGIN } from "@/shared/utils/cors";
-import { handleChat } from "@/sse/handlers/chat";
+import { buildClientRawRequest, handleChat } from "@/sse/handlers/chat";
 import { initTranslators } from "@omniroute/open-sse/translator/index.ts";
 import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
 import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
@@ -91,5 +91,5 @@ export async function POST(request, { params }) {
    body: JSON.stringify(body),
  });

-  return await handleChat(newRequest);
+  return await handleChat(newRequest, buildClientRawRequest(request, rawBody));
 }
@@ -1,5 +1,5 @@
 import { CORS_ORIGIN } from "@/shared/utils/cors";
-import { handleChat } from "@/sse/handlers/chat";
+import { buildClientRawRequest, handleChat } from "@/sse/handlers/chat";
 import { initTranslators } from "@omniroute/open-sse/translator/index.ts";
 import { v1betaGeminiGenerateSchema } from "@/shared/validation/schemas";
 import { isValidationFailure, validateBody } from "@/shared/validation/helpers";
@@ -87,7 +87,7 @@ export async function POST(request, { params }) {
      body: JSON.stringify(convertedBody),
    });

-    return await handleChat(newRequest);
+    return await handleChat(newRequest, buildClientRawRequest(request, rawBody));
  } catch (error) {
    console.log("Error handling Gemini request:", error);
    return Response.json({ error: { message: error.message, code: 500 } }, { status: 500 });
@@ -185,7 +185,9 @@
    "agents": "وكلاء",
    "cliToolsShort": "أدوات",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "المواضيع",
@@ -2847,5 +2849,37 @@
      "userInitial": "انا بحاجة الى مساعدة مع",
      "userFollowUp": "هل يمكنك توضيح ذلك؟"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Агенти",
    "cliToolsShort": "Инструменти",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Теми",
@@ -2847,5 +2849,37 @@
      "userInitial": "Имам нужда от помощ за",
      "userFollowUp": "Можете ли да разкажете по-подробно за това?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "themeViolet": "Fialová",
    "themeOrange": "Oranžová",
    "themeCyan": "Azurová",
-    "cliToolsShort": "Nástroje"
+    "cliToolsShort": "Nástroje",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Motivy",
@@ -2847,5 +2849,37 @@
      "userInitial": "Potřebuji pomoct",
      "userFollowUp": "Můžete to upřesnit?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenter",
    "cliToolsShort": "Værktøjer",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Temaer",
@@ -2847,5 +2849,37 @@
      "userInitial": "Jeg har brug for hjælp til",
      "userFollowUp": "Kan du uddybe det?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenten",
    "cliToolsShort": "Werkzeuge",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themen",
@@ -2847,5 +2849,37 @@
      "userInitial": "Ich brauche Hilfe dabei",
      "userFollowUp": "Können Sie das näher erläutern?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "themeViolet": "Violet",
    "themeOrange": "Orange",
    "themeCyan": "Cyan",
-    "cliToolsShort": "Tools"
+    "cliToolsShort": "Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2849,5 +2851,37 @@
      "userInitial": "I need help with",
      "userFollowUp": "Can you elaborate on that?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntries": "DB Entries",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHits": "Cache Hits",
+    "cacheHitsSub": "of {total} total",
+    "tokensSaved": "Tokens Saved",
+    "tokensSavedSub": "Estimated from hits",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behavior": "Cache Behavior",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "idempotency": "Idempotency Layer",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running."
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agentes",
    "cliToolsShort": "Herramientas",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Temas",
@@ -2847,5 +2849,37 @@
      "userInitial": "necesito ayuda con",
      "userFollowUp": "¿Puedes dar más detalles sobre eso?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agentit",
    "cliToolsShort": "Työkalut",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Teemat",
@@ -2847,5 +2849,37 @@
      "userInitial": "Tarvitsen apua",
      "userFollowUp": "Voitko tarkentaa sitä?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agents",
    "cliToolsShort": "Outils",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Thèmes",
@@ -2847,5 +2849,37 @@
      "userInitial": "J'ai besoin d'aide pour",
      "userFollowUp": "Pouvez-vous développer cela ?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "סוכנים",
    "cliToolsShort": "כלים",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "אני צריך עזרה עם",
      "userFollowUp": "אתה יכול לפרט על זה?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -107,7 +107,9 @@
    "agents": "एजेंट",
    "cliToolsShort": "उपकरण",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2689,5 +2691,37 @@
    "domainPlaceholder": "example.com",
    "requestTimedOut": "Request timed out ({seconds}s)",
    "networkError": "Network error"
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Ügynökök",
    "cliToolsShort": "Eszközök",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Témák",
@@ -2847,5 +2849,37 @@
      "userInitial": "Segítségre van szükségem",
      "userFollowUp": "Kifejtenéd ezt bővebben?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agen",
    "cliToolsShort": "Alat",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Saya butuh bantuan",
      "userFollowUp": "Bisakah Anda menjelaskannya lebih lanjut?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "themeViolet": "बैंगनी",
    "themeOrange": "नारंगी",
    "themeCyan": "सियान",
-    "cliToolsShort": "उपकरण"
+    "cliToolsShort": "उपकरण",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "थीम्स",
@@ -2843,5 +2845,37 @@
      "userInitial": "मुझे मदद चाहिए",
      "userFollowUp": "क्या आप इसके बारे में विस्तार से बता सकते हैं?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenti",
    "cliToolsShort": "Strumenti",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Ho bisogno di aiuto con",
      "userFollowUp": "Puoi approfondire questo argomento?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "エージェント",
    "cliToolsShort": "ツール",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "助けが必要です",
      "userFollowUp": "それについて詳しく教えてもらえますか？"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "에이전트",
    "cliToolsShort": "도구",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "도움이 필요해요",
      "userFollowUp": "그것에 대해 자세히 설명해주실 수 있나요?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Ejen",
    "cliToolsShort": "Alat",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Saya perlukan bantuan",
      "userFollowUp": "Bolehkah anda menghuraikan perkara itu?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenten",
    "cliToolsShort": "Gereedschap",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Ik heb hulp nodig bij",
      "userFollowUp": "Kunt u dat nader toelichten?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenter",
    "cliToolsShort": "Verktøy",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Jeg trenger hjelp med",
      "userFollowUp": "Kan du utdype det?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Mga Agent",
    "cliToolsShort": "Mga Tool",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Kailangan ko ng tulong sa",
      "userFollowUp": "Maaari mo bang ipaliwanag iyon?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenci",
    "cliToolsShort": "Narzędzia",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Potrzebuję pomocy",
      "userFollowUp": "Możesz to rozwinąć?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "playground": "Playground",
    "agents": "Agentes",
    "cliToolsShort": "Ferramentas",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2865,5 +2867,37 @@
      "userInitial": "preciso de ajuda com",
      "userFollowUp": "Você pode explicar isso?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agentes",
    "cliToolsShort": "Ferramentas",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "preciso de ajuda com",
      "userFollowUp": "Você pode explicar isso?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenți",
    "cliToolsShort": "Instrumente",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Am nevoie de ajutor cu",
      "userFollowUp": "Puteți detalia asta?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Агенты",
    "cliToolsShort": "Инструменты",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Темы",
@@ -2847,5 +2849,37 @@
      "userInitial": "мне нужна помощь с",
      "userFollowUp": "Можете ли вы рассказать об этом подробнее?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenti",
    "cliToolsShort": "Nástroje",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Potrebujem pomoc s",
      "userFollowUp": "Môžete to upresniť?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Agenter",
    "cliToolsShort": "Verktyg",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Jag behöver hjälp med",
      "userFollowUp": "Kan du utveckla det?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "เอเจนต์",
    "cliToolsShort": "เครื่องมือ",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "ธีมส์",
@@ -2847,5 +2849,37 @@
      "userInitial": "ฉันต้องการความช่วยเหลือเกี่ยวกับ",
      "userFollowUp": "คุณช่วยอธิบายรายละเอียดเกี่ยวกับเรื่องนั้นได้ไหม?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Агенти",
    "cliToolsShort": "Інструменти",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Мені потрібна допомога з",
      "userFollowUp": "Чи можете ви розповісти про це?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "agents": "Tác nhân",
    "cliToolsShort": "Công cụ",
    "autoCombo": "Auto Combo",
-    "searchTools": "Search Tools"
+    "searchTools": "Search Tools",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "Themes",
@@ -2847,5 +2849,37 @@
      "userInitial": "Tôi cần giúp đỡ với",
      "userFollowUp": "Bạn có thể giải thích về điều đó?"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -185,7 +185,9 @@
    "themeViolet": "紫罗兰",
    "themeOrange": "橙色",
    "themeCyan": "青色",
-    "cliToolsShort": "工具"
+    "cliToolsShort": "工具",
+    "cache": "Cache",
+    "cacheShort": "Cache"
  },
  "themesPage": {
    "title": "主题",
@@ -2843,5 +2845,37 @@
      "userInitial": "我需要帮助来处理",
      "userFollowUp": "你可以再详细说明一下吗？"
    }
+  },
+  "cache": {
+    "title": "Cache Management",
+    "description": "Monitor and manage semantic response cache, hit rates, and token savings.",
+    "refresh": "Refresh",
+    "clearAll": "Clear All",
+    "memoryEntries": "Memory Entries",
+    "dbEntries": "DB Entries",
+    "cacheHits": "Cache Hits",
+    "tokensSaved": "Tokens Saved",
+    "hitRate": "Hit Rate",
+    "performance": "Cache Performance",
+    "behavior": "Cache Behavior",
+    "idempotency": "Idempotency Layer",
+    "clearSuccess": "Cache cleared. {count} expired entries removed.",
+    "clearError": "Failed to clear cache.",
+    "unavailable": "Cache unavailable",
+    "unavailableDesc": "Could not fetch cache statistics. Make sure the server is running.",
+    "memoryEntriesSub": "In-memory LRU",
+    "dbEntriesSub": "Persisted (SQLite)",
+    "cacheHitsSub": "of {total} total",
+    "tokensSavedSub": "Estimated from hits",
+    "autoRefresh": "Auto-refreshes every {seconds}s",
+    "hits": "Hits",
+    "misses": "Misses",
+    "total": "Total",
+    "behaviorDeterministic": "Only non-streaming requests with temperature=0 are cached.",
+    "behaviorBypass": "Bypass with header {header}.",
+    "behaviorTwoTier": "Two-tier storage: in-memory LRU (fast) + SQLite (persistent across restarts).",
+    "behaviorTtl": "Default TTL: 30 minutes. Configure via {envVar}.",
+    "activeDedupKeys": "Active Dedup Keys",
+    "dedupWindow": "Dedup Window"
  }
 }
@@ -0,0 +1,74 @@
+import { validateProviderApiKey } from "@/lib/providers/validation";
+import { getProviderCredentials } from "@/sse/services/auth";
+import { getModelInfo } from "@/sse/services/model";
+
+const SOFT_REACHABILITY_STATUSES = new Set([400, 405, 406, 409, 422]);
+
+export function buildComboTestRequestBody(modelStr: string) {
+  return {
+    model: modelStr,
+    messages: [{ role: "user", content: "Reply with OK only." }],
+    // Some gateway-routed models reject ultra-tiny budgets during smoke tests.
+    max_tokens: 16,
+    stream: false,
+  };
+}
+
+export function shouldProbeComboTestReachability(statusCode: number) {
+  return SOFT_REACHABILITY_STATUSES.has(Number(statusCode));
+}
+
+type ProbeDeps = {
+  getModelInfo?: typeof getModelInfo;
+  getProviderCredentials?: typeof getProviderCredentials;
+  validateProviderApiKey?: typeof validateProviderApiKey;
+};
+
+export async function probeComboModelReachability(modelStr: string, deps: ProbeDeps = {}) {
+  const resolveModel = deps.getModelInfo || getModelInfo;
+  const loadCredentials = deps.getProviderCredentials || getProviderCredentials;
+  const validateKey = deps.validateProviderApiKey || validateProviderApiKey;
+
+  const modelInfo = await resolveModel(modelStr);
+  if (!modelInfo?.provider) {
+    return { reachable: false, reason: "unresolved_model" };
+  }
+
+  const credentials = await loadCredentials(
+    modelInfo.provider,
+    null,
+    null,
+    modelInfo.model || modelStr
+  );
+  if (!credentials || credentials.allRateLimited) {
+    return { reachable: false, reason: "credentials_unavailable" };
+  }
+
+  const apiKey = credentials.apiKey || credentials.accessToken;
+  if (typeof apiKey !== "string" || apiKey.trim().length === 0) {
+    return { reachable: false, reason: "missing_auth_material" };
+  }
+
+  const providerSpecificData =
+    credentials.providerSpecificData && typeof credentials.providerSpecificData === "object"
+      ? { ...credentials.providerSpecificData }
+      : {};
+
+  if (!providerSpecificData.validationModelId && modelInfo.model) {
+    providerSpecificData.validationModelId = modelInfo.model;
+  }
+
+  const validation = await validateKey({
+    provider: modelInfo.provider,
+    apiKey,
+    providerSpecificData,
+  });
+
+  return {
+    reachable: Boolean(validation?.valid),
+    provider: modelInfo.provider,
+    model: modelInfo.model || null,
+    method: validation?.method || null,
+    warning: validation?.warning || null,
+  };
+}
@@ -8,20 +8,28 @@
 import { v4 as uuidv4 } from "uuid";
 import { getDbInstance } from "./core";
 import { getSettings } from "./settings";
+import { isNoLog } from "../compliance";
+import {
+  protectPayloadForLog,
+  serializePayloadForStorage,
+  parseStoredPayload,
+} from "../logPayloads";

 export interface RequestDetailLog {
  id?: string;
  call_log_id?: string | null;
  timestamp?: string;
-  client_request?: string | null;
-  translated_request?: string | null;
-  provider_response?: string | null;
-  client_response?: string | null;
+  client_request?: unknown | null;
+  translated_request?: unknown | null;
+  provider_response?: unknown | null;
+  client_response?: unknown | null;
  provider?: string | null;
  model?: string | null;
  source_format?: string | null;
  target_format?: string | null;
  duration_ms?: number;
+  api_key_id?: string | null;
+  no_log?: boolean;
 }

 /** Returns true if detailed logging is enabled in settings */
@@ -37,16 +45,14 @@ export async function isDetailedLoggingEnabled(): Promise<boolean> {

 /** Save a detailed log entry — caller must verify isDetailedLoggingEnabled() first */
 export function saveRequestDetailLog(entry: RequestDetailLog): void {
+  const noLogEnabled =
+    Boolean(entry.no_log) || (entry.api_key_id ? isNoLog(entry.api_key_id) : false);
+  if (noLogEnabled) return;
+
  const db = getDbInstance();
  const id = entry.id ?? uuidv4();
  const timestamp = entry.timestamp ?? new Date().toISOString();

-  // Trim large bodies to avoid excessive disk usage (max 64KB each)
-  const trim = (s: string | null | undefined, max = 65536): string | null => {
-    if (!s) return null;
-    return s.length > max ? s.slice(0, max) + "…[truncated]" : s;
-  };
-
  db.prepare(
    `
    INSERT INTO request_detail_logs
@@ -58,10 +64,10 @@ export function saveRequestDetailLog(entry: RequestDetailLog): void {
    id,
    entry.call_log_id ?? null,
    timestamp,
-    trim(entry.client_request),
-    trim(entry.translated_request),
-    trim(entry.provider_response),
-    trim(entry.client_response),
+    serializePayloadForStorage(protectPayloadForLog(entry.client_request)),
+    serializePayloadForStorage(protectPayloadForLog(entry.translated_request)),
+    serializePayloadForStorage(protectPayloadForLog(entry.provider_response)),
+    serializePayloadForStorage(protectPayloadForLog(entry.client_response)),
    entry.provider ?? null,
    entry.model ?? null,
    entry.source_format ?? null,
@@ -73,7 +79,7 @@ export function saveRequestDetailLog(entry: RequestDetailLog): void {
 /** Fetch detailed logs (latest first) */
 export function getRequestDetailLogs(limit = 50, offset = 0): RequestDetailLog[] {
  const db = getDbInstance();
-  return db
+  const rows = db
    .prepare(
      `
      SELECT * FROM request_detail_logs
@@ -81,14 +87,34 @@ export function getRequestDetailLogs(limit = 50, offset = 0): RequestDetailLog[]
      LIMIT ? OFFSET ?
    `
    )
-    .all(limit, offset) as RequestDetailLog[];
+    .all(limit, offset) as Array<Record<string, unknown>>;
+
+  return rows.map(mapDetailedLogRow);
 }

 /** Get a single detailed log by ID */
 export function getRequestDetailLogById(id: string): RequestDetailLog | null {
  const db = getDbInstance();
-  return (db.prepare("SELECT * FROM request_detail_logs WHERE id = ?").get(id) ??
-    null) as RequestDetailLog | null;
+  const row = db.prepare("SELECT * FROM request_detail_logs WHERE id = ?").get(id) as
+    | Record<string, unknown>
+    | undefined;
+  return row ? mapDetailedLogRow(row) : null;
+}
+
+/** Get the most recent detailed log for a call log ID */
+export function getRequestDetailLogByCallLogId(callLogId: string): RequestDetailLog | null {
+  const db = getDbInstance();
+  const row = db
+    .prepare(
+      `
+      SELECT * FROM request_detail_logs
+      WHERE call_log_id = ?
+      ORDER BY timestamp DESC
+      LIMIT 1
+    `
+    )
+    .get(callLogId) as Record<string, unknown> | undefined;
+  return row ? mapDetailedLogRow(row) : null;
 }

 /** Get total count of detailed logs */
@@ -99,3 +125,20 @@ export function getRequestDetailLogCount(): number {
  };
  return row?.cnt ?? 0;
 }
+
+function mapDetailedLogRow(row: Record<string, unknown>): RequestDetailLog {
+  return {
+    id: typeof row.id === "string" ? row.id : undefined,
+    call_log_id: typeof row.call_log_id === "string" ? row.call_log_id : null,
+    timestamp: typeof row.timestamp === "string" ? row.timestamp : undefined,
+    client_request: parseStoredPayload(row.client_request),
+    translated_request: parseStoredPayload(row.translated_request),
+    provider_response: parseStoredPayload(row.provider_response),
+    client_response: parseStoredPayload(row.client_response),
+    provider: typeof row.provider === "string" ? row.provider : null,
+    model: typeof row.model === "string" ? row.model : null,
+    source_format: typeof row.source_format === "string" ? row.source_format : null,
+    target_format: typeof row.target_format === "string" ? row.target_format : null,
+    duration_ms: typeof row.duration_ms === "number" ? row.duration_ms : 0,
+  };
+}
@@ -1,16 +1,18 @@
 import initializeCloudSync from "@/shared/services/initializeCloudSync";
+import { startModelSyncScheduler } from "@/shared/services/modelSyncScheduler";
 import "@/lib/tokenHealthCheck"; // Proactive token health-check scheduler

-// Initialize cloud sync when this module is imported
+// Initialize background sync services when this module is imported
 let initialized = false;

 export async function ensureCloudSyncInitialized() {
  if (!initialized) {
    try {
      await initializeCloudSync();
+      startModelSyncScheduler();
      initialized = true;
    } catch (error) {
-      console.error("[ServerInit] Error initializing cloud sync:", error);
+      console.error("[ServerInit] Error initializing background sync services:", error);
    }
  }
  return initialized;
@@ -0,0 +1,109 @@
+import { sanitizePII } from "./piiSanitizer";
+
+const SENSITIVE_KEYS = new Set([
+  "api_key",
+  "apiKey",
+  "api-key",
+  "authorization",
+  "Authorization",
+  "x-api-key",
+  "X-Api-Key",
+  "access_token",
+  "accessToken",
+  "refresh_token",
+  "refreshToken",
+  "password",
+  "secret",
+  "token",
+]);
+
+type JsonRecord = Record<string, unknown>;
+
+export function cloneLogPayload<T>(value: T): T {
+  if (value === null || value === undefined) return value;
+  if (typeof globalThis.structuredClone === "function") {
+    return globalThis.structuredClone(value);
+  }
+  return JSON.parse(JSON.stringify(value)) as T;
+}
+
+export function normalizePayloadForLog(payload: unknown): unknown {
+  if (typeof payload !== "string") return payload;
+
+  const trimmed = payload.trim();
+  if (!trimmed) return "";
+
+  try {
+    return JSON.parse(trimmed);
+  } catch {
+    return { _rawText: payload };
+  }
+}
+
+export function redactPayload(payload: unknown): unknown {
+  if (!payload || typeof payload !== "object") return payload;
+  if (Array.isArray(payload)) return payload.map(redactPayload);
+
+  const redacted: JsonRecord = {};
+  for (const [key, value] of Object.entries(payload)) {
+    if (SENSITIVE_KEYS.has(key)) {
+      redacted[key] = "[REDACTED]";
+    } else if (typeof value === "string" && value.startsWith("Bearer ")) {
+      redacted[key] = "Bearer [REDACTED]";
+    } else if (typeof value === "object" && value !== null) {
+      redacted[key] = redactPayload(value);
+    } else {
+      redacted[key] = value;
+    }
+  }
+  return redacted;
+}
+
+export function sanitizePayloadPII(payload: unknown): unknown {
+  if (typeof payload === "string") {
+    return sanitizePII(payload).text;
+  }
+  if (Array.isArray(payload)) {
+    return payload.map(sanitizePayloadPII);
+  }
+  if (!payload || typeof payload !== "object") {
+    return payload;
+  }
+
+  const sanitized: JsonRecord = {};
+  for (const [key, value] of Object.entries(payload)) {
+    sanitized[key] = sanitizePayloadPII(value);
+  }
+  return sanitized;
+}
+
+export function protectPayloadForLog(payload: unknown): unknown {
+  if (payload === null || payload === undefined) return null;
+  const normalized = normalizePayloadForLog(payload);
+  const piiSanitized = sanitizePayloadPII(normalized);
+  return redactPayload(piiSanitized);
+}
+
+export function serializePayloadForStorage(payload: unknown, maxLength = 65536): string | null {
+  if (payload === null || payload === undefined) return null;
+
+  const exact = JSON.stringify(payload);
+  if (exact.length <= maxLength) {
+    return exact;
+  }
+
+  return JSON.stringify({
+    _truncated: true,
+    _originalSize: exact.length,
+    _preview: exact.slice(0, maxLength),
+  });
+}
+
+export function parseStoredPayload(value: unknown): unknown | null {
+  if (typeof value !== "string" || value.trim().length === 0) return null;
+  try {
+    return JSON.parse(value);
+  } catch {
+    return { _rawText: value };
+  }
+}
@@ -11,10 +11,12 @@ import path from "path";
 import fs from "fs";
 import { getDbInstance } from "../db/core";
 import { getSettings } from "../db/settings";
+import { getRequestDetailLogByCallLogId } from "../db/detailedLogs";
 import { shouldPersistToDisk, CALL_LOGS_DIR } from "./migrations";
 import { getLoggedInputTokens, getLoggedOutputTokens } from "./tokenAccounting";
 import { isNoLog } from "../compliance";
 import { sanitizePII } from "../piiSanitizer";
+import { protectPayloadForLog, parseStoredPayload } from "../logPayloads";

 type JsonRecord = Record<string, unknown>;

@@ -35,15 +37,6 @@ function toStringOrNull(value: unknown): string | null {
  return typeof value === "string" ? value : null;
 }

-function parseJsonString(value: unknown): unknown | null {
-  if (typeof value !== "string" || value.trim().length === 0) return null;
-  try {
-    return JSON.parse(value);
-  } catch {
-    return null;
-  }
-}
-
 function hasTruncatedFlag(value: unknown): boolean {
  if (!value || typeof value !== "object" || Array.isArray(value)) return false;
  return (value as Record<string, unknown>)._truncated === true;
@@ -108,80 +101,6 @@ export function invalidateCallLogsMaxCache(): void {
    expiresAt: 0,
  };
 }
-
-/** Fields that should always be redacted from logged payloads */
-const SENSITIVE_KEYS = new Set([
-  "api_key",
-  "apiKey",
-  "api-key",
-  "authorization",
-  "Authorization",
-  "x-api-key",
-  "X-Api-Key",
-  "access_token",
-  "accessToken",
-  "refresh_token",
-  "refreshToken",
-  "password",
-  "secret",
-  "token",
-]);
-
-/**
- * Redact sensitive fields from a payload before persistence.
- */
-function redactPayload(obj: any): any {
-  if (!obj || typeof obj !== "object") return obj;
-  if (Array.isArray(obj)) return obj.map(redactPayload);
-
-  const redacted: Record<string, any> = {};
-  for (const [key, value] of Object.entries(obj)) {
-    if (SENSITIVE_KEYS.has(key)) {
-      redacted[key] = "[REDACTED]";
-    } else if (typeof value === "string" && value.startsWith("Bearer ")) {
-      redacted[key] = "Bearer [REDACTED]";
-    } else if (typeof value === "object" && value !== null) {
-      redacted[key] = redactPayload(value);
-    } else {
-      redacted[key] = value;
-    }
-  }
-  return redacted;
-}
-
-/**
- * Recursively sanitize PII from string fields in a payload.
- * Uses lib/piiSanitizer config flags to determine if redaction is enabled.
- */
-function sanitizePayloadPII(obj: any): any {
-  if (typeof obj === "string") {
-    return sanitizePII(obj).text;
-  }
-  if (Array.isArray(obj)) {
-    return obj.map(sanitizePayloadPII);
-  }
-  if (!obj || typeof obj !== "object") {
-    return obj;
-  }
-
-  const sanitized: Record<string, any> = {};
-  for (const [key, value] of Object.entries(obj)) {
-    sanitized[key] = sanitizePayloadPII(value);
-  }
-  return sanitized;
-}
-
-/**
- * Apply payload protection chain before persistence.
- * 1) Optional PII sanitization
- * 2) Mandatory key/token redaction
- */
-function protectPayloadForLog(payload: any): any {
-  if (!payload || !shouldLogPayloadInDb) return null;
-  const piiSanitized = sanitizePayloadPII(payload);
-  return redactPayload(piiSanitized);
-}
-
 let logIdCounter = 0;
 function generateLogId() {
  logIdCounter++;
@@ -198,8 +117,10 @@ export async function saveCallLog(entry: any) {
    const apiKeyId = entry.apiKeyId || null;
    const noLogEnabled = Boolean(entry.noLog) || (apiKeyId ? isNoLog(apiKeyId) : false);

-    const protectedRequestBody = noLogEnabled ? null : protectPayloadForLog(entry.requestBody);
-    const protectedResponseBody = noLogEnabled ? null : protectPayloadForLog(entry.responseBody);
+    const protectedRequestBody =
+      noLogEnabled || !shouldLogPayloadInDb ? null : protectPayloadForLog(entry.requestBody);
+    const protectedResponseBody =
+      noLogEnabled || !shouldLogPayloadInDb ? null : protectPayloadForLog(entry.responseBody);

    // Resolve account name
    let account = entry.connectionId ? entry.connectionId.slice(0, 8) : "-";
@@ -227,7 +148,7 @@ export async function saveCallLog(entry: any) {
    };

    const logEntry = {
-      id: generateLogId(),
+      id: typeof entry.id === "string" && entry.id.length > 0 ? entry.id : generateLogId(),
      timestamp: new Date().toISOString(),
      method: entry.method || "POST",
      path: entry.path || "/v1/chat/completions",
@@ -470,8 +391,8 @@ export async function getCallLogById(id: string) {
    apiKeyId: toStringOrNull(entryRow.api_key_id),
    apiKeyName: toStringOrNull(entryRow.api_key_name),
    comboName: toStringOrNull(entryRow.combo_name),
-    requestBody: parseJsonString(entryRow.request_body),
-    responseBody: parseJsonString(entryRow.response_body),
+    requestBody: parseStoredPayload(entryRow.request_body),
+    responseBody: parseStoredPayload(entryRow.response_body),
    error: toStringOrNull(entryRow.error),
  };

@@ -492,7 +413,20 @@ export async function getCallLogById(id: string) {
    }
  }

-  return entry;
+  const detailed = getRequestDetailLogByCallLogId(id);
+  if (!detailed) {
+    return entry;
+  }
+
+  return {
+    ...entry,
+    pipelinePayloads: {
+      clientRequest: detailed.client_request ?? null,
+      providerRequest: detailed.translated_request ?? null,
+      providerResponse: detailed.provider_response ?? null,
+      clientResponse: detailed.client_response ?? null,
+    },
+  };
 }

 /**
@@ -5,6 +5,7 @@ import { getSettings } from "./lib/localDb";
 import { isPublicRoute, verifyAuth, isAuthRequired } from "./shared/utils/apiAuth";
 import { checkBodySize, getBodySizeLimit } from "./shared/middleware/bodySizeGuard";
 import { isDraining } from "./lib/gracefulShutdown";
+import { isModelSyncInternalRequest } from "./shared/services/modelSyncScheduler";

 const SECRET = new TextEncoder().encode(process.env.JWT_SECRET || "");

@@ -43,6 +44,14 @@ export async function proxy(request) {
      return response;
    }

+    // Allow the model auto-sync scheduler to reach only its internal provider routes.
+    if (
+      isModelSyncInternalRequest(request) &&
+      /^\/api\/providers\/[^/]+\/(sync-models|models)$/.test(pathname)
+    ) {
+      return response;
+    }
+
    // Check if auth is required at all (respects requireLogin setting)
    const authRequired = await isAuthRequired();
    if (!authRequired) {
@@ -80,8 +80,42 @@ export default function RequestLoggerDetail({ log, detail, loading, onClose, onC
    }
  };

-  const requestJson = detail?.requestBody ? JSON.stringify(detail.requestBody, null, 2) : null;
-  const responseJson = detail?.responseBody ? JSON.stringify(detail.responseBody, null, 2) : null;
+  const toPrettyJson = (payload) => {
+    if (payload === null || payload === undefined) return null;
+    try {
+      return JSON.stringify(payload, null, 2);
+    } catch {
+      return String(payload);
+    }
+  };
+
+  const pipelinePayloads = detail?.pipelinePayloads || null;
+  const payloadSections = pipelinePayloads
+    ? [
+        {
+          key: "client-request",
+          title: "Client Request",
+          json: toPrettyJson(pipelinePayloads.clientRequest),
+        },
+        {
+          key: "provider-request",
+          title: "Provider Request",
+          json: toPrettyJson(pipelinePayloads.providerRequest),
+        },
+        {
+          key: "provider-response",
+          title: "Provider Response",
+          json: toPrettyJson(pipelinePayloads.providerResponse),
+        },
+        {
+          key: "client-response",
+          title: "Client Response",
+          json: toPrettyJson(pipelinePayloads.clientResponse),
+        },
+      ].filter((section) => section.json)
+    : [];
+  const requestJson = detail?.requestBody ? toPrettyJson(detail.requestBody) : null;
+  const responseJson = detail?.responseBody ? toPrettyJson(detail.responseBody) : null;

  return (
    <div
@@ -240,33 +274,41 @@ export default function RequestLoggerDetail({ log, detail, loading, onClose, onC
            </div>
          ) : (
            <>
-              {/* Response Payload (返回) — show first */}
-              {responseJson && (
+              {payloadSections.length > 0 &&
+                payloadSections.map((section) => (
+                  <PayloadSection
+                    key={section.key}
+                    title={section.title}
+                    json={section.json}
+                    onCopy={() => onCopy(section.json)}
+                  />
+                ))}
+
+              {payloadSections.length === 0 && responseJson && (
                <PayloadSection
-                  title="Response Payload (返回)"
+                  title="Response Payload (Legacy)"
                  json={responseJson}
                  onCopy={() => onCopy(responseJson)}
                />
              )}

-              {/* Request Payload (请求) */}
-              {requestJson && (
+              {payloadSections.length === 0 && requestJson && (
                <PayloadSection
-                  title="Request Payload (请求)"
+                  title="Request Payload (Legacy)"
                  json={requestJson}
                  onCopy={() => onCopy(requestJson)}
                />
              )}

-              {!requestJson && !responseJson && !loading && (
+              {payloadSections.length === 0 && !requestJson && !responseJson && !loading && (
                <div className="p-6 text-center text-text-muted">
                  <span className="material-symbols-outlined text-[32px] mb-2 block opacity-40">
                    info
                  </span>
                  <p className="text-sm">No payload data available for this log entry.</p>
                  <p className="text-xs mt-1">
-                    Request/response bodies are only captured for non-streaming calls or when
-                    streaming completes normally.
+                    Enable detailed logging first if you want the four-stage client/provider payload
+                    view for new requests.
                  </p>
                </div>
              )}
@@ -93,6 +93,9 @@ export default function RequestLoggerV2() {
  const [selectedLog, setSelectedLog] = useState(null);
  const [detailLoading, setDetailLoading] = useState(false);
  const [detailData, setDetailData] = useState(null);
+  const [detailLoggingEnabled, setDetailLoggingEnabled] = useState(false);
+  const [detailLoggingLoading, setDetailLoggingLoading] = useState(false);
+  const [detailLoggingReady, setDetailLoggingReady] = useState(false);
  const intervalRef = useRef(null);
  const hasLoadedRef = useRef(false);
  const [providerNodes, setProviderNodes] = useState([]);
@@ -161,6 +164,20 @@ export default function RequestLoggerV2() {
      .catch(() => {});
  }, []);

+  useEffect(() => {
+    fetch("/api/logs/detail?limit=1")
+      .then(async (res) => {
+        if (!res.ok) return null;
+        return await res.json();
+      })
+      .then((data) => {
+        if (!data) return;
+        setDetailLoggingEnabled(data.enabled === true);
+        setDetailLoggingReady(true);
+      })
+      .catch(() => {});
+  }, []);
+
  // Auto-refresh
  useEffect(() => {
    if (intervalRef.current) clearInterval(intervalRef.current);
@@ -232,6 +249,25 @@ export default function RequestLoggerV2() {
    setDetailData(null);
  };

+  const toggleDetailLogging = async () => {
+    setDetailLoggingLoading(true);
+    try {
+      const nextEnabled = !detailLoggingEnabled;
+      const res = await fetch("/api/logs/detail", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ enabled: nextEnabled }),
+      });
+      if (!res.ok) throw new Error("Failed to update detailed logging");
+      setDetailLoggingEnabled(nextEnabled);
+      setDetailLoggingReady(true);
+    } catch (error) {
+      console.error("Failed to toggle detailed logging:", error);
+    } finally {
+      setDetailLoggingLoading(false);
+    }
+  };
+
  // Unique accounts and providers for dropdowns

  const uniqueAccounts = [...new Set(logs.map((l) => l.account).filter((a) => a && a !== "-"))];
@@ -271,6 +307,33 @@ export default function RequestLoggerV2() {
          {recording ? "Recording" : "Paused"}
        </button>

+        <button
+          onClick={toggleDetailLogging}
+          disabled={detailLoggingLoading}
+          className={`flex items-center gap-2 px-3 py-1.5 rounded-full text-sm font-medium border transition-colors disabled:opacity-60 ${
+            detailLoggingEnabled
+              ? "bg-amber-500/10 border-amber-500/30 text-amber-700 dark:text-amber-300"
+              : "bg-bg-subtle border-border text-text-muted"
+          }`}
+          title="Capture four-stage pipeline payloads for new requests"
+        >
+          <span
+            className={`w-2 h-2 rounded-full ${detailLoggingEnabled ? "bg-amber-500" : "bg-text-muted"}`}
+          />
+          {detailLoggingLoading
+            ? "Updating detailed logs..."
+            : detailLoggingEnabled
+              ? "Detailed Logs On"
+              : "Detailed Logs Off"}
+        </button>
+
+        {detailLoggingReady && (
+          <span className="text-[11px] text-text-muted">
+            New requests will {detailLoggingEnabled ? "" : "not "}capture client/provider pipeline
+            payloads.
+          </span>
+        )}
+
        {/* Search */}
        <div className="flex-1 min-w-[200px] relative">
          <span className="material-symbols-outlined absolute left-3 top-1/2 -translate-y-1/2 text-text-muted text-[18px]">
@@ -21,6 +21,7 @@ const navItemDefs = [
  { href: "/dashboard/costs", i18nKey: "costs", icon: "account_balance_wallet" },
  { href: "/dashboard/analytics", i18nKey: "analytics", icon: "analytics" },
  { href: "/dashboard/limits", i18nKey: "limits", icon: "tune" },
+  { href: "/dashboard/cache", i18nKey: "cache", icon: "cached" },
 ];

 const cliItemDefs = [
@@ -661,6 +661,7 @@ export const USAGE_SUPPORTED_PROVIDERS = [
  "codex",
  "claude",
  "kimi-coding",
+  "glm",
 ];

 // ── Zod validation at module load (Phase 7.2) ──
@@ -8,13 +8,53 @@
 * Pattern mirrors cloudSyncScheduler.ts for consistency.
 */

+import { randomUUID } from "node:crypto";
 import { getSettings, updateSettings } from "@/lib/localDb";
+import { getRuntimePorts } from "@/lib/runtime/ports";

 const DEFAULT_INTERVAL_MS = 24 * 60 * 60 * 1000; // 24 hours
 const MODEL_SYNC_SETTING_KEY = "model_sync_last_run";
+const MODEL_SYNC_INTERNAL_AUTH_HEADER = "x-model-sync-internal-auth";
+
+const { dashboardPort } = getRuntimePorts();
+
+const INTERNAL_BASE_URL =
+  process.env.BASE_URL ||
+  process.env.NEXT_PUBLIC_BASE_URL ||
+  process.env.NEXT_PUBLIC_APP_URL ||
+  `http://localhost:${dashboardPort}`;
+
+const globalState = globalThis as typeof globalThis & {
+  __omnirouteModelSyncInternalAuthToken?: string;
+};

 let schedulerTimer: NodeJS.Timeout | null = null;
 let isRunning = false;
+let internalAuthToken: string | null = null;
+
+function getInternalAuthToken(): string {
+  if (!internalAuthToken) {
+    internalAuthToken = globalState.__omnirouteModelSyncInternalAuthToken || randomUUID();
+    globalState.__omnirouteModelSyncInternalAuthToken = internalAuthToken;
+  }
+  return internalAuthToken;
+}
+
+export function getModelSyncInternalAuthHeaderName(): string {
+  return MODEL_SYNC_INTERNAL_AUTH_HEADER;
+}
+
+export function buildModelSyncInternalHeaders(): Record<string, string> {
+  return { [MODEL_SYNC_INTERNAL_AUTH_HEADER]: getInternalAuthToken() };
+}
+
+export function isModelSyncInternalRequest(request: Request): boolean {
+  if (!internalAuthToken && globalState.__omnirouteModelSyncInternalAuthToken) {
+    internalAuthToken = globalState.__omnirouteModelSyncInternalAuthToken;
+  }
+  const headerToken = request.headers.get(MODEL_SYNC_INTERNAL_AUTH_HEADER);
+  return Boolean(headerToken && internalAuthToken && headerToken === internalAuthToken);
+}

 /**
 * Fetch all provider connections that have autoSync enabled.
@@ -50,7 +90,10 @@ async function syncConnectionModels(
  try {
    const res = await fetch(`${baseUrl}/api/providers/${connectionId}/sync-models`, {
      method: "POST",
-      headers: { "Content-Type": "application/json", "x-internal": "model-sync-scheduler" },
+      headers: {
+        "Content-Type": "application/json",
+        ...buildModelSyncInternalHeaders(),
+      },
    });
    if (!res.ok) {
      console.warn(
@@ -121,7 +164,7 @@ async function runSyncCycle(apiBaseUrl: string): Promise<void> {
 * @param intervalMs — sync interval in milliseconds (default: 24h)
 */
 export function startModelSyncScheduler(
-  apiBaseUrl = "http://localhost:20128",
+  apiBaseUrl = INTERNAL_BASE_URL,
  intervalMs = DEFAULT_INTERVAL_MS
 ): void {
  if (schedulerTimer) {
@@ -44,6 +44,7 @@ import { RequestTelemetry, recordTelemetry } from "../../shared/utils/requestTel
 import { generateRequestId } from "../../shared/utils/requestId";
 import { logAuditEvent } from "../../lib/compliance/index";
 import { enforceApiKeyPolicy } from "../../shared/utils/apiKeyPolicy";
+import { cloneLogPayload } from "@/lib/logPayloads";
 import {
  applyTaskAwareRouting,
  getTaskRoutingConfig,
@@ -81,6 +82,13 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
    return errorResponse(HTTP_STATUS.BAD_REQUEST, "Invalid JSON body");
  }

+  const rawClientBody = cloneLogPayload(body);
+
+  // Build clientRawRequest for logging (if not provided)
+  if (!clientRawRequest) {
+    clientRawRequest = buildClientRawRequest(request, rawClientBody);
+  }
+
  // FASE-01: Input sanitization — prompt injection detection & PII redaction
  telemetry.startPhase("validate");
  const sanitizeResult = sanitizeRequest(body, log as any);
@@ -113,16 +121,6 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
    );
  }

-  // Build clientRawRequest for logging (if not provided)
-  if (!clientRawRequest) {
-    const url = new URL(request.url);
-    clientRawRequest = {
-      endpoint: url.pathname,
-      body,
-      headers: Object.fromEntries(request.headers.entries()),
-    };
-  }
-
  // Log request endpoint and model
  const url = new URL(request.url);
  const modelStr = body.model;
@@ -284,6 +282,45 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
      allCombos,
    });

+    // ── Global Fallback Provider (#689) ────────────────────────────────────
+    // If combo exhausted all models, try the global fallback before giving up.
+    if (
+      !response.ok &&
+      [502, 503].includes(response.status) &&
+      typeof (settings as any)?.globalFallbackModel === "string" &&
+      (settings as any).globalFallbackModel.trim()
+    ) {
+      const fallbackModel = (settings as any).globalFallbackModel.trim();
+      log.info(
+        "GLOBAL_FALLBACK",
+        `Combo "${combo.name}" exhausted — attempting global fallback: ${fallbackModel}`
+      );
+      try {
+        const fallbackResponse = await handleSingleModelChat(
+          body,
+          fallbackModel,
+          clientRawRequest,
+          request,
+          combo.name,
+          apiKeyInfo,
+          telemetry,
+          { sessionId, emergencyFallbackTried: true }
+        );
+        if (fallbackResponse.ok) {
+          log.info("GLOBAL_FALLBACK", `Global fallback ${fallbackModel} succeeded`);
+          recordTelemetry(telemetry);
+          return withSessionHeader(fallbackResponse, sessionId);
+        }
+        log.warn(
+          "GLOBAL_FALLBACK",
+          `Global fallback ${fallbackModel} also failed (${fallbackResponse.status})`
+        );
+      } catch (err: any) {
+        log.warn("GLOBAL_FALLBACK", `Global fallback error: ${err?.message || "unknown"}`);
+      }
+    }
+    // ─────────────────────────────────────────────────────────────────────────
+
    // Record telemetry
    recordTelemetry(telemetry);
    return withSessionHeader(response, sessionId);
@@ -305,6 +342,15 @@ export async function handleChat(request: any, clientRawRequest: any = null) {
  return withSessionHeader(response, sessionId);
 }

+export function buildClientRawRequest(request: Request, body: unknown) {
+  const url = new URL(request.url);
+  return {
+    endpoint: url.pathname,
+    body: cloneLogPayload(body),
+    headers: Object.fromEntries(request.headers.entries()),
+  };
+}
+
 /**
 * Handle single model chat request
 *
@@ -59,6 +59,7 @@ describe("Pipeline Wiring — server-init.ts", () => {

 describe("Pipeline Wiring — sse chat handler", () => {
  const src = readProjectFile("src/sse/handlers/chat.ts");
+  const coreSrc = readProjectFile("open-sse/handlers/chatCore.ts");

  it("should import and use request sanitization", () => {
    assert.ok(src, "src/sse/handlers/chat.ts should exist");
@@ -81,8 +82,10 @@ describe("Pipeline Wiring — sse chat handler", () => {
    assert.match(src, /generateRequestId/);
  });

-  it("should import cost tracking integration", () => {
-    assert.match(src, /recordCost/);
+  it("should keep cost tracking integration in the chat pipeline", () => {
+    assert.ok(coreSrc, "open-sse/handlers/chatCore.ts should exist");
+    assert.match(coreSrc, /calculateCost/);
+    assert.match(coreSrc, /recordCost/);
  });
 });

@@ -23,12 +23,19 @@ function readSrc(relPath) {
  return readFileSync(full, "utf8");
 }

+function readOpenSse(relPath) {
+  const full = join(ROOT, "open-sse", relPath);
+  if (!existsSync(full)) return null;
+  return readFileSync(full, "utf8");
+}
+
 // ═══════════════════════════════════════════════════
 // 1. Chat Handler Pipeline Wiring
 // ═══════════════════════════════════════════════════

 describe("Chat Pipeline — handleSingleModelChat decomposition", () => {
  const src = readSrc("sse/handlers/chat.ts");
+  const coreSrc = readOpenSse("handlers/chatCore.ts");

  it("should define resolveModelOrError helper", () => {
    assert.ok(src, "chat.ts should exist");
@@ -43,8 +50,10 @@ describe("Chat Pipeline — handleSingleModelChat decomposition", () => {
    assert.match(src, /function\s+executeChatWithBreaker/);
  });

-  it("should define recordCostIfNeeded helper", () => {
-    assert.match(src, /function\s+recordCostIfNeeded/);
+  it("should keep cost accounting in the core chat pipeline", () => {
+    assert.ok(coreSrc, "open-sse/handlers/chatCore.ts should exist");
+    assert.match(coreSrc, /calculateCost\(/);
+    assert.match(coreSrc, /recordCost\(/);
  });

  it("handleSingleModelChat should use resolveModelOrError", () => {
@@ -60,8 +69,9 @@ describe("Chat Pipeline — handleSingleModelChat decomposition", () => {
    assert.match(src, /executeChatWithBreaker\(/);
  });

-  it("handleSingleModelChat should use recordCostIfNeeded", () => {
-    assert.match(src, /recordCostIfNeeded\(/);
+  it("chatCore should record cost for both non-streaming and streaming responses", () => {
+    assert.match(coreSrc, /if \(apiKeyInfo\?\.id && usage\)/);
+    assert.match(coreSrc, /if \(apiKeyInfo\?\.id && streamUsage\)/);
  });
 });

@@ -161,7 +161,17 @@ test("callLogs.ts wires no-log and PII sanitization before persistence", () => {
  );
  assert.ok(content.includes('from "../piiSanitizer"'), "callLogs.ts should import piiSanitizer");
  assert.ok(content.includes("isNoLog("), "callLogs.ts should check no-log policy");
-  assert.ok(content.includes("sanitizePayloadPII"), "callLogs.ts should sanitize PII recursively");
+
+  const payloadHelperContent = readIfExists("src/lib/logPayloads.ts");
+  assert.ok(payloadHelperContent, "src/lib/logPayloads.ts should exist");
+  assert.ok(
+    content.includes("protectPayloadForLog") && content.includes('from "../logPayloads"'),
+    "callLogs.ts should route payload protection through shared log helpers"
+  );
+  assert.ok(
+    payloadHelperContent.includes("export function sanitizePayloadPII"),
+    "logPayloads.ts should keep recursive PII sanitization logic"
+  );
 });

 test("API key update route and DB layer wire persisted no-log controls", () => {
@@ -0,0 +1,51 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+
+const { buildComboTestRequestBody, shouldProbeComboTestReachability, probeComboModelReachability } =
+  await import("../../src/lib/combos/testHealth.ts");
+
+test("combo test helper builds a realistic smoke payload", () => {
+  const body = buildComboTestRequestBody("openrouter/openai/gpt-5.4");
+
+  assert.equal(body.model, "openrouter/openai/gpt-5.4");
+  assert.equal(body.messages[0].content, "Reply with OK only.");
+  assert.equal(body.max_tokens, 16);
+  assert.equal(body.stream, false);
+});
+
+test("combo test helper probes only soft 4xx responses", () => {
+  assert.equal(shouldProbeComboTestReachability(400), true);
+  assert.equal(shouldProbeComboTestReachability(422), true);
+  assert.equal(shouldProbeComboTestReachability(401), false);
+  assert.equal(shouldProbeComboTestReachability(404), false);
+  assert.equal(shouldProbeComboTestReachability(429), false);
+});
+
+test("combo reachability probe reuses resolved provider credentials and model id", async () => {
+  let validationInput = null;
+
+  const result = await probeComboModelReachability("openrouter/openai/gpt-5.4", {
+    getModelInfo: async () => ({ provider: "openrouter", model: "openai/gpt-5.4" }),
+    getProviderCredentials: async () => ({
+      apiKey: "test-key",
+      providerSpecificData: { baseUrl: "https://openrouter.ai/api/v1" },
+    }),
+    validateProviderApiKey: async (input) => {
+      validationInput = input;
+      return { valid: true, method: "models_endpoint" };
+    },
+  });
+
+  assert.equal(result.reachable, true);
+  assert.equal(result.provider, "openrouter");
+  assert.equal(result.model, "openai/gpt-5.4");
+  assert.equal(result.method, "models_endpoint");
+  assert.deepEqual(validationInput, {
+    provider: "openrouter",
+    apiKey: "test-key",
+    providerSpecificData: {
+      baseUrl: "https://openrouter.ai/api/v1",
+      validationModelId: "openai/gpt-5.4",
+    },
+  });
+});
@@ -0,0 +1,132 @@
+import { describe, test } from "node:test";
+import assert from "node:assert/strict";
+import {
+  injectModelTag,
+  extractPinnedModel,
+} from "../../open-sse/services/comboAgentMiddleware.ts";
+
+describe("Context pinning — tool call responses (#721)", () => {
+  test("injectModelTag appends synthetic tag when last assistant has null content (tool_calls)", () => {
+    const messages = [
+      { role: "user", content: "List the files" },
+      {
+        role: "assistant",
+        content: null,
+        tool_calls: [
+          {
+            id: "call_abc123",
+            type: "function",
+            function: { name: "read", arguments: '{"filePath":"/mnt/e/deer-flow"}' },
+          },
+        ],
+      },
+    ];
+
+    const result = injectModelTag(messages, "ollamacloud/glm-5");
+
+    // Should append a synthetic assistant message with the pin tag
+    assert.equal(result.length, 3, "Should have 3 messages (original 2 + synthetic)");
+    assert.equal(result[2].role, "assistant");
+    assert.ok(
+      result[2].content.includes("<omniModel>ollamacloud/glm-5</omniModel>"),
+      "Synthetic message should contain the pin tag"
+    );
+  });
+
+  test("injectModelTag appends synthetic tag when last assistant has array content", () => {
+    const messages = [
+      { role: "user", content: "Explain the code" },
+      {
+        role: "assistant",
+        content: [
+          { type: "text", text: "Here is the analysis" },
+          { type: "text", text: "And here is part 2" },
+        ],
+      },
+    ];
+
+    const result = injectModelTag(messages, "nvidia/llama-3.4-70b");
+
+    // Array content → should append synthetic message
+    assert.equal(result.length, 3);
+    assert.equal(result[2].role, "assistant");
+    assert.ok(result[2].content.includes("<omniModel>nvidia/llama-3.4-70b</omniModel>"));
+  });
+
+  test("extractPinnedModel finds tag in synthetic message after tool_calls", () => {
+    const messages = [
+      { role: "user", content: "List the files" },
+      {
+        role: "assistant",
+        content: null,
+        tool_calls: [
+          { id: "call_abc", type: "function", function: { name: "read", arguments: "{}" } },
+        ],
+      },
+      { role: "assistant", content: "\n<omniModel>ollamacloud/glm-5</omniModel>" },
+    ];
+
+    const pinned = extractPinnedModel(messages);
+    assert.equal(pinned, "ollamacloud/glm-5");
+  });
+
+  test("injectModelTag still works for normal string content", () => {
+    const messages = [
+      { role: "user", content: "Hello" },
+      { role: "assistant", content: "Hi there!" },
+    ];
+
+    const result = injectModelTag(messages, "openai/gpt-4o");
+
+    assert.equal(result.length, 2, "Should not add a new message");
+    assert.ok(result[1].content.includes("<omniModel>openai/gpt-4o</omniModel>"));
+    assert.ok(result[1].content.startsWith("Hi there!"));
+  });
+
+  test("roundtrip: inject → extract works for tool-call messages", () => {
+    const messages = [
+      { role: "user", content: "List the files" },
+      {
+        role: "assistant",
+        content: null,
+        tool_calls: [
+          {
+            id: "call_abc123",
+            type: "function",
+            function: { name: "read", arguments: '{"filePath":"/home"}' },
+          },
+        ],
+      },
+    ];
+
+    const tagged = injectModelTag(messages, "qwen/coder-model");
+    const pinned = extractPinnedModel(tagged);
+
+    assert.equal(pinned, "qwen/coder-model", "Should roundtrip the pinned model");
+  });
+
+  test("re-injection clears old pin and sets new one", () => {
+    const messages = [
+      { role: "user", content: "Follow up" },
+      { role: "assistant", content: "Previous answer\n<omniModel>old/model</omniModel>" },
+      { role: "user", content: "Continue" },
+      {
+        role: "assistant",
+        content: null,
+        tool_calls: [
+          { id: "call_xyz", type: "function", function: { name: "exec", arguments: "{}" } },
+        ],
+      },
+    ];
+
+    const tagged = injectModelTag(messages, "new/model");
+    const pinned = extractPinnedModel(tagged);
+
+    assert.equal(pinned, "new/model", "Should return new pinned model, not old one");
+    // Verify old tag was cleaned
+    const oldTagPresent = tagged.some(
+      (m) => typeof m.content === "string" && m.content.includes("old/model")
+    );
+    assert.equal(oldTagPresent, false, "Old pin tag should be cleaned");
+  });
+});
@@ -0,0 +1,39 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import fs from "node:fs";
+import path from "node:path";
+
+test("modelSyncScheduler: internal auth headers validate only for scheduler requests", async () => {
+  const {
+    buildModelSyncInternalHeaders,
+    getModelSyncInternalAuthHeaderName,
+    isModelSyncInternalRequest,
+  } = await import("../../src/shared/services/modelSyncScheduler.ts");
+
+  const internalRequest = new Request("http://localhost/api/providers/test/sync-models", {
+    method: "POST",
+    headers: buildModelSyncInternalHeaders(),
+  });
+  assert.equal(isModelSyncInternalRequest(internalRequest), true);
+
+  const externalRequest = new Request("http://localhost/api/providers/test/sync-models", {
+    method: "POST",
+    headers: { [getModelSyncInternalAuthHeaderName()]: "invalid-token" },
+  });
+  assert.equal(isModelSyncInternalRequest(externalRequest), false);
+});
+
+test("initCloudSync: startup initialization also starts model sync scheduler", () => {
+  const filePath = path.join(process.cwd(), "src/lib/initCloudSync.ts");
+  const source = fs.readFileSync(filePath, "utf8");
+
+  assert.match(source, /startModelSyncScheduler\s*\(/);
+});
+
+test("proxy: internal model sync token is only allowed for provider model sync routes", () => {
+  const filePath = path.join(process.cwd(), "src/proxy.ts");
+  const source = fs.readFileSync(filePath, "utf8");
+
+  assert.match(source, /isModelSyncInternalRequest/);
+  assert.match(source, /sync-models\|models/);
+});
@@ -0,0 +1,46 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+
+const providerLimitUtils =
+  await import("../../src/app/(dashboard)/dashboard/usage/components/ProviderLimits/utils.tsx");
+
+test("provider plan fallbacks normalize to Unknown instead of repeating provider labels", () => {
+  const tier = providerLimitUtils.normalizePlanTier("Claude Code");
+
+  assert.equal(tier.key, "unknown");
+  assert.equal(tier.label, "Unknown");
+});
+
+test("paid individual tiers use non-gray badge variants", () => {
+  assert.equal(providerLimitUtils.normalizePlanTier("Plus").variant, "success");
+  assert.equal(providerLimitUtils.normalizePlanTier("Pro").variant, "success");
+  assert.equal(providerLimitUtils.normalizePlanTier("Student").variant, "success");
+  assert.equal(providerLimitUtils.normalizePlanTier("Free").variant, "default");
+});
+
+test("Codex workspacePlanType is used when live plan is missing or unknown", () => {
+  const resolvedPlan = providerLimitUtils.resolvePlanValue("unknown", {
+    workspacePlanType: "plus",
+  });
+
+  assert.equal(resolvedPlan, "plus");
+  const tier = providerLimitUtils.normalizePlanTier(resolvedPlan);
+  assert.equal(tier.key, "plus");
+  assert.equal(tier.variant, "success");
+});
+
+test("remaining percentage helpers reflect remaining quota and stale resets refill to 100", () => {
+  assert.equal(providerLimitUtils.calculatePercentage(0, 100), 100);
+  assert.equal(providerLimitUtils.calculatePercentage(17, 100), 83);
+  assert.equal(providerLimitUtils.calculatePercentage(60, 100), 40);
+
+  const past = new Date(Date.now() - 60_000).toISOString();
+  const parsed = providerLimitUtils.parseQuotaData("codex", {
+    quotas: {
+      session: { used: 83, total: 100, resetAt: past },
+    },
+  });
+
+  assert.equal(parsed.length, 1);
+  assert.equal(providerLimitUtils.calculatePercentage(parsed[0].used, parsed[0].total), 100);
+});
@@ -0,0 +1,65 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+
+const {
+  normalizePayloadForLog,
+  protectPayloadForLog,
+  serializePayloadForStorage,
+  parseStoredPayload,
+} = await import("../../src/lib/logPayloads.ts");
+const { createStructuredSSECollector } =
+  await import("../../open-sse/utils/streamPayloadCollector.ts");
+
+test("normalizes JSON strings before log protection and redacts sensitive keys", () => {
+  const protectedPayload = protectPayloadForLog(
+    JSON.stringify({
+      authorization: "Bearer secret-token-value",
+      nested: {
+        apiKey: "top-secret-key",
+      },
+    })
+  );
+
+  assert.deepEqual(protectedPayload, {
+    authorization: "[REDACTED]",
+    nested: {
+      apiKey: "[REDACTED]",
+    },
+  });
+});
+
+test("wraps raw text payloads in JSON-safe objects", () => {
+  const normalized = normalizePayloadForLog("event: ping\ndata: plain-text\n\n");
+
+  assert.deepEqual(normalized, {
+    _rawText: "event: ping\ndata: plain-text\n\n",
+  });
+});
+
+test("serializes truncated payloads as valid JSON objects", () => {
+  const stored = serializePayloadForStorage({ text: "x".repeat(200) }, 80);
+  const parsed = parseStoredPayload(stored);
+
+  assert.equal(parsed._truncated, true);
+  assert.equal(parsed._originalSize > 80, true);
+  assert.equal(typeof parsed._preview, "string");
+});
+
+test("structured SSE collector preserves event order and marks truncation", () => {
+  const collector = createStructuredSSECollector({ maxEvents: 2, maxBytes: 200 });
+
+  collector.push({ type: "response.created", id: "r1" });
+  collector.push({ type: "response.output_text.delta", delta: "hi" });
+  collector.push({ type: "response.completed" });
+
+  const payload = collector.build({ done: true });
+
+  assert.equal(payload._streamed, true);
+  assert.equal(payload._eventCount, 3);
+  assert.equal(payload._truncated, true);
+  assert.equal(payload._droppedEvents, 1);
+  assert.equal(payload.events.length, 2);
+  assert.equal(payload.events[0].event, "response.created");
+  assert.equal(payload.events[1].event, "response.output_text.delta");
+  assert.deepEqual(payload.summary, { done: true });
+});
@@ -2,7 +2,8 @@ import test from "node:test";
 import assert from "node:assert/strict";

 const { checkFallbackError } = await import("../../open-sse/services/accountFallback.ts");
-const { handleComboChat } = await import("../../open-sse/services/combo.ts");
+const { handleComboChat, shouldFallbackComboBadRequest } =
+  await import("../../open-sse/services/combo.ts");
 const { resetAllCircuitBreakers } = await import("../../src/shared/utils/circuitBreaker.ts");

 test.beforeEach(() => {
@@ -139,3 +140,43 @@ test("T24: all inactive accounts return 503 service_unavailable (not 406)", asyn
  const body = await result.json();
  assert.equal(body.error?.code, "ALL_ACCOUNTS_INACTIVE");
 });
+
+test("combo falls through provider-scoped 400s and reaches the next model", async () => {
+  const log = createLog();
+
+  const result = await handleComboChat({
+    body: {},
+    combo: {
+      name: "t24-provider-scoped-400",
+      strategy: "priority",
+      models: [
+        { model: "free/gemini-3.1-pro-preview", weight: 0 },
+        { model: "aio/gemini-3.1-pro-preview-thinking-high", weight: 0 },
+        { model: "openrouter/google/gemini-3.1-pro-preview", weight: 0 },
+      ],
+    },
+    handleSingleModel: createStatusSequenceHandler([
+      { status: 429, message: "No capacity available for model gemini-3.1-pro-preview" },
+      { status: 400, message: "request blocked by Gemini API: PROHIBITED_CONTENT" },
+      { status: 200 },
+    ]),
+    isModelAvailable: () => true,
+    log,
+    settings: null,
+    allCombos: null,
+  });
+
+  assert.equal(result.ok, true);
+  const badRequestLog = log.entries.find((entry) => entry.msg.includes("provider-scoped 400"));
+  assert.ok(badRequestLog);
+});
+
+test("combo bad-request fallback helper keeps generic 400s terminal", () => {
+  assert.equal(shouldFallbackComboBadRequest(400, "request blocked by Gemini API"), true);
+  assert.equal(
+    shouldFallbackComboBadRequest(400, "One or more of the provided message roles is not valid"),
+    true
+  );
+  assert.equal(shouldFallbackComboBadRequest(400, "bad request"), false);
+  assert.equal(shouldFallbackComboBadRequest(422, "request blocked by Gemini API"), false);
+});
@@ -0,0 +1,161 @@
+/**
+ * T43: Gemini tool call parts must NOT include thoughtSignature.
+ *
+ * Regression test for HTTP 400 "invalid argument" when OmniRoute translates
+ * OpenAI tool_calls to Gemini format. The thoughtSignature field is only valid
+ * on thinking/reasoning parts — injecting it on functionCall parts causes the
+ * Gemini API to reject the request with a 400 error.
+ *
+ * Reproduces: https://github.com/diegosouzapw/OmniRoute/issues/725
+ */
+
+import test from "node:test";
+import assert from "node:assert/strict";
+
+const { translateRequest } = await import("../../open-sse/translator/index.ts");
+const { FORMATS } = await import("../../open-sse/translator/formats.ts");
+
+function translateToGemini(messages, tools) {
+  return translateRequest(FORMATS.OPENAI, FORMATS.GEMINI, "gemini-2.0-flash", {
+    model: "gemini-2.0-flash",
+    messages,
+    tools,
+    stream: false,
+  });
+}
+
+test("T43: functionCall parts must not contain thoughtSignature", () => {
+  const messages = [
+    { role: "user", content: "What is the weather in Tokyo?" },
+    {
+      role: "assistant",
+      content: null,
+      tool_calls: [
+        {
+          id: "call_abc123",
+          type: "function",
+          function: { name: "get_weather", arguments: '{"location":"Tokyo"}' },
+        },
+      ],
+    },
+    {
+      role: "tool",
+      tool_call_id: "call_abc123",
+      content: '{"temp": "15°C", "condition": "cloudy"}',
+    },
+  ];
+
+  const tools = [
+    {
+      type: "function",
+      function: {
+        name: "get_weather",
+        description: "Get weather for a location",
+        parameters: {
+          type: "object",
+          properties: { location: { type: "string" } },
+          required: ["location"],
+        },
+      },
+    },
+  ];
+
+  const result = translateToGemini(messages, tools);
+
+  // Find the model turn that contains the functionCall
+  const modelTurn = result.contents.find(
+    (c) => c.role === "model" && c.parts?.some((p) => p.functionCall)
+  );
+
+  assert.ok(modelTurn, "Expected a model turn with functionCall parts");
+
+  for (const part of modelTurn.parts) {
+    if (part.functionCall) {
+      assert.ok(
+        !("thoughtSignature" in part),
+        `functionCall part must not contain thoughtSignature — Gemini API returns HTTP 400 "invalid argument" when it does. Got: ${JSON.stringify(part)}`
+      );
+      assert.equal(part.functionCall.name, "get_weather");
+      assert.deepEqual(part.functionCall.args, { location: "Tokyo" });
+    }
+  }
+});
+
+test("T43: multiple tool calls — none of the functionCall parts may have thoughtSignature", () => {
+  const messages = [
+    { role: "user", content: "Get weather for Tokyo and London" },
+    {
+      role: "assistant",
+      content: null,
+      tool_calls: [
+        {
+          id: "call_001",
+          type: "function",
+          function: { name: "get_weather", arguments: '{"location":"Tokyo"}' },
+        },
+        {
+          id: "call_002",
+          type: "function",
+          function: { name: "get_weather", arguments: '{"location":"London"}' },
+        },
+      ],
+    },
+    {
+      role: "tool",
+      tool_call_id: "call_001",
+      content: '{"temp":"15°C"}',
+    },
+    {
+      role: "tool",
+      tool_call_id: "call_002",
+      content: '{"temp":"10°C"}',
+    },
+  ];
+
+  const result = translateToGemini(messages, []);
+
+  const modelTurn = result.contents.find(
+    (c) => c.role === "model" && c.parts?.some((p) => p.functionCall)
+  );
+  assert.ok(modelTurn, "Expected a model turn with functionCall parts");
+
+  const functionCallParts = modelTurn.parts.filter((p) => p.functionCall);
+  assert.equal(functionCallParts.length, 2, "Expected 2 functionCall parts");
+
+  for (const part of functionCallParts) {
+    assert.ok(
+      !("thoughtSignature" in part),
+      `functionCall part must not contain thoughtSignature. Got: ${JSON.stringify(part)}`
+    );
+  }
+});
+
+test("T43: thinking parts still include thoughtSignature (regression guard)", () => {
+  // Ensure we did not accidentally break the thinking parts that legitimately
+  // need thoughtSignature (present when msg.reasoning_content is set).
+  const messages = [
+    { role: "user", content: "Think about the weather" },
+    {
+      role: "assistant",
+      reasoning_content: "The user wants weather data.",
+      content: "I'll check the weather.",
+      tool_calls: undefined,
+    },
+  ];
+
+  const result = translateToGemini(messages, []);
+
+  const modelTurn = result.contents.find((c) => c.role === "model");
+  assert.ok(modelTurn, "Expected a model turn");
+
+  const thinkingPart = modelTurn.parts.find((p) => p.thought === true);
+  assert.ok(thinkingPart, "Expected a thinking part when reasoning_content is set");
+  assert.equal(thinkingPart.text, "The user wants weather data.");
+
+  const signaturePart = modelTurn.parts.find((p) => "thoughtSignature" in p);
+  assert.ok(signaturePart, "Expected a thoughtSignature part after thinking part");
+  assert.ok(
+    !signaturePart.functionCall,
+    "thoughtSignature part must not also be a functionCall part"
+  );
+});
Author	SHA1	Message	Date
diegosouzapw	fb8d187f8d	chore(release): v3.2.2 — Four-Stage Request Logs & Bugfixes Build Electron Desktop App / Validate version (push) Failing after 35s Details Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped Details Build Electron Desktop App / Build Electron (linux) (push) Has been skipped Details Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped Details Build Electron Desktop App / Build Electron (windows) (push) Has been skipped Details Build Electron Desktop App / Create Release (push) Has been skipped Details Build Electron Desktop App / Publish to npm (push) Has been skipped Details	2026-03-28 22:11:22 -03:00
diegosouzapw	1a11301e1a	Merge branch 'codex/request-log-pipeline-json'	2026-03-28 22:09:34 -03:00
R.D.	4c6cdd5c23	test: align pipeline integration assertions	2026-03-28 22:09:27 -03:00
R.D.	30a64b0dd3	test: align security hardening log helper checks	2026-03-28 22:09:27 -03:00
R.D.	04de492019	fix: add four-stage request log payloads	2026-03-28 22:09:27 -03:00
R.D.	07890df6cb	test: align pipeline integration assertions	2026-03-28 22:07:20 -03:00
R.D.	2f23cfdf1c	test: align security hardening log helper checks	2026-03-28 22:07:20 -03:00
R.D.	1832946d41	fix: add four-stage request log payloads	2026-03-28 22:07:20 -03:00
Diego Souza	6ec8745d2e	ci: add GitHub Packages publish configuration for GHCR and NPM	2026-03-28 22:04:02 -03:00
diegosouzapw	b6bbfe063b	fix(sse): preserve cache_control in Claude passthrough mode (#708 )	2026-03-28 22:01:38 -03:00
oyi77	48182edbd5	fix(translator): remove thoughtSignature from functionCall parts in Gemini translation HTTP 400 "invalid argument" was triggered when OmniRoute translated OpenAI tool_calls to Gemini format, because thoughtSignature was injected onto every functionCall part unconditionally. thoughtSignature is only valid on thinking/reasoning parts (those with thought: true). The Gemini API rejects any request where a functionCall part carries a thoughtSignature field, returning HTTP 400. Fix: remove the thoughtSignature field from functionCall parts. The thinking parts that legitimately require thoughtSignature (emitted when a message has reasoning_content) are unchanged. Adds regression test (T43) with three cases: - single tool call: no thoughtSignature on functionCall part - multiple tool calls: none carry thoughtSignature - thinking part regression guard: thoughtSignature still present on thought parts Fixes #725	2026-03-28 21:57:15 -03:00
Diego Rodrigues de Sa e Souza	fc24361aa6	Merge pull request #726 from diegosouzapw/release/v3.2.1 Build Electron Desktop App / Validate version (push) Failing after 26s Details Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped Details Build Electron Desktop App / Build Electron (linux) (push) Has been skipped Details Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped Details Build Electron Desktop App / Build Electron (windows) (push) Has been skipped Details Build Electron Desktop App / Create Release (push) Has been skipped Details Build Electron Desktop App / Publish to npm (push) Has been skipped Details chore(release): v3.2.1 — context pinning fix + global fallback	2026-03-28 21:19:24 -03:00
diegosouzapw	cec833afc6	chore(release): v3.2.1 — context pinning fix + global fallback provider	2026-03-28 21:13:14 -03:00
diegosouzapw	f1cddba938	feat: add global fallback provider support (#689 ) When all combo models are exhausted (502/503), OmniRoute now checks for a globalFallbackModel setting and attempts one last request through it before returning the error. Settings stored in key_value table, no migration needed.	2026-03-28 21:10:29 -03:00
diegosouzapw	a0acdfdcb9	fix: context pinning bypass during tool-call responses (#721 ) Non-streaming: Fixed json.messages check to use json.choices[0].message (OpenAI format). Streaming: inject pin tag before finish_reason chunk for tool-call-only streams. injectModelTag now appends synthetic assistant message when content is null/array (tool_calls).	2026-03-28 21:04:47 -03:00
Diego Rodrigues de Sa e Souza	6637f294df	chore: release v3.2.0 (#722 ) Build Electron Desktop App / Validate version (push) Failing after 28s Details Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped Details Build Electron Desktop App / Build Electron (linux) (push) Has been skipped Details Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped Details Build Electron Desktop App / Build Electron (windows) (push) Has been skipped Details Build Electron Desktop App / Create Release (push) Has been skipped Details Build Electron Desktop App / Publish to npm (push) Has been skipped Details Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>	2026-03-28 20:45:18 -03:00
dependabot[bot]	ad8a444105	deps: bump path-to-regexp from 8.3.0 to 8.4.0 (#715 ) Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) from 8.3.0 to 8.4.0. - [Release notes](https://github.com/pillarjs/path-to-regexp/releases) - [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md) - [Commits](https://github.com/pillarjs/path-to-regexp/compare/v8.3.0...v8.4.0) --- updated-dependencies: - dependency-name: path-to-regexp dependency-version: 8.4.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-28 20:39:29 -03:00
Chris	877cfa0071	feat: add GLM Coding usage/quota tracking with Z.AI session quota (#698 ) * feat: add GLM Coding usage/quota tracking with Z.AI session quota Add GLM to the usage tracking pipeline: usage API route, Z.AI quota fetcher (TOKENS_LIMIT percentage-based), quota parser, and Provider Limits UI. Adds API region dropdown (International/China) to Add/Edit connection modals. Displays session quota with plan level. * fix: address PR review feedback for GLM usage tracking - Remove explicit `any` types from getGlmUsage (fix lint budget) - Fix empty string fallback for plan level - Remove duplicate `case "glm"` in quota parser (identical to default) - Skip OAuth refresh flow for GLM (API key auth) in usage route * fix: upgrade path-to-regexp to fix ReDoS vulnerability (GHSA-j3q9-mxjg-w52f, GHSA-27v5-c462-wpq7) --------- Co-authored-by: Chris Staley <christopher-s@users.noreply.github.com>	2026-03-28 20:39:24 -03:00
Paijo	e6f0a780b7	feat(dashboard): add Cache Management page with stats, hit rate, and targeted invalidation (#701 ) Adds a new /dashboard/cache page that surfaces the existing but UI-less semantic cache infrastructure. Changes: - New page: src/app/(dashboard)/dashboard/cache/page.tsx - Live stats: memory entries, DB entries, cache hits, tokens saved - Hit rate progress bar with color coding (green/yellow/red) - Hits/Misses/Total breakdown - Idempotency layer stats (active dedup keys + window) - Cache behavior info panel - Clear All button - Auto-refresh every 10s - Enhanced API: src/app/api/cache/route.ts - DELETE ?model=<name> — invalidate by model - DELETE ?signature=<hex> — invalidate single entry - DELETE ?staleMs=<ms> — invalidate entries older than N ms - DELETE (no params) — clear all (existing behavior) - Sidebar: added Cache nav item (icon: cached) - i18n: added cache + sidebar.cache keys for all 31 supported locales No new dependencies. All functionality builds on existing semanticCache.ts, cacheLayer.ts, and idempotencyLayer.ts modules. Co-authored-by: oyi77 <oyi77@users.noreply.github.com>	2026-03-28 20:39:20 -03:00
Randi	dd9de2efa9	fix: harden combo fallback and health checks (#704 )	2026-03-28 20:39:16 -03:00
Randi	f6b0811f78	[codex] fix provider limits ui (#718 ) * fix provider limits ui * restore remaining quota progress bars * address provider limits review feedback	2026-03-28 20:39:06 -03:00
Randi	eba9d854a9	fix model auto-sync startup and auth (#719 )	2026-03-28 20:39:02 -03:00