feat(release): v2.8.2 — model alias routing fix, log export, 2 merged PRs

feat(logs): add export button with time range dropdown (1h, 6h, 12h, 24h)
- New API: /api/logs/export?hours=24&type=call-logs - UI: Export button with dropdown on /dashboard/logs page - Supports export of request-logs, proxy-logs, and call-logs - Downloads as JSON file with Content-Disposition header
2026-03-19 11:13:49 -03:00 · 2026-03-19 11:11:07 -03:00 · 2026-03-19 11:07:29 -03:00 · 2026-03-19 11:04:30 -03:00 · 2026-03-19 11:02:13 -03:00 · 2026-03-19 08:45:54 -03:00
196 changed files with 16474 additions and 2232 deletions
@@ -32,6 +32,27 @@ Version format: `2.x.y` — examples:
 npm version patch --no-git-tag-version
 ```

+> **⚠️ ATOMIC COMMIT RULE — Version bump MUST happen before committing feature files.**
+>
+> **CORRECT order:**
+>
+> 1. `npm version patch --no-git-tag-version` ← bump first
+> 2. implement features / fix bugs
+> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
+>
+> **OR if features are already staged:**
+>
+> 1. implement features (do NOT commit yet)
+> 2. `npm version patch --no-git-tag-version` ← bump before committing
+> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
+>
+> **NEVER do this (creates version mismatch in git history):**
+>
+> - ~~commit features → then bump version → commit package.json separately~~
+>
+> This ensures that `git show v2.x.y` always contains both code changes and the version bump together.
+> The GitHub release tag will point to a commit that includes ALL changes for that version.
+
 ### 2. Regenerate lock file (REQUIRED after version bump)

 **Mandatory** — skipping causes `@swc/helpers` lock mismatch and CI failures:
@@ -55,6 +55,8 @@ logs/*
 # analysis directories (generated, not tracked)
 .analysis/
 antigravity-manager-analysis/
+.sisyphus/
+.plans/

 # docs (allow specific tracked files)
 docs/*
@@ -3,6 +3,11 @@ data/
 **/data/
 **/db.json

+# VS Code extension test runtime (large binary, not needed in npm package)
+app/vscode-extension/
+**/data/
+**/db.json
+
 # Source code (pre-built app/ is published instead)
 src/
 open-sse/
@@ -4,6 +4,332 @@

 ---

+## [2.8.2] — 2026-03-19
+
+> Sprint: 2 merged PRs, model aliases routing fix, log export, and issue triage.
+
+### Features
+
+- **Log Export**: New Export button on `/dashboard/logs` with time range dropdown (1h, 6h, 12h, 24h). Downloads JSON of request/proxy/call logs via `/api/logs/export` API (#user-request)
+
+### Bug Fixes
+
+- **Model Aliases Routing** (#472): Settings → Model Aliases now correctly affect provider routing, not just format detection. Previously `resolveModelAlias()` output was only used for `getModelTargetFormat()` but the original model ID was sent to the provider
+- **Stream Flush Usage** (#480): Usage data from the last SSE event in the buffer is now correctly extracted during stream flush (merged from @prakersh)
+
+### Merged PRs
+
+- #480 — Extract usage from remaining buffer in flush handler (@prakersh)
+- #479 — Add missing Codex 5.3/5.4 and Anthropic model ID pricing entries (@prakersh)
+
+---
+
+## [2.8.1] — 2026-03-19
+
+> Sprint: Five community PRs — streaming call log fixes, Kiro compatibility, cache token analytics, Chinese translation, and configurable tool call IDs.
+
+### ✨ Features
+
+- **feat(logs)**: Call log response content now correctly accumulated from raw provider chunks (OpenAI/Claude/Gemini) before translation, fixing empty response payloads in streaming mode (#470, @zhangqiang8vip)
+- **feat(providers)**: Per-model configurable 9-char tool call ID normalization (Mistral-style) — only models with the option enabled get truncated IDs (#470)
+- **feat(api)**: Key PATCH API expanded to support `allowedConnections`, `name`, `autoResolve`, `isActive`, and `accessSchedule` fields (#470)
+- **feat(dashboard)**: Response-first layout in request log detail UI (#470)
+- **feat(i18n)**: Improved Chinese (zh-CN) translation — complete retranslation (#475, @only4copilot)
+
+### 🐛 Bug Fixes
+
+- **fix(kiro)**: Strip injected `model` field from request body — Kiro API rejects unknown top-level fields (#478, @prakersh)
+- **fix(usage)**: Include cache read + cache creation tokens in usage history input totals for accurate analytics (#477, @prakersh)
+- **fix(callLogs)**: Support Claude format usage fields (`input_tokens`/`output_tokens`) alongside OpenAI format, include all cache token variants (#476, @prakersh)
+
+---
+
+## [2.8.0] — 2026-03-19
+
+> Sprint: Bailian Coding Plan provider with editable base URLs, plus community contributions for Alibaba Cloud and Kimi Coding.
+
+### ✨ Features
+
+- **feat(providers)**: Added Bailian Coding Plan (`bailian-coding-plan`) — Alibaba Model Studio with Anthropic-compatible API. Static catalog of 8 models including Qwen3.5 Plus, Qwen3 Coder, MiniMax M2.5, GLM 5, and Kimi K2.5. Includes custom auth validation (400=valid, 401/403=invalid) (#467, @Mind-Dragon)
+- **feat(admin)**: Editable default URL in Provider Admin create/edit flows — users can configure custom base URLs per connection. Persisted in `providerSpecificData.baseUrl` with Zod schema validation rejecting non-http(s) schemes (#467)
+
+### 🧪 Tests
+
+- Added 30+ unit tests and 2 e2e scenarios for Bailian Coding Plan provider covering auth validation, schema hardening, route-level behavior, and cross-layer integration
+
+---
+
+## [2.7.10] — 2026-03-19
+
+> Sprint: Two new community-contributed providers (Alibaba Cloud Coding, Kimi Coding API-key) and Docker pino fix.
+
+### ✨ Features
+
+- **feat(providers)**: Added Alibaba Cloud Coding Plan support with two OpenAI-compatible endpoints — `alicode` (China) and `alicode-intl` (International), each with 8 models (#465, @dtk1985)
+- **feat(providers)**: Added dedicated `kimi-coding-apikey` provider path — API-key-based Kimi Coding access is no longer forced through OAuth-only `kimi-coding` route. Includes registry, constants, models API, config, and validation test (#463, @Mind-Dragon)
+
+### 🐛 Bug Fixes
+
+- **fix(docker)**: Added missing `split2` dependency to Docker image — `pino-abstract-transport` requires it at runtime but it was not being copied into the standalone container, causing `Cannot find module 'split2'` crashes (#459)
+
+---
+
+## [2.7.9] — 2026-03-18
+
+> Sprint: Codex responses subpath passthrough natively supported, Windows MITM crash fixed, and Combos agent schemas adjusted.
+
+### ✨ Features
+
+- **feat(codex)**: Native responses subpath passthrough for Codex — natively routes `POST /v1/responses/compact` to Codex upstream, maintaining Claude Code compatibility without stripping the `/compact` suffix (#457)
+
+### 🐛 Bug Fixes
+
+- **fix(combos)**: Zod schemas (`updateComboSchema` and `createComboSchema`) now include `system_message`, `tool_filter_regex`, and `context_cache_protection`. Fixes bug where agent-specific settings created via the dashboard were silently discarded by the backend validation layer (#458)
+- **fix(mitm)**: Kiro MITM profile crash on Windows fixed — `node-machine-id` failed due to missing `REG.exe` env, and the fallback threw a fatal `crypto is not defined` error. Fallback now safely and correctly imports crypto (#456)
+
+---
+
+## [2.7.8] — 2026-03-18
+
+> Sprint: Budget save bug + combo agent features UI + omniModel tag security fix.
+
+### 🐛 Bug Fixes
+
+- **fix(budget)**: "Save Limits" no longer returns 422 — `warningThreshold` is now correctly sent as fraction (0–1) instead of percentage (0–100) (#451)
+- **fix(combos)**: `<omniModel>` internal cache tag is now stripped before forwarding requests to providers, preventing cache session breaks (#454)
+
+### ✨ Features
+
+- **feat(combos)**: Agent Features section added to combo create/edit modal — expose `system_message` override, `tool_filter_regex`, and `context_cache_protection` directly from the dashboard (#454)
+
+---
+
+## [2.7.7] — 2026-03-18
+
+> Sprint: Docker pino crash, Codex CLI responses worker fix, package-lock sync.
+
+### 🐛 Bug Fixes
+
+- **fix(docker)**: `pino-abstract-transport` and `pino-pretty` now explicitly copied in Docker runner stage — Next.js standalone trace misses these peer deps, causing `Cannot find module pino-abstract-transport` crash on startup (#449)
+- **fix(responses)**: Remove `initTranslators()` from `/v1/responses` route — was crashing Next.js worker with `the worker has exited` uncaughtException on Codex CLI requests (#450)
+
+### 🔧 Maintenance
+
+- **chore(deps)**: `package-lock.json` now committed on every version bump to ensure Docker `npm ci` uses exact dependency versions
+
+---
+
+## [2.7.5] — 2026-03-18
+
+> Sprint: UX improvements and Windows CLI healthcheck fix.
+
+### 🐛 Bug Fixes
+
+- **fix(ux)**: Show default password hint on login page — new users now see `"Default password: 123456"` below the password input (#437)
+- **fix(cli)**: Claude CLI and other npm-installed tools now correctly detected as runnable on Windows — spawn uses `shell:true` to resolve `.cmd` wrappers via PATHEXT (#447)
+
+---
+
+## [2.7.4] — 2026-03-18
+
+> Sprint: Search Tools dashboard, i18n fixes, Copilot limits, Serper validation fix.
+
+### 🚀 Features
+
+- **feat(search)**: Add Search Playground (10th endpoint), Search Tools page with Compare Providers/Rerank Pipeline/Search History, local rerank routing, auth guards on search API (#443 by @Regis-RCR)
+  - New route: `/dashboard/search-tools`
+  - Sidebar entry under Debug section
+  - `GET /api/search/providers` and `GET /api/search/stats` with auth guards
+  - Local provider_nodes routing for `/v1/rerank`
+  - 30+ i18n keys in search namespace
+
+### 🐛 Bug Fixes
+
+- **fix(search)**: Fix Brave news normalizer (was returning 0 results), enforce max_results truncation post-normalization, fix Endpoints page fetch URL (#443 by @Regis-RCR)
+- **fix(analytics)**: Localize analytics day/date labels — replace hardcoded Portuguese strings with `Intl.DateTimeFormat(locale)` (#444 by @hijak)
+- **fix(copilot)**: Correct GitHub Copilot account type display, filter misleading unlimited quota rows from limits dashboard (#445 by @hijak)
+- **fix(providers)**: Stop rejecting valid Serper API keys — treat non-4xx responses as valid authentication (#446 by @hijak)
+
+---
+
+## [2.7.3] — 2026-03-18
+
+> Sprint: Codex direct API quota fallback fix.
+
+### 🐛 Bug Fixes
+
+- **fix(codex)**: Block weekly-exhausted accounts in direct API fallback (#440)
+  - `resolveQuotaWindow()` prefix matching: `"weekly"` now matches `"weekly (7d)"` cache keys
+  - `applyCodexWindowPolicy()` enforces `useWeekly`/`use5h` toggles correctly
+  - 4 new regression tests (766 total)
+
+---
+
+## [2.7.2] — 2026-03-18
+
+> Sprint: Light mode UI contrast fixes.
+
+### 🐛 Bug Fixes
+
+- **fix(logs)**: Fix light mode contrast in request logs filter buttons and combo badge (#378)
+  - Error/Success/Combo filter buttons now readable in light mode
+  - Combo row badge uses stronger violet in light mode
+
+---
+
+## [2.7.1] — 2026-03-17
+
+> Sprint: Unified web search routing (POST /v1/search) with 5 providers + Next.js 16.1.7 security fixes (6 CVEs).
+
+### ✨ New Features
+
+- **feat(search)**: Unified web search routing — `POST /v1/search` with 5 providers (Serper, Brave, Perplexity, Exa, Tavily)
+  - Auto-failover across providers, 6,500+ free searches/month
+  - In-memory cache with request coalescing (configurable TTL)
+  - Dashboard: Search Analytics tab in `/dashboard/analytics` with provider breakdown, cache hit rate, cost tracking
+  - New API: `GET /api/v1/search/analytics` for search request statistics
+  - DB migration: `request_type` column on `call_logs` for non-chat request tracking
+  - Zod validation (`v1SearchSchema`), auth-gated, cost recorded via `recordCost()`
+
+### 🔒 Security
+
+- **deps**: Next.js 16.1.6 → 16.1.7 — fixes 6 CVEs:
+  - **Critical**: CVE-2026-29057 (HTTP request smuggling via http-proxy)
+  - **High**: CVE-2026-27977, CVE-2026-27978 (WebSocket + Server Actions)
+  - **Medium**: CVE-2026-27979, CVE-2026-27980, CVE-2026-jcc7
+
+### 📁 New Files
+
+| File                                                             | Purpose                                    |
+| ---------------------------------------------------------------- | ------------------------------------------ |
+| `open-sse/handlers/search.ts`                                    | Search handler with 5-provider routing     |
+| `open-sse/config/searchRegistry.ts`                              | Provider registry (auth, cost, quota, TTL) |
+| `open-sse/services/searchCache.ts`                               | In-memory cache with request coalescing    |
+| `src/app/api/v1/search/route.ts`                                 | Next.js route (POST + GET)                 |
+| `src/app/api/v1/search/analytics/route.ts`                       | Search stats API                           |
+| `src/app/(dashboard)/dashboard/analytics/SearchAnalyticsTab.tsx` | Analytics dashboard tab                    |
+| `src/lib/db/migrations/007_search_request_type.sql`              | DB migration                               |
+| `tests/unit/search-registry.test.mjs`                            | 277 lines of unit tests                    |
+
+---
+
+## [2.7.0] — 2026-03-17
+
+> Sprint: ClawRouter-inspired features — toolCalling flag, multilingual intent detection, benchmark-driven fallback, request deduplication, pluggable RouterStrategy, Grok-4 Fast + GLM-5 + MiniMax M2.5 + Kimi K2.5 pricing.
+
+### ✨ New Models & Pricing
+
+- **feat(pricing)**: xAI Grok-4 Fast — `$0.20/$0.50 per 1M tokens`, 1143ms p50 latency, tool calling supported
+- **feat(pricing)**: xAI Grok-4 (standard) — `$0.20/$1.50 per 1M tokens`, reasoning flagship
+- **feat(pricing)**: GLM-5 via Z.AI — `$0.5/1M`, 128K output context
+- **feat(pricing)**: MiniMax M2.5 — `$0.30/1M input`, reasoning + agentic tasks
+- **feat(pricing)**: DeepSeek V3.2 — updated pricing `$0.27/$1.10 per 1M`
+- **feat(pricing)**: Kimi K2.5 via Moonshot API — direct Moonshot API access
+- **feat(providers)**: Z.AI provider added (`zai` alias) — GLM-5 family with 128K output
+
+### 🧠 Routing Intelligence
+
+- **feat(registry)**: `toolCalling` flag per model in provider registry — combos can now prefer/require tool-calling capable models
+- **feat(scoring)**: Multilingual intent detection for AutoCombo scoring — PT/ZH/ES/AR script/language patterns influence model selection per request context
+- **feat(fallback)**: Benchmark-driven fallback chains — real latency data (p50 from `comboMetrics`) used to re-order fallback priority dynamically
+- **feat(dedup)**: Request deduplication via content-hash — 5-second idempotency window prevents duplicate provider calls from retrying clients
+- **feat(router)**: Pluggable `RouterStrategy` interface in `autoCombo/routerStrategy.ts` — custom routing logic can be injected without modifying core
+
+### 🔧 MCP Server Improvements
+
+- **feat(mcp)**: 2 new advanced tool schemas: `omniroute_get_provider_metrics` (p50/p95/p99 per provider) and `omniroute_explain_route` (routing decision explanation)
+- **feat(mcp)**: MCP tool auth scopes updated — `metrics:read` scope added for provider metrics tools
+- **feat(mcp)**: `omniroute_best_combo_for_task` now accepts `languageHint` parameter for multilingual routing
+
+### 📊 Observability
+
+- **feat(metrics)**: `comboMetrics.ts` extended with real-time latency percentile tracking per provider/account
+- **feat(health)**: Health API (`/api/monitoring/health`) now returns per-provider `p50Latency` and `errorRate` fields
+- **feat(usage)**: Usage history migration for per-model latency tracking
+
+### 🗄️ DB Migrations
+
+- **feat(migrations)**: New column `latency_p50` in `combo_metrics` table — zero-breaking, safe for existing users
+
+### 🐛 Bug Fixes / Closures
+
+- **close(#411)**: better-sqlite3 hashed module resolution on Windows — fixed in v2.6.10 (f02c5b5)
+- **close(#409)**: GitHub Copilot chat completions fail with Claude models when files attached — fixed in v2.6.9 (838f1d6)
+- **close(#405)**: Duplicate of #411 — resolved
+
+## [2.6.10] — 2026-03-17
+
+> Windows fix: better-sqlite3 prebuilt download without node-gyp/Python/MSVC (#426).
+
+### 🐛 Bug Fixes
+
+- **fix(install/#426)**: On Windows, `npm install -g omniroute` used to fail with `better_sqlite3.node is not a valid Win32 application` because the bundled native binary was compiled for Linux. Adds **Strategy 1.5** to `scripts/postinstall.mjs`: uses `@mapbox/node-pre-gyp install --fallback-to-build=false` (bundled within `better-sqlite3`) to download the correct prebuilt binary for the current OS/arch without requiring any build tools (no node-gyp, no Python, no MSVC). Falls back to `npm rebuild` only if the download fails. Adds platform-specific error messages with clear manual fix instructions.
+
+---
+
+## [2.6.9] — 2026-03-17
+
+> CI fixes (t11 any-budget), bug fix #409 (file attachments via Copilot+Claude), release workflow correction.
+
+### 🐛 Bug Fixes
+
+- **fix(ci)**: Remove word "any" from comments in `openai-responses.ts` and `chatCore.ts` that were failing the t11 `\bany\b` budget check (false positive from regex counting comments)
+- **fix(chatCore)**: Normalize unsupported content part types before forwarding to providers (#409 — Cursor sends `{type:"file"}` when `.md` files are attached; Copilot and other OpenAI-compat providers reject with "type has to be either 'image_url' or 'text'"; fix converts `file`/`document` blocks to `text` and drops unknown types)
+
+### 🔧 Workflow
+
+- **chore(generate-release)**: Add ATOMIC COMMIT RULE — version bump (`npm version patch`) MUST happen before committing feature files to ensure tag always points to a commit containing all version changes together
+
+---
+
+## [2.6.8] — 2026-03-17
+
+> Sprint: Combo as Agent (system prompt + tool filter), Context Caching Protection, Auto-Update, Detailed Logs, MITM Kiro IDE.
+
+### 🗄️ DB Migrations (zero-breaking — safe for existing users)
+
+- **005_combo_agent_fields.sql**: `ALTER TABLE combos ADD COLUMN system_message TEXT DEFAULT NULL`, `tool_filter_regex TEXT DEFAULT NULL`, `context_cache_protection INTEGER DEFAULT 0`
+- **006_detailed_request_logs.sql**: New `request_detail_logs` table with 500-entry ring-buffer trigger, opt-in via settings toggle
+
+### ✨ Features
+
+- **feat(combo)**: System Message Override per Combo (#399 — `system_message` field replaces or injects system prompt before forwarding to provider)
+- **feat(combo)**: Tool Filter Regex per Combo (#399 — `tool_filter_regex` keeps only tools matching pattern; supports OpenAI + Anthropic formats)
+- **feat(combo)**: Context Caching Protection (#401 — `context_cache_protection` tags responses with `<omniModel>provider/model</omniModel>` and pins model for session continuity)
+- **feat(settings)**: Auto-Update via Settings (#320 — `GET /api/system/version` + `POST /api/system/update` — checks npm registry and updates in background with pm2 restart)
+- **feat(logs)**: Detailed Request Logs (#378 — captures full pipeline bodies at 4 stages: client request, translated request, provider response, client response — opt-in toggle, 64KB trim, 500-entry ring-buffer)
+- **feat(mitm)**: MITM Kiro IDE profile (#336 — `src/mitm/targets/kiro.ts` targets api.anthropic.com, reuses existing MITM infrastructure)
+
+---
+
+## [2.6.7] — 2026-03-17
+
+> Sprint: SSE improvements, local provider_nodes extensions, proxy registry, Claude passthrough fixes.
+
+### ✨ Features
+
+- **feat(health)**: Background health check for local `provider_nodes` with exponential backoff (30s→300s) and `Promise.allSettled` to avoid blocking (#423, @Regis-RCR)
+- **feat(embeddings)**: Route `/v1/embeddings` to local `provider_nodes` — `buildDynamicEmbeddingProvider()` with hostname validation (#422, @Regis-RCR)
+- **feat(audio)**: Route TTS/STT to local `provider_nodes` — `buildDynamicAudioProvider()` with SSRF protection (#416, @Regis-RCR)
+- **feat(proxy)**: Proxy registry, management APIs, and quota-limit generalization (#429, @Regis-RCR)
+
+### 🐛 Bug Fixes
+
+- **fix(sse)**: Strip Claude-specific fields (`metadata`, `anthropic_version`) when target is OpenAI-compat (#421, @prakersh)
+- **fix(sse)**: Extract Claude SSE usage (`input_tokens`, `output_tokens`, cache tokens) in passthrough stream mode (#420, @prakersh)
+- **fix(sse)**: Generate fallback `call_id` for tool calls with missing/empty IDs (#419, @prakersh)
+- **fix(sse)**: Claude-to-Claude passthrough — forward body completely untouched, no re-translation (#418, @prakersh)
+- **fix(sse)**: Filter orphaned `tool_result` items after Claude Code context compaction to avoid 400 errors (#417, @prakersh)
+- **fix(sse)**: Skip empty-name tool calls in Responses API translator to prevent `placeholder_tool` infinite loops (#415, @prakersh)
+- **fix(sse)**: Strip empty text content blocks before translation (#427, @prakersh)
+- **fix(api)**: Add `refreshable: true` to Claude OAuth test config (#428, @prakersh)
+
+### 📦 Dependencies
+
+- Bump `vitest`, `@vitest/*` and related devDependencies (#414, @dependabot)
+
+---
+
 ## [2.6.6] — 2026-03-17

 > Hotfix: Turbopack/Docker compatibility — remove `node:` protocol from all `src/` imports.
@@ -32,6 +32,11 @@ COPY --from=builder /app/.next/static ./.next/static
 COPY --from=builder /app/.next/standalone ./
 # Explicitly copy @swc/helpers — not always traced by standalone output but needed at runtime
 COPY --from=builder /app/node_modules/@swc/helpers ./node_modules/@swc/helpers
+# Explicitly copy pino transport dependencies — pino spawns a worker that requires
+# pino-abstract-transport at runtime; Next.js standalone trace does not capture it (#449)
+COPY --from=builder /app/node_modules/pino-abstract-transport ./node_modules/pino-abstract-transport
+COPY --from=builder /app/node_modules/pino-pretty ./node_modules/pino-pretty
+COPY --from=builder /app/node_modules/split2 ./node_modules/split2
 COPY --from=builder /app/scripts/run-standalone.mjs ./run-standalone.mjs
 COPY --from=builder /app/scripts/runtime-env.mjs ./runtime-env.mjs
 COPY --from=builder /app/scripts/bootstrap-env.mjs ./bootstrap-env.mjs
@@ -4,7 +4,7 @@

 _Your universal API proxy — one endpoint, 44+ providers, zero downtime. Now with **MCP & A2A** agent orchestration._

-**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • MCP Server • A2A Protocol • 100% TypeScript**
+**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • **Web Search** • MCP Server • A2A Protocol • 100% TypeScript**

 ---

@@ -898,27 +898,44 @@ When minimized, OmniRoute lives in your system tray with quick actions:

 ## 💰 Pricing at a Glance

-| Tier                | Provider          | Cost                   | Quota Reset      | Best For                |
-| ------------------- | ----------------- | ---------------------- | ---------------- | ----------------------- |
-| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo                 | 5h + weekly      | Already subscribed      |
-|                     | Codex (Plus/Pro)  | $20-200/mo             | 5h + weekly      | OpenAI users            |
-|                     | Gemini CLI        | **FREE**               | 180K/mo + 1K/day | Everyone!               |
-|                     | GitHub Copilot    | $10-19/mo              | Monthly          | GitHub users            |
-| **🔑 API KEY**      | NVIDIA NIM        | **FREE** (dev forever) | ~40 RPM          | 70+ open models         |
-|                     | Cerebras          | **FREE** (1M tok/day)  | 60K TPM / 30 RPM | World's fastest         |
-|                     | Groq              | **FREE** (30 RPM)      | 14.4K RPD        | Ultra-fast Llama/Gemma  |
-|                     | DeepSeek          | Pay-per-use            | None             | Best price/quality      |
-|                     | xAI (Grok)        | Pay-per-use            | None             | Grok models             |
-|                     | Mistral           | Free trial + paid      | Rate limited     | European AI             |
-|                     | OpenRouter        | Pay-per-use            | None             | 100+ models aggr.       |
-| **💰 CHEAP**        | GLM-4.7           | $0.6/1M                | Daily 10AM       | Budget backup           |
-|                     | MiniMax M2.1      | $0.2/1M                | 5-hour rolling   | Cheapest option         |
-|                     | Kimi K2           | $9/mo flat             | 10M tokens/mo    | Predictable cost        |
-| **🆓 FREE**         | iFlow             | **$0**                 | Unlimited        | 5 models unlimited      |
-|                     | Qwen              | **$0**                 | Unlimited        | 4 models unlimited      |
-|                     | Kiro              | **$0**                 | Unlimited        | Claude (AWS Builder ID) |
+| Tier                | Provider                    | Cost                      | Quota Reset      | Best For                          |
+| ------------------- | --------------------------- | ------------------------- | ---------------- | --------------------------------- |
+| **💳 SUBSCRIPTION** | Claude Code (Pro)           | $20/mo                    | 5h + weekly      | Already subscribed                |
+|                     | Codex (Plus/Pro)            | $20-200/mo                | 5h + weekly      | OpenAI users                      |
+|                     | Gemini CLI                  | **FREE**                  | 180K/mo + 1K/day | Everyone!                         |
+|                     | GitHub Copilot              | $10-19/mo                 | Monthly          | GitHub users                      |
+| **🔑 API KEY**      | NVIDIA NIM                  | **FREE** (dev forever)    | ~40 RPM          | 70+ open models                   |
+|                     | Cerebras                    | **FREE** (1M tok/day)     | 60K TPM / 30 RPM | World's fastest                   |
+|                     | Groq                        | **FREE** (30 RPM)         | 14.4K RPD        | Ultra-fast Llama/Gemma            |
+|                     | DeepSeek V3.2               | $0.27/$1.10 per 1M        | None             | Best price/quality reasoning      |
+|                     | xAI Grok-4 Fast             | **$0.20/$0.50 per 1M** 🆕 | None             | Fastest + tool calling, ultralow  |
+|                     | xAI Grok-4 (standard)       | $0.20/$1.50 per 1M 🆕     | None             | Reasoning flagship from xAI       |
+|                     | Mistral                     | Free trial + paid         | Rate limited     | European AI                       |
+|                     | OpenRouter                  | Pay-per-use               | None             | 100+ models aggr.                 |
+| **💰 CHEAP**        | GLM-5 (via Z.AI) 🆕         | $0.5/1M                   | Daily 10AM       | 128K output, newest flagship      |
+|                     | GLM-4.7                     | $0.6/1M                   | Daily 10AM       | Budget backup                     |
+|                     | MiniMax M2.5 🆕             | $0.3/1M input             | 5-hour rolling   | Reasoning + agentic tasks         |
+|                     | MiniMax M2.1                | $0.2/1M                   | 5-hour rolling   | Cheapest option                   |
+|                     | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use               | None             | Direct Moonshot API access        |
+|                     | Kimi K2                     | $9/mo flat                | 10M tokens/mo    | Predictable cost                  |
+| **🆓 FREE**         | iFlow                       | **$0**                    | Unlimited        | 5 models unlimited                |
+|                     | Qwen                        | **$0**                    | Unlimited        | 4 models unlimited                |
+|                     | Kiro                        | **$0**                    | Unlimited        | Claude Sonnet/Haiku (AWS Builder) |

-**💡 $0 Combo Stack:** Gemini CLI (180K/mo) → iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1) → Kiro (Claude for free) → Qwen (4 models, unlimited) — **Zero cost, never stops coding.** When Gemini quota runs out, OmniRoute auto-falls back to iFlow or Kiro with zero config.
+> 🆕 **New models added (Mar 2026):** Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
+
+**💡 $0 Combo Stack — The Complete Free Setup:**
+
+```
+Gemini CLI (180K/mo free)
+  → iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1)
+  → Kiro (Claude Sonnet 4.5 + Haiku — unlimited, via AWS Builder ID)
+  → Qwen (4 models — unlimited)
+  → Groq (14.4K req/day — ultra-fast)
+  → NVIDIA NIM (70+ models — 40 RPM forever)
+```
+
+**Zero cost. Never stops coding.** Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.

 ---

@@ -1027,7 +1044,20 @@ Then in `/dashboard/media` → **Transcription** tab: upload any audio or video

 OmniRoute v2.0 is built as an operational platform, not just a relay proxy.

-### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
+### 🆕 New — ClawRouter-Inspired Improvements (Mar 2026)
+
+| Feature                              | What It Does                                                                                |
+| ------------------------------------ | ------------------------------------------------------------------------------------------- |
+| ⚡ **Grok-4 Fast Family**            | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash)         |
+| 🧠 **GLM-5 via Z.AI**                | 128K output context, $0.5/1M — newest flagship from the GLM family                          |
+| 🔮 **MiniMax M2.5**                  | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1                       |
+| 🎯 **toolCalling Flag per Model**    | Per-model `toolCalling: true/false` in registry — AutoCombo skips non-tool-capable models   |
+| 🌍 **Multilingual Intent Detection** | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for non-English content  |
+| 📊 **Benchmark-Driven Fallbacks**    | Real p95 latency from live requests feeds combo scoring — AutoCombo learns from actual data |
+| 🔁 **Request Deduplication**         | Content-hash based dedup window — multi-agent safe, prevents duplicate charges              |
+| 🔌 **Pluggable RouterStrategy**      | Extensible `RouterStrategy` interface — add custom routing logic as plugins                 |
+
+### 🚀 Previous v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                                                                                                            |
 | ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -1075,16 +1105,17 @@ OmniRoute v2.0 is built as an operational platform, not just a relay proxy.

 ### 🎵 Multi-Modal APIs

-| Feature                    | What It Does                                                  |
-| -------------------------- | ------------------------------------------------------------- |
-| 🖼️ **Image Generation**    | `/v1/images/generations` with cloud and local backends        |
-| 📐 **Embeddings**          | `/v1/embeddings` for search and RAG pipelines                 |
-| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers) |
-| 🔊 **Text-to-Speech**      | `/v1/audio/speech` (multiple engines/providers)               |
-| 🎬 **Video Generation**    | `/v1/videos/generations` (ComfyUI + SD WebUI workflows)       |
-| 🎵 **Music Generation**    | `/v1/music/generations` (ComfyUI workflows)                   |
-| 🛡️ **Moderations**         | `/v1/moderations` safety checks                               |
-| 🔀 **Reranking**           | `/v1/rerank` for relevance scoring                            |
+| Feature                    | What It Does                                                                                                 |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------ |
+| 🖼️ **Image Generation**    | `/v1/images/generations` with cloud and local backends                                                       |
+| 📐 **Embeddings**          | `/v1/embeddings` for search and RAG pipelines                                                                |
+| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers)                                                |
+| 🔊 **Text-to-Speech**      | `/v1/audio/speech` (multiple engines/providers)                                                              |
+| 🎬 **Video Generation**    | `/v1/videos/generations` (ComfyUI + SD WebUI workflows)                                                      |
+| 🎵 **Music Generation**    | `/v1/music/generations` (ComfyUI workflows)                                                                  |
+| 🛡️ **Moderations**         | `/v1/moderations` safety checks                                                                              |
+| 🔀 **Reranking**           | `/v1/rerank` for relevance scoring                                                                           |
+| 🔍 **Web Search** 🆕       | `/v1/search` — 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache |

 ### 🛡️ Resilience, Security & Governance

@@ -0,0 +1,46 @@
+# ADR-0001: Proxy Registry + Usage Control Generalization
+
+Date: 2026-03-17
+Status: Accepted
+
+## Context
+
+OmniRoute sudah punya:
+
+- Proxy assignment berbasis config-map (`global`, `providers`, `combos`, `keys`).
+- Quota-aware selection khusus provider tertentu (notably `codex`).
+
+Gap utama:
+
+- Proxy belum menjadi aset reusable yang bisa di-manage sebagai entitas (metadata, where-used, safe delete).
+- Usage policy belum konsisten lintas provider.
+- Error contract API belum seragam untuk endpoint manajemen.
+
+## Decision
+
+1. Tambah **Proxy Registry** sebagai domain baru di DB (`proxy_registry`, `proxy_assignments`).
+2. Pertahankan kompatibilitas assignment lama (fallback ke `proxyConfig` lama).
+3. Resolver runtime pakai prioritas:
+   - account -> provider -> global (registry)
+   - fallback ke legacy resolver jika registry belum ada assignment
+4. Wajib redaction kredensial di output list registry default.
+5. Standarkan error JSON untuk endpoint manajemen proxy agar konsisten dan punya `requestId`.
+
+## Consequences
+
+Positif:
+
+- Proxy reusable dan bisa dilacak pemakaiannya.
+- Safe delete bisa ditegakkan (409 saat masih dipakai).
+- Migrasi bertahap tanpa breaking change runtime.
+
+Negatif:
+
+- Ada dual-source sementara (registry + legacy config) sampai migrasi selesai.
+- Butuh endpoint assignment tambahan dan pemetaan scope yang konsisten.
+
+## Follow-up
+
+- Migrasi UI provider/account dari input raw proxy ke selector registry.
+- Tambah health telemetry per proxy dan alerting.
+- Generalisasi usage control ke provider lain melalui interface policy yang sama.
@@ -0,0 +1,32 @@
+# ADR-0002: Error Contract for Management Endpoints
+
+Date: 2026-03-17
+Status: Accepted
+
+## Decision
+
+Management endpoints (proxy config, proxy registry, and proxy assignments) return a uniform error body:
+
+```json
+{
+  "error": {
+    "message": "Human-readable summary",
+    "type": "invalid_request | not_found | conflict | server_error",
+    "details": {}
+  },
+  "requestId": "uuid"
+}
+```
+
+## Status Mapping
+
+- 400: invalid request / validation failure
+- 404: resource not found
+- 409: resource conflict (for example, proxy still assigned)
+- 500: unexpected server error
+
+## Notes
+
+- `requestId` is mandatory for log correlation.
+- `details` is optional and only used for safe validation details.
+- Sensitive secrets (proxy credentials, tokens) must never appear in `message` or `details`.
@@ -0,0 +1,16 @@
+# ADR-0003: Security Checklist for Proxy Registry and Usage Controls
+
+Date: 2026-03-17
+Status: Accepted
+
+## Checklist
+
+- Validate all management payloads with Zod.
+- Reject malformed scope assignment updates with status 400.
+- Reject deleting an in-use proxy with status 409 unless forced.
+- Never expose proxy username/password in list responses by default.
+- Never log raw credentials or token values.
+- Keep error responses free from internal stack traces.
+- Protect management endpoints with existing auth middleware policy.
+- Audit mutating operations: create/update/delete/assign/migrate.
+- Ensure resolver fallback to legacy config while migration is in transition.
@@ -8,6 +8,16 @@ _وكيل API العالمي الخاص بك - نقطة نهاية واحدة،

 ---

+### 🆕 الجديد في v2.7.0
+
+- **RouterStrategy قابل للتوصيل** — استراتيجيات القواعد والتكلفة والكمون
+- **كشف النية متعدد اللغات** — تسجيل التوجيه بأكثر من 30 لغة
+- **إلغاء تكرار الطلبات** — تجنب مكالمات API المكررة عبر تجزئة المحتوى
+- **مزودون جدد:** Grok-4 Fast (xAI) وGLM-5 / Z.AI وMiniMax M2.5 وKimi K2.5
+- **أسعار محدثة:** Grok-4 Fast $0.20/$0.50/M، GLM-5 $0.50/M، MiniMax M2.5 $0.30/M
+
+---
+
 <div align="center">

 [![إصدار npm](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Вашият универсален API прокси — една крайна

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 <div align="center">

 [![npm версия](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Din universelle API-proxy — ét slutpunkt, 36+ udbydere, ingen nedetid. Nu me

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 <div align="center">

 [![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Ihr universeller API-Proxy – ein Endpunkt, mehr als 36 Anbieter, keine Ausfal

 ---

+### 🆕 Neu in v2.7.0
+
+- **Erweiterbare RouterStrategy** — Regeln-, Kosten- und Latenzstrategien
+- **Mehrsprachige Absichtserkennung** — Routing-Scoring in 30+ Sprachen
+- **Anfrage-Deduplizierung** — doppelte API-Aufrufe per Content-Hash vermeiden
+- **Neue Anbieter:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Aktualisierte Preise:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 <div align="center">

 [![npm-Version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -11,6 +11,16 @@ _Tu proxy de API universal — un endpoint, 36+ proveedores, cero tiempo de inac

 ---

+### 🆕 Novedades en v2.7.0
+
+- **RouterStrategy enchufable** — estrategias de reglas, costo y latencia
+- **Detección de intención multilingüe** — puntuación de enrutamiento en 30+ idiomas
+- **Deduplicación de solicitudes** — evita llamadas duplicadas por hash de contenido
+- **Nuevos proveedores:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Precios actualizados:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Universaali API-välityspalvelin – yksi päätepiste, yli 36 palveluntarjoaja

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Votre proxy API universel — un endpoint, 36+ fournisseurs, zéro temps d'arr

 ---

+### 🆕 Nouveautés dans v2.7.0
+
+- **RouterStrategy extensible** — stratégies de règles, coût et latence
+- **Détection d'intention multilingue** — scoring de routage en 30+ langues
+- **Déduplication des requêtes** — évite les appels dupliqués via hash de contenu
+- **Nouveaux fournisseurs :** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Tarifs mis à jour :** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _שרת ה-API האוניברסלי שלך - נקודת קצה אחת, 36+ ספ

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Az univerzális API-proxy – egy végpont, 36+ szolgáltató, nulla állásid

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Proksi API universal Anda — satu titik akhir, 36+ penyedia, tanpa waktu henti

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -13,6 +13,16 @@ _आपका सार्वभौमिक एपीआई प्रॉक्

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Il tuo proxy API universale — un endpoint, 36+ provider, zero downtime._

 ---

+### 🆕 Novità in v2.7.0
+
+- **RouterStrategy estensibile** — strategie per regole, costo e latenza
+- **Rilevamento intento multilingue** — scoring di routing in 30+ lingue
+- **Deduplicazione richieste** — evita chiamate duplicate tramite hash del contenuto
+- **Nuovi provider:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Prezzi aggiornati:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _ユニバーサル API プロキシ — 1 つのエンドポイント、36 以

 ---

+### 🆕 v2.7.0 の新機能
+
+- **プラガブル RouterStrategy** — ルール・コスト・レイテンシ戦略をサポート
+- **多言語インテント検出** — 30以上の言語でルーティングスコアリング
+- **リクエスト重複排除** — コンテンツハッシュで重複 API 呼び出しを防止
+- **新しいプロバイダー：** Grok-4 Fast (xAI)、GLM-5 / Z.AI、MiniMax M2.5、Kimi K2.5
+- **価格更新：** Grok-4 Fast $0.20/$0.50/M、GLM-5 $0.50/M、MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _범용 API 프록시 — 하나의 엔드포인트, 36개 이상의 공급자,

 ---

+### 🆕 v2.7.0 새로운 기능
+
+- **플러그형 RouterStrategy** — 규칙, 비용, 지연 전략 지원
+- **다국어 의도 감지** — 30개 이상 언어로 라우팅 스코어링
+- **요청 중복 제거** — 콘텐츠 해시로 중복 API 호출 방지
+- **새 공급자:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **가격 업데이트:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Proksi API universal anda — satu titik akhir, 36+ pembekal, masa henti sifar.

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Uw universele API-proxy: één eindpunt, meer dan 36 providers, geen downtime._

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Din universelle API-proxy – ett endepunkt, 36+ leverandører, null nedetid._

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Iyong unibersal na API proxy — isang endpoint, 36+ provider, zero downtime._

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Twój uniwersalny serwer proxy API — jeden punkt końcowy, ponad 36 dostawcó

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Seu proxy de API universal — um endpoint, 36+ provedores, zero tempo de inati

 ---

+### 🆕 Novidades na v2.7.0
+
+- **RouterStrategy plugável** — estratégias de regras, custo e latência
+- **Detecção de intenção multilíngue** — scoring de roteamento em 30+ idiomas
+- **Deduplicação de requisições** — evita chamadas duplicadas por hash de conteúdo
+- **Novos provedores:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Preços atualizados:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Seu proxy de API universal — um endpoint, mais de 36 provedores, tempo de ina

 ---

+### 🆕 Novidades na v2.7.0
+
+- **RouterStrategy extensível** — estratégias de regras, custo e latência
+- **Deteção de intenção multilíngue** — scoring de encaminhamento em 30+ idiomas
+- **Deduplicação de pedidos** — evita chamadas duplicadas por hash de conteúdo
+- **Novos fornecedores:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Preços atualizados:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Proxy-ul dvs. universal API - un punct final, peste 36 de furnizori, zero timpi

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Ваш универсальный API-прокси — одна точка до

 ---

+### 🆕 Новое в v2.7.0
+
+- **Подключаемая RouterStrategy** — стратегии по правилам, стоимости и задержке
+- **Многоязычное распознавание намерений** — маршрутизация на 30+ языках
+- **Дедупликация запросов** — устранение дублей по хэшу содержимого
+- **Новые провайдеры:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Обновлённые цены:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Váš univerzálny proxy server API – jeden koncový bod, 36+ poskytovateľov

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Din universella API-proxy — en slutpunkt, 36+ leverantörer, noll driftstopp.

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _พร็อกซี API สากลของคุณ — จุดสิ้

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Ваш універсальний API-проксі — одна кінцева

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _Proxy API phổ quát của bạn — một điểm cuối, hơn 36 nhà cung c

 ---

+### 🆕 What's New in v2.7.0
+
+- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
+- **Multilingual intent detection** — routing scoring in 30+ languages
+- **Request deduplication** — prevent duplicate API calls via content hash
+- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
+- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -11,6 +11,16 @@ _您的通用 API 代理 — 一个端点，36+ 提供商，零停机时间。_

 ---

+### 🆕 v2.7.0 新功能
+
+- **可插拔 RouterStrategy** — 支持规则、成本和延迟策略
+- **多语言意图检测** — 支持 30+ 语言的路由评分
+- **请求去重** — 基于内容哈希避免重复 API 调用
+- **新增提供商：** Grok-4 Fast (xAI)、GLM-5 / Z.AI、MiniMax M2.5、Kimi K2.5
+- **价格更新：** Grok-4 Fast $0.20/$0.50/M，GLM-5 $0.50/M，MiniMax M2.5 $0.30/M
+
+---
+
 ### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP

 | Feature                                    | What It Does                                                                                                                                  |
@@ -1,7 +1,7 @@
 openapi: 3.1.0
 info:
  title: OmniRoute API
-  version: 2.6.6
+  version: 2.8.2
  description: |
    OmniRoute is a local-first AI API proxy router. It provides an OpenAI-compatible
    endpoint that routes requests to multiple AI providers with load balancing,
@@ -121,6 +121,10 @@ const nextConfig = {
        source: "/responses",
        destination: "/api/v1/responses",
      },
+      {
+        source: "/responses/:path*",
+        destination: "/api/v1/responses/:path*",
+      },
      {
        source: "/models",
        destination: "/api/v1/models",
@@ -11,7 +11,7 @@ interface AudioModel {
  name: string;
 }

-interface AudioProvider {
+export interface AudioProvider {
  id: string;
  baseUrl: string;
  authType: string;
@@ -262,36 +262,74 @@ export function getSpeechProvider(providerId: string): AudioProvider | null {
  return AUDIO_SPEECH_PROVIDERS[providerId] || null;
 }

+export interface ProviderNodeRow {
+  prefix: string;
+  name: string;
+  baseUrl: string;
+  apiType?: string;
+}
+
 /**
- * Parse audio model string (format: "provider/model" or just "model")
+ * Build a dynamic AudioProvider from a provider_node DB entry.
+ * Only used for local providers (localhost/127.0.0.1) — remote nodes are
+ * excluded by the caller to prevent auth bypass and SSRF.
 */
+export function buildDynamicAudioProvider(node: ProviderNodeRow, audioPath: string): AudioProvider {
+  if (!node.prefix || !node.baseUrl) {
+    throw new Error(`Invalid provider_node: missing prefix or baseUrl`);
+  }
+  const baseUrl = node.baseUrl.replace(/\/+$/, "");
+  return {
+    id: node.prefix,
+    baseUrl: `${baseUrl}${audioPath}`,
+    authType: "none",
+    authHeader: "none",
+    models: [],
+  };
+}
+
 function parseAudioModel(
  modelStr: string | null,
-  registry: Record<string, AudioProvider>
+  registry: Record<string, AudioProvider>,
+  dynamicProviders?: AudioProvider[]
 ): { provider: string | null; model: string | null } {
  if (!modelStr) return { provider: null, model: null };

-  for (const [providerId, config] of Object.entries(registry)) {
+  // Phase 1: prefix match in hardcoded registry
+  for (const [providerId] of Object.entries(registry)) {
    if (modelStr.startsWith(providerId + "/")) {
      return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
    }
  }

+  // Phase 2: bare model lookup in hardcoded registry
  for (const [providerId, config] of Object.entries(registry)) {
    if (config.models.some((m) => m.id === modelStr)) {
      return { provider: providerId, model: modelStr };
    }
  }

+  // Phase 3: prefix match in dynamic providers (provider_nodes)
+  if (dynamicProviders) {
+    for (const dp of dynamicProviders) {
+      if (modelStr.startsWith(dp.id + "/")) {
+        return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
+      }
+    }
+  }
+
  return { provider: null, model: modelStr };
 }

-export function parseTranscriptionModel(modelStr: string | null) {
-  return parseAudioModel(modelStr, AUDIO_TRANSCRIPTION_PROVIDERS);
+export function parseTranscriptionModel(
+  modelStr: string | null,
+  dynamicProviders?: AudioProvider[]
+) {
+  return parseAudioModel(modelStr, AUDIO_TRANSCRIPTION_PROVIDERS, dynamicProviders);
 }

-export function parseSpeechModel(modelStr: string | null) {
-  return parseAudioModel(modelStr, AUDIO_SPEECH_PROVIDERS);
+export function parseSpeechModel(modelStr: string | null, dynamicProviders?: AudioProvider[]) {
+  return parseAudioModel(modelStr, AUDIO_SPEECH_PROVIDERS, dynamicProviders);
 }

 /**
@@ -8,7 +8,43 @@
 * keyed by provider ID (e.g. "nebius", "openai").
 */

-export const EMBEDDING_PROVIDERS = {
+export interface EmbeddingProvider {
+  id: string;
+  baseUrl: string;
+  authType: string;
+  authHeader: string;
+  models: { id: string; name: string; dimensions?: number }[];
+}
+
+export interface EmbeddingProviderNodeRow {
+  prefix: string;
+  name: string;
+  baseUrl: string;
+  apiType?: string;
+}
+
+/**
+ * Build a dynamic EmbeddingProvider from a local provider_node.
+ * Only used for local providers (localhost) — caller must filter by hostname.
+ */
+export function buildDynamicEmbeddingProvider(node: EmbeddingProviderNodeRow): EmbeddingProvider {
+  if (!node.prefix || !node.baseUrl) {
+    throw new Error(`Invalid provider_node: missing prefix or baseUrl`);
+  }
+  if (node.prefix.includes("/") || node.prefix.includes(" ")) {
+    throw new Error(`Invalid provider_node prefix "${node.prefix}": must not contain / or spaces`);
+  }
+  const baseUrl = node.baseUrl.replace(/\/+$/, "");
+  return {
+    id: node.prefix,
+    baseUrl: `${baseUrl}/embeddings`,
+    authType: "none",
+    authHeader: "none",
+    models: [],
+  };
+}
+
+export const EMBEDDING_PROVIDERS: Record<string, EmbeddingProvider> = {
  nebius: {
    id: "nebius",
    baseUrl: "https://api.tokenfactory.nebius.com/v1/embeddings",
@@ -70,7 +106,7 @@ export const EMBEDDING_PROVIDERS = {
 /**
 * Get embedding provider config by ID
 */
-export function getEmbeddingProvider(providerId) {
+export function getEmbeddingProvider(providerId: string): EmbeddingProvider | null {
  return EMBEDDING_PROVIDERS[providerId] || null;
 }

@@ -78,26 +114,36 @@ export function getEmbeddingProvider(providerId) {
 * Parse embedding model string (format: "provider/model" or just "model")
 * Returns { provider, model }
 */
-export function parseEmbeddingModel(modelStr) {
+export function parseEmbeddingModel(
+  modelStr: string | null,
+  dynamicProviders?: EmbeddingProvider[]
+): { provider: string | null; model: string | null } {
  if (!modelStr) return { provider: null, model: null };

  // Check for "provider/model" format
  const slashIdx = modelStr.indexOf("/");
  if (slashIdx > 0) {
-    // Handle nested model IDs like "nebius/Qwen/Qwen3-Embedding-8B"
-    // Try each provider prefix
-    for (const [providerId, config] of Object.entries(EMBEDDING_PROVIDERS)) {
+    // Phase 1: Try each hardcoded provider prefix
+    for (const [providerId] of Object.entries(EMBEDDING_PROVIDERS)) {
      if (modelStr.startsWith(providerId + "/")) {
        return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
      }
    }
-    // Fallback: first segment is provider
+    // Phase 2: Try dynamic provider_nodes prefix
+    if (dynamicProviders) {
+      for (const dp of dynamicProviders) {
+        if (modelStr.startsWith(dp.id + "/")) {
+          return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
+        }
+      }
+    }
+    // Phase 3: Fallback — first segment is provider
    const provider = modelStr.slice(0, slashIdx);
    const model = modelStr.slice(slashIdx + 1);
    return { provider, model };
  }

-  // No provider prefix — search all providers for the model
+  // No provider prefix — search hardcoded providers for the model
  for (const [providerId, config] of Object.entries(EMBEDDING_PROVIDERS)) {
    if (config.models.some((m) => m.id === modelStr)) {
      return { provider: providerId, model: modelStr };
@@ -11,6 +11,7 @@
 export interface RegistryModel {
  id: string;
  name: string;
+  toolCalling?: boolean;
  targetFormat?: string;
  unsupportedParams?: readonly string[];
 }
@@ -77,6 +78,22 @@ interface LegacyProvider {
  clientVersion?: string;
 }

+const KIMI_CODING_SHARED = {
+  format: "claude",
+  executor: "default",
+  baseUrl: "https://api.kimi.com/coding/v1/messages",
+  authHeader: "x-api-key",
+  headers: {
+    "Anthropic-Version": "2023-06-01",
+    "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
+  },
+  models: [
+    { id: "kimi-k2.5", name: "Kimi K2.5" },
+    { id: "kimi-k2.5-thinking", name: "Kimi K2.5 Thinking" },
+    { id: "kimi-latest", name: "Kimi Latest" },
+  ] as RegistryModel[],
+} as const;
+
 // ── Registry ──────────────────────────────────────────────────────────────

 export const REGISTRY: Record<string, RegistryEntry> = {
@@ -114,6 +131,7 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    },
    models: [
      { id: "claude-opus-4-6", name: "Claude Opus 4.6" },
+      { id: "claude-sonnet-4-6", name: "Claude 4.6 Sonnet" },
      { id: "claude-opus-4-5-20251101", name: "Claude 4.5 Opus" },
      { id: "claude-sonnet-4-5-20250929", name: "Claude 4.5 Sonnet" },
      { id: "claude-haiku-4-5-20251001", name: "Claude 4.5 Haiku" },
@@ -139,6 +157,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      clientSecretDefault: "",
    },
    models: [
+      { id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
+      { id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
+      { id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
      { id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
      { id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
      { id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -168,6 +189,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      clientSecretDefault: "",
    },
    models: [
+      { id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
+      { id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
+      { id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
      { id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
      { id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
      { id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -460,8 +484,13 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      "Anthropic-Version": "2023-06-01",
    },
    models: [
+      { id: "claude-haiku-4.5", name: "Claude Haiku 4.5" },
      { id: "claude-sonnet-4-20250514", name: "Claude Sonnet 4" },
+      { id: "claude-sonnet-4-6-20251031", name: "Claude Sonnet 4.6 (Dated)" },
+      { id: "claude-sonnet-4.6", name: "Claude Sonnet 4.6" },
      { id: "claude-opus-4-20250514", name: "Claude Opus 4" },
+      { id: "claude-opus-4-6-20251031", name: "Claude Opus 4.6 (Dated)" },
+      { id: "claude-opus-4.6", name: "Claude Opus 4.6" },
      { id: "claude-3-5-sonnet-20241022", name: "Claude 3.5 Sonnet" },
    ],
  },
@@ -495,6 +524,8 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
    },
    models: [
+      { id: "glm-5", name: "GLM 5" },
+      { id: "glm-5-turbo", name: "GLM 5 Turbo" },
      { id: "glm-4.7-flash", name: "GLM 4.7 Flash" },
      { id: "glm-4.7", name: "GLM 4.7" },
      { id: "glm-4.6v", name: "GLM 4.6V (Vision)" },
@@ -506,6 +537,51 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    ],
  },

+  "bailian-coding-plan": {
+    id: "bailian-coding-plan",
+    alias: "bcp",
+    format: "claude",
+    executor: "default",
+    baseUrl: "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1/messages",
+    chatPath: "/messages",
+    urlSuffix: "?beta=true",
+    authType: "apikey",
+    authHeader: "x-api-key",
+    headers: {
+      "Anthropic-Version": "2023-06-01",
+      "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
+    },
+    models: [
+      { id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
+      { id: "qwen3-max-2026-01-23", name: "Qwen3 Max (2026-01-23)" },
+      { id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
+      { id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
+      { id: "MiniMax-M2.5", name: "MiniMax M2.5" },
+      { id: "glm-5", name: "GLM 5" },
+      { id: "glm-4.7", name: "GLM 4.7" },
+      { id: "kimi-k2.5", name: "Kimi K2.5" },
+    ],
+  },
+
+  zai: {
+    id: "zai",
+    alias: "zai",
+    format: "claude",
+    executor: "default",
+    baseUrl: "https://api.z.ai/api/anthropic/v1/messages",
+    urlSuffix: "?beta=true",
+    authType: "apikey",
+    authHeader: "x-api-key",
+    headers: {
+      "Anthropic-Version": "2023-06-01",
+      "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
+    },
+    models: [
+      { id: "glm-5", name: "GLM 5" },
+      { id: "glm-5-turbo", name: "GLM 5 Turbo" },
+    ],
+  },
+
  kimi: {
    id: "kimi",
    alias: "kimi",
@@ -525,16 +601,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
  "kimi-coding": {
    id: "kimi-coding",
    alias: "kmc",
-    format: "claude",
-    executor: "default",
-    baseUrl: "https://api.kimi.com/coding/v1/messages",
+    ...KIMI_CODING_SHARED,
    urlSuffix: "?beta=true",
    authType: "oauth",
-    authHeader: "x-api-key",
-    headers: {
-      "Anthropic-Version": "2023-06-01",
-      "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
-    },
    oauth: {
      clientIdEnv: "KIMI_CODING_OAUTH_CLIENT_ID",
      clientIdDefault: "17e5f671-d194-4dfb-9706-5516cb48c098",
@@ -542,11 +611,13 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      refreshUrl: "https://auth.kimi.com/api/oauth/token",
      authUrl: "https://auth.kimi.com/api/oauth/device_authorization",
    },
-    models: [
-      { id: "kimi-k2.5", name: "Kimi K2.5" },
-      { id: "kimi-k2.5-thinking", name: "Kimi K2.5 Thinking" },
-      { id: "kimi-latest", name: "Kimi Latest" },
-    ],
+  },
+
+  "kimi-coding-apikey": {
+    id: "kimi-coding-apikey",
+    alias: "kmca",
+    ...KIMI_CODING_SHARED,
+    authType: "apikey",
  },

  kilocode: {
@@ -637,7 +708,11 @@ export const REGISTRY: Record<string, RegistryEntry> = {
      "Anthropic-Version": "2023-06-01",
      "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
    },
-    models: [{ id: "MiniMax-M2.1", name: "MiniMax M2.1" }],
+    models: [
+      { id: "minimax-m2.5", name: "MiniMax M2.5" },
+      { id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
+      { id: "MiniMax-M2.1", name: "MiniMax M2.1" },
+    ],
  },

  "minimax-cn": {
@@ -655,10 +730,52 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    },
    models: [
      // Keep parity with minimax to ensure model discovery works for minimax-cn connections.
+      { id: "minimax-m2.5", name: "MiniMax M2.5" },
+      { id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
      { id: "MiniMax-M2.1", name: "MiniMax M2.1" },
    ],
  },

+  alicode: {
+    id: "alicode",
+    alias: "alicode",
+    format: "openai",
+    executor: "default",
+    baseUrl: "https://coding.dashscope.aliyuncs.com/v1/chat/completions",
+    authType: "apikey",
+    authHeader: "bearer",
+    models: [
+      { id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
+      { id: "kimi-k2.5", name: "Kimi K2.5" },
+      { id: "glm-5", name: "GLM 5" },
+      { id: "MiniMax-M2.5", name: "MiniMax M2.5" },
+      { id: "qwen3-max-2026-01-23", name: "Qwen3 Max" },
+      { id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
+      { id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
+      { id: "glm-4.7", name: "GLM 4.7" },
+    ],
+  },
+
+  "alicode-intl": {
+    id: "alicode-intl",
+    alias: "alicode-intl",
+    format: "openai",
+    executor: "default",
+    baseUrl: "https://coding-intl.dashscope.aliyuncs.com/v1/chat/completions",
+    authType: "apikey",
+    authHeader: "bearer",
+    models: [
+      { id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
+      { id: "kimi-k2.5", name: "Kimi K2.5" },
+      { id: "glm-5", name: "GLM 5" },
+      { id: "MiniMax-M2.5", name: "MiniMax M2.5" },
+      { id: "qwen3-max-2026-01-23", name: "Qwen3 Max" },
+      { id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
+      { id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
+      { id: "glm-4.7", name: "GLM 4.7" },
+    ],
+  },
+
  deepseek: {
    id: "deepseek",
    alias: "ds",
@@ -717,10 +834,14 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    authType: "apikey",
    authHeader: "bearer",
    models: [
-      { id: "grok-4", name: "Grok 4" },
+      { id: "grok-4-fast-non-reasoning", name: "Grok 4 Fast" },
      { id: "grok-4-fast-reasoning", name: "Grok 4 Fast Reasoning" },
-      { id: "grok-code-fast-1", name: "Grok Code Fast" },
+      { id: "grok-4-1-fast-non-reasoning", name: "Grok 4.1 Fast" },
+      { id: "grok-4-1-fast-reasoning", name: "Grok 4.1 Fast Reasoning" },
+      { id: "grok-4-0709", name: "Grok 4 (0709)" },
+      { id: "grok-4", name: "Grok 4" },
      { id: "grok-3", name: "Grok 3" },
+      { id: "grok-3-mini", name: "Grok 3 Mini" },
    ],
  },

@@ -849,7 +970,10 @@ export const REGISTRY: Record<string, RegistryEntry> = {
    authType: "apikey",
    authHeader: "bearer",
    models: [
+      { id: "gpt-oss-120b", name: "GPT OSS 120B", toolCalling: false },
+      { id: "openai/gpt-oss-120b", name: "GPT OSS 120B (OpenAI Prefix)", toolCalling: false },
      { id: "meta/llama-3.3-70b-instruct", name: "Llama 3.3 70B" },
+      { id: "nvidia/llama-3.3-70b-instruct", name: "Llama 3.3 70B (NVIDIA Prefix)" },
      { id: "meta/llama-4-maverick-17b-128e-instruct", name: "Llama 4 Maverick" },
      { id: "moonshotai/kimi-k2.5", name: "Kimi K2.5" },
      { id: "z-ai/glm4.7", name: "GLM 4.7" },
@@ -0,0 +1,155 @@
+/**
+ * Search Provider Registry
+ *
+ * Defines providers that support the /v1/search endpoint.
+ * Unlike LLM/embedding providers, search providers don't have "models" —
+ * a provider IS the model (Serper = Google SERP, Brave = Brave index).
+ *
+ * API keys are stored in the same provider credentials system,
+ * keyed by provider ID (e.g. "serper-search", "brave-search").
+ * perplexity-search reuses credentials from the "perplexity" chat provider.
+ */
+
+export interface SearchProviderConfig {
+  id: string;
+  name: string;
+  baseUrl: string;
+  method: "GET" | "POST";
+  authType: "apikey";
+  authHeader: string;
+  costPerQuery: number;
+  freeMonthlyQuota: number;
+  searchTypes: string[];
+  defaultMaxResults: number;
+  maxMaxResults: number;
+  timeoutMs: number;
+  cacheTTLMs: number;
+}
+
+export const SEARCH_PROVIDERS: Record<string, SearchProviderConfig> = {
+  "serper-search": {
+    id: "serper-search",
+    name: "Serper Search",
+    baseUrl: "https://google.serper.dev",
+    method: "POST",
+    authType: "apikey",
+    authHeader: "x-api-key",
+    costPerQuery: 0.001,
+    freeMonthlyQuota: 2500,
+    searchTypes: ["web", "news"],
+    defaultMaxResults: 5,
+    maxMaxResults: 100,
+    timeoutMs: 10_000,
+    cacheTTLMs: 5 * 60 * 1000,
+  },
+
+  "brave-search": {
+    id: "brave-search",
+    name: "Brave Search",
+    baseUrl: "https://api.search.brave.com/res/v1",
+    method: "GET",
+    authType: "apikey",
+    authHeader: "x-subscription-token",
+    costPerQuery: 0.005,
+    freeMonthlyQuota: 1000,
+    searchTypes: ["web", "news"],
+    defaultMaxResults: 5,
+    maxMaxResults: 20,
+    timeoutMs: 10_000,
+    cacheTTLMs: 5 * 60 * 1000,
+  },
+
+  "perplexity-search": {
+    id: "perplexity-search",
+    name: "Perplexity Search",
+    baseUrl: "https://api.perplexity.ai/search",
+    method: "POST",
+    authType: "apikey",
+    authHeader: "bearer",
+    costPerQuery: 0.005,
+    freeMonthlyQuota: 0,
+    searchTypes: ["web"],
+    defaultMaxResults: 5,
+    maxMaxResults: 20,
+    timeoutMs: 10_000,
+    cacheTTLMs: 5 * 60 * 1000,
+  },
+
+  "exa-search": {
+    id: "exa-search",
+    name: "Exa Search",
+    baseUrl: "https://api.exa.ai/search",
+    method: "POST",
+    authType: "apikey",
+    authHeader: "x-api-key",
+    costPerQuery: 0.007,
+    freeMonthlyQuota: 1000,
+    searchTypes: ["web", "news"],
+    defaultMaxResults: 5,
+    maxMaxResults: 100,
+    timeoutMs: 10_000,
+    cacheTTLMs: 5 * 60 * 1000,
+  },
+
+  "tavily-search": {
+    id: "tavily-search",
+    name: "Tavily Search",
+    baseUrl: "https://api.tavily.com/search",
+    method: "POST",
+    authType: "apikey",
+    authHeader: "bearer",
+    costPerQuery: 0.008,
+    freeMonthlyQuota: 1000,
+    searchTypes: ["web", "news"],
+    defaultMaxResults: 5,
+    maxMaxResults: 20,
+    timeoutMs: 10_000,
+    cacheTTLMs: 5 * 60 * 1000,
+  },
+};
+
+/**
+ * Credential fallback mapping — search providers that can reuse credentials
+ * from a related provider (e.g., perplexity-search uses the same API key as perplexity chat).
+ */
+export const SEARCH_CREDENTIAL_FALLBACKS: Record<string, string> = {
+  "perplexity-search": "perplexity",
+};
+
+/**
+ * Get search provider config by ID
+ */
+export function getSearchProvider(providerId: string): SearchProviderConfig | null {
+  return SEARCH_PROVIDERS[providerId] || null;
+}
+
+/**
+ * Get all search providers as a flat list
+ */
+export function getAllSearchProviders(): Array<{
+  id: string;
+  name: string;
+  searchTypes: string[];
+}> {
+  return Object.values(SEARCH_PROVIDERS).map((p) => ({
+    id: p.id,
+    name: p.name,
+    searchTypes: p.searchTypes,
+  }));
+}
+
+/**
+ * Select the cheapest available provider.
+ * If an explicit provider is given, validate and return it.
+ * Otherwise, return the cheapest by costPerQuery.
+ */
+export function selectProvider(explicitProvider?: string): SearchProviderConfig | null {
+  if (explicitProvider) {
+    return SEARCH_PROVIDERS[explicitProvider] || null;
+  }
+
+  const providers = Object.values(SEARCH_PROVIDERS);
+  if (providers.length === 0) return null;
+
+  return providers.reduce((cheapest, p) => (p.costPerQuery < cheapest.costPerQuery ? p : cheapest));
+}
@@ -26,6 +26,7 @@ export type ProviderCredentials = {
  expiresAt?: string;
  connectionId?: string; // T07: used for API key rotation index
  providerSpecificData?: JsonRecord;
+  requestEndpointPath?: string;
 };

 export type ExecutorLog = {
@@ -9,6 +9,17 @@ type EffortLevel = (typeof EFFORT_ORDER)[number];
 const CODEX_FAST_WIRE_VALUE = "priority";
 let defaultFastServiceTierEnabled = false;

+function getResponsesSubpath(endpointPath: unknown): string | null {
+  const normalizedEndpoint = String(endpointPath || "").replace(/\/+$/, "");
+  const match = normalizedEndpoint.match(/(?:^|\/)responses(?:(\/.*))?$/i);
+  if (!match) return null;
+  return match[1] || "";
+}
+
+function isCompactResponsesEndpoint(endpointPath: unknown): boolean {
+  return getResponsesSubpath(endpointPath)?.toLowerCase() === "/compact";
+}
+
 function normalizeServiceTierValue(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  const normalized = value.trim().toLowerCase();
@@ -60,13 +71,31 @@ export class CodexExecutor extends BaseExecutor {
    super("codex", PROVIDERS.codex);
  }

+  buildUrl(model, stream, urlIndex = 0, credentials = null) {
+    void model;
+    void stream;
+    void urlIndex;
+
+    const responsesSubpath = getResponsesSubpath(credentials?.requestEndpointPath);
+    if (responsesSubpath !== null) {
+      const baseUrl = String(this.config.baseUrl || "").replace(/\/$/, "");
+      if (baseUrl.endsWith("/responses")) {
+        return `${baseUrl}${responsesSubpath}`;
+      }
+      return `${baseUrl}/responses${responsesSubpath}`;
+    }
+
+    return super.buildUrl(model, stream, urlIndex, credentials);
+  }
+
  /**
   * Codex Responses endpoint is SSE-first.
   * Always request event-stream from upstream, even when client requested stream=false.
   * Includes chatgpt-account-id header for strict workspace binding.
   */
  buildHeaders(credentials, stream = true) {
-    const headers = super.buildHeaders(credentials, true);
+    const isCompactRequest = isCompactResponsesEndpoint(credentials?.requestEndpointPath);
+    const headers = super.buildHeaders(credentials, isCompactRequest ? false : true);

    // Add workspace binding header if workspaceId is persisted
    const workspaceId = credentials?.providerSpecificData?.workspaceId;
@@ -107,9 +136,15 @@ export class CodexExecutor extends BaseExecutor {
   */
  transformRequest(model, body, stream, credentials) {
    const nativeCodexPassthrough = body?._nativeCodexPassthrough === true;
+    const isCompactRequest = isCompactResponsesEndpoint(credentials?.requestEndpointPath);

-    // Codex /responses rejects stream=false; we aggregate SSE back to JSON when needed.
-    body.stream = true;
+    // Codex /responses rejects stream=false, but /responses/compact rejects the stream field entirely.
+    if (isCompactRequest) {
+      delete body.stream;
+      delete body.stream_options;
+    } else {
+      body.stream = true;
+    }
    delete body._nativeCodexPassthrough;

    const requestServiceTier = normalizeServiceTierValue(body.service_tier);
@@ -54,6 +54,8 @@ export class DefaultExecutor extends BaseExecutor {
        break;
      case "glm":
      case "kimi-coding":
+      case "bailian-coding-plan":
+      case "kimi-coding-apikey":
      case "minimax":
      case "minimax-cn":
        headers["x-api-key"] = credentials.apiKey || credentials.accessToken;
@@ -77,10 +77,13 @@ export class KiroExecutor extends BaseExecutor {
  }

  transformRequest(model: string, body: unknown, stream: boolean, credentials: unknown): unknown {
-    void model;
    void stream;
    void credentials;
-    return body;
+    // Kiro uses conversationState.currentMessage.userInputMessage.modelId,
+    // not a top-level "model" field. chatCore injects translatedBody.model
+    // which Kiro API rejects as unknown top-level field.
+    const { model: _model, ...rest } = body as Record<string, unknown>;
+    return rest;
  }

  /**
@@ -381,7 +381,12 @@ async function handleTortoiseSpeech(providerConfig, body) {
 * @returns {Response}
 */
 /** @returns {Promise<unknown>} */
-export async function handleAudioSpeech({ body, credentials }) {
+export async function handleAudioSpeech({
+  body,
+  credentials,
+  resolvedProvider = null,
+  resolvedModel = null,
+}) {
  if (!body.model) {
    return errorResponse(400, "model is required");
  }
@@ -389,8 +394,15 @@ export async function handleAudioSpeech({ body, credentials }) {
    return errorResponse(400, "input is required");
  }

-  const { provider: providerId, model: modelId } = parseSpeechModel(body.model);
-  const providerConfig = providerId ? getSpeechProvider(providerId) : null;
+  // Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
+  // Falls back to hardcoded registry lookup for backward compatibility.
+  let providerConfig = resolvedProvider;
+  let modelId = resolvedModel;
+  if (!providerConfig) {
+    const parsed = parseSpeechModel(body.model);
+    providerConfig = parsed.provider ? getSpeechProvider(parsed.provider) : null;
+    modelId = parsed.model;
+  }

  if (!providerConfig) {
    return errorResponse(
@@ -403,7 +415,7 @@ export async function handleAudioSpeech({ body, credentials }) {
  const token =
    providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
  if (providerConfig.authType !== "none" && !token) {
-    return errorResponse(401, `No credentials for speech provider: ${providerId}`);
+    return errorResponse(401, `No credentials for speech provider: ${providerConfig.id}`);
  }

  try {
@@ -13,7 +13,11 @@ import { getCorsOrigin } from "../utils/cors.ts";
 * - HuggingFace Inference: POST raw binary to /models/{model_id}
 */

-import { getTranscriptionProvider, parseTranscriptionModel } from "../config/audioRegistry.ts";
+import {
+  getTranscriptionProvider,
+  parseTranscriptionModel,
+  type AudioProvider,
+} from "../config/audioRegistry.ts";
 import { buildAuthHeaders } from "../config/registryUtils.ts";
 import { errorResponse } from "../utils/error.ts";

@@ -235,9 +239,13 @@ async function handleHuggingFaceTranscription(providerConfig, file, modelId, tok
 export async function handleAudioTranscription({
  formData,
  credentials,
+  resolvedProvider = null,
+  resolvedModel = null,
 }: {
  formData: FormData;
  credentials?: TranscriptionCredentials | null;
+  resolvedProvider?: AudioProvider | null;
+  resolvedModel?: string | null;
 }): Promise<Response> {
  const model = formData.get("model");
  if (typeof model !== "string" || !model) {
@@ -250,8 +258,14 @@ export async function handleAudioTranscription({
  }
  const file = fileEntry as Blob & { name?: unknown };

-  const { provider: providerId, model: modelId } = parseTranscriptionModel(model);
-  const providerConfig = providerId ? getTranscriptionProvider(providerId) : null;
+  // Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
+  let providerConfig = resolvedProvider;
+  let modelId = resolvedModel;
+  if (!providerConfig) {
+    const parsed = parseTranscriptionModel(model);
+    providerConfig = parsed.provider ? getTranscriptionProvider(parsed.provider) : null;
+    modelId = parsed.model;
+  }

  if (!providerConfig) {
    return errorResponse(
@@ -264,7 +278,7 @@ export async function handleAudioTranscription({
  const token =
    providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
  if (providerConfig.authType !== "none" && !token) {
-    return errorResponse(401, `No credentials for transcription provider: ${providerId}`);
+    return errorResponse(401, `No credentials for transcription provider: ${providerConfig.id}`);
  }

  // Route to provider-specific handler
@@ -23,6 +23,7 @@ import {
  appendRequestLog,
  saveCallLog,
 } from "@/lib/usageDb";
+import { getModelNormalizeToolCallId } from "@/lib/db/models";
 import { getExecutor } from "../executors/index.ts";
 import { translateNonStreamingResponse } from "./responseTranslator.ts";
 import { extractUsageFromResponse } from "./usageExtractor.ts";
@@ -42,6 +43,12 @@ import {
 import { getIdempotencyKey, checkIdempotency, saveIdempotency } from "@/lib/idempotencyLayer";
 import { createProgressTransform, wantsProgress } from "../utils/progressTracker.ts";
 import { isModelUnavailableError, getNextFamilyFallback } from "../services/modelFamilyFallback.ts";
+import { computeRequestHash, deduplicate, shouldDeduplicate } from "../services/requestDedup.ts";
+import {
+  shouldUseFallback,
+  isFallbackDecision,
+  EMERGENCY_FALLBACK_CONFIG,
+} from "../services/emergencyFallback.ts";

 export function shouldUseNativeCodexPassthrough({
  provider,
@@ -54,9 +61,8 @@ export function shouldUseNativeCodexPassthrough({
 }): boolean {
  if (provider !== "codex") return false;
  if (sourceFormat !== FORMATS.OPENAI_RESPONSES) return false;
-  return String(endpointPath || "")
-    .toLowerCase()
-    .endsWith("/responses");
+  const normalizedEndpoint = String(endpointPath || "").replace(/\/+$/, "");
+  return /(?:^|\/)responses(?:\/.*)?$/i.test(normalizedEndpoint);
 }

 /**
@@ -89,6 +95,22 @@ export async function handleChatCore({
 }) {
  const { provider, model, extendedContext } = modelInfo;
  const startTime = Date.now();
+  const persistFailureUsage = (statusCode: number, errorCode?: string | null) => {
+    saveRequestUsage({
+      provider: provider || "unknown",
+      model: model || "unknown",
+      tokens: { input: 0, output: 0, cacheRead: 0, cacheCreation: 0, reasoning: 0 },
+      status: String(statusCode),
+      success: false,
+      latencyMs: Date.now() - startTime,
+      timeToFirstTokenMs: 0,
+      errorCode: errorCode || String(statusCode),
+      timestamp: new Date().toISOString(),
+      connectionId: connectionId || undefined,
+      apiKeyId: apiKeyInfo?.id || undefined,
+      apiKeyName: apiKeyInfo?.name || undefined,
+    }).catch(() => {});
+  };

  // ── Phase 9.2: Idempotency check ──
  const idempotencyKey = getIdempotencyKey(clientRawRequest?.headers);
@@ -118,8 +140,8 @@ export async function handleChatCore({
  }

  const sourceFormat = detectFormat(body);
-  const endpointPath = (clientRawRequest?.endpoint || "").toLowerCase();
-  const isResponsesEndpoint = endpointPath.endsWith("/responses");
+  const endpointPath = String(clientRawRequest?.endpoint || "");
+  const isResponsesEndpoint = /(?:^|\/)responses(?:\/.*)?$/i.test(endpointPath);
  const nativeCodexPassthrough = shouldUseNativeCodexPassthrough({
    provider,
    sourceFormat,
@@ -135,10 +157,16 @@ export async function handleChatCore({
  // Detect source format and get target format
  // Model-specific targetFormat takes priority over provider default

-  // Apply custom model aliases (Settings → Model Aliases → Pattern→Target) before routing (#315)
+  // Apply custom model aliases (Settings → Model Aliases → Pattern→Target) before routing (#315, #472)
  // Custom aliases take priority over built-in and must be resolved here so the
-  // downstream getModelTargetFormat() lookup uses the correct, aliased model ID.
+  // downstream getModelTargetFormat() lookup AND the actual provider request use
+  // the correct, aliased model ID. Without this, aliases only affect format detection.
  const resolvedModel = resolveModelAlias(model);
+  // Use resolvedModel for all downstream operations (routing, provider requests, logging)
+  const effectiveModel = resolvedModel !== model ? resolvedModel : model;
+  if (resolvedModel !== model) {
+    log?.info?.("ALIAS", `Model alias applied: ${model} → ${resolvedModel}`);
+  }

  const alias = PROVIDER_ID_TO_ALIAS[provider] || provider;
  const modelTargetFormat = getModelTargetFormat(alias, resolvedModel);
@@ -185,10 +213,17 @@ export async function handleChatCore({

  // Translate request (pass reqLogger for intermediate logging)
  let translatedBody = body;
+  const isClaudePassthrough = sourceFormat === FORMATS.CLAUDE && targetFormat === FORMATS.CLAUDE;
  try {
    if (nativeCodexPassthrough) {
      translatedBody = { ...body, _nativeCodexPassthrough: true };
      log?.debug?.("FORMAT", "native codex passthrough enabled");
+    } else if (isClaudePassthrough) {
+      // Claude-to-Claude passthrough: forward body completely untouched.
+      // No translation, no field stripping, no thinking normalization.
+      // We are just a gateway -- do not interfere with the request in the slightest.
+      translatedBody = { ...body };
+      log?.debug?.("FORMAT", "claude->claude passthrough -- forwarding untouched");
    } else {
      translatedBody = { ...body };

@@ -233,6 +268,56 @@ export async function handleChatCore({
        });
      }

+      // Strip empty text content blocks from messages.
+      // Anthropic API rejects {"type":"text","text":""} with 400 "text content blocks must be non-empty".
+      // Some clients (LiteLLM passthrough, @ai-sdk/anthropic) may forward these empty blocks as-is.
+      if (Array.isArray(translatedBody.messages)) {
+        for (const msg of translatedBody.messages) {
+          if (Array.isArray(msg.content)) {
+            msg.content = msg.content.filter(
+              (block: Record<string, unknown>) =>
+                block.type !== "text" || (typeof block.text === "string" && block.text.length > 0)
+            );
+          }
+        }
+      }
+
+      // ── #409: Normalize unsupported content part types ──
+      // Cursor and other clients send {type:"file"} when attaching .md or other files.
+      // Providers (Copilot, OpenAI) only accept "text" and "image_url" in content arrays.
+      // Convert: file → text (extract content), drop unrecognized types with a warning.
+      if (Array.isArray(translatedBody.messages)) {
+        for (const msg of translatedBody.messages) {
+          if (msg.role === "user" && Array.isArray(msg.content)) {
+            msg.content = (msg.content as Record<string, unknown>[]).flatMap(
+              (block: Record<string, unknown>) => {
+                if (block.type === "text" || block.type === "image_url" || block.type === "image") {
+                  return [block];
+                }
+                // file / document → extract text content
+                if (block.type === "file" || block.type === "document") {
+                  const fileContent =
+                    (block.file as Record<string, unknown>)?.content ??
+                    (block.file as Record<string, unknown>)?.text ??
+                    block.content ??
+                    block.text;
+                  const fileName =
+                    (block.file as Record<string, unknown>)?.name ?? block.name ?? "attachment";
+                  if (typeof fileContent === "string" && fileContent.length > 0) {
+                    return [{ type: "text", text: `[${fileName}]\n${fileContent}` }];
+                  }
+                  return [];
+                }
+                // Unknown types: drop silently
+                log?.debug?.("CONTENT", `Dropped unsupported content part type="${block.type}"`);
+                return [];
+              }
+            );
+          }
+        }
+      }
+
+      const normalizeToolCallId = getModelNormalizeToolCallId(provider || "", model || "");
      translatedBody = translateRequest(
        sourceFormat,
        targetFormat,
@@ -241,7 +326,8 @@ export async function handleChatCore({
        stream,
        credentials,
        provider,
-        reqLogger
+        reqLogger,
+        { normalizeToolCallId }
      );
    }
  } catch (error) {
@@ -287,8 +373,8 @@ export async function handleChatCore({
  delete translatedBody._toolNameMap;
  delete translatedBody._disableToolPrefix;

-  // Update model in body
-  translatedBody.model = model;
+  // Update model in body — use resolved alias so the provider gets the correct model ID (#472)
+  translatedBody.model = effectiveModel;

  // Strip unsupported parameters for reasoning models (o1, o3, etc.)
  const unsupported = getUnsupportedParams(provider, model);
@@ -307,13 +393,66 @@ export async function handleChatCore({

  // Get executor for this provider
  const executor = getExecutor(provider);
+  const getExecutionCredentials = () =>
+    nativeCodexPassthrough ? { ...credentials, requestEndpointPath: endpointPath } : credentials;
+
+  // Create stream controller for disconnect detection
+  const streamController = createStreamController({ onDisconnect, log, provider, model });
+
+  const dedupRequestBody = { ...translatedBody, model: `${provider}/${model}` };
+  const dedupEnabled = shouldDeduplicate(dedupRequestBody);
+  const dedupHash = dedupEnabled ? computeRequestHash(dedupRequestBody) : null;
+
+  const executeProviderRequest = async (modelToCall = effectiveModel, allowDedup = false) => {
+    const execute = async () => {
+      const bodyToSend =
+        translatedBody.model === modelToCall
+          ? translatedBody
+          : { ...translatedBody, model: modelToCall };
+
+      const rawResult = await withRateLimit(provider, connectionId, modelToCall, () =>
+        executor.execute({
+          model: modelToCall,
+          body: bodyToSend,
+          stream,
+          credentials: getExecutionCredentials(),
+          signal: streamController.signal,
+          log,
+          extendedContext,
+        })
+      );
+
+      if (stream) return rawResult;
+
+      // Non-stream responses need cloning for shared dedup consumers.
+      const status = rawResult.response.status;
+      const statusText = rawResult.response.statusText;
+      const headers = Array.from(rawResult.response.headers.entries());
+      const payload = await rawResult.response.text();
+
+      return {
+        ...rawResult,
+        response: new Response(payload, { status, statusText, headers }),
+      };
+    };
+
+    if (allowDedup && dedupEnabled && dedupHash) {
+      const dedupResult = await deduplicate(dedupHash, execute);
+      if (dedupResult.wasDeduplicated) {
+        log?.debug?.("DEDUP", `Joined in-flight request hash=${dedupHash}`);
+      }
+      return dedupResult.result;
+    }
+
+    return execute();
+  };

  // Track pending request
  trackPendingRequest(model, provider, connectionId, true);

  // T5: track which models we've tried for intra-family fallback
-  const triedModels = new Set<string>([model]);
-  let currentModel = model;
+  const triedModels = new Set<string>([effectiveModel]);
+  let currentModel = effectiveModel;

  // Log start
  appendRequestLog({ model, provider, connectionId, status: "PENDING" }).catch(() => {});
@@ -325,9 +464,6 @@ export async function handleChatCore({
    0;
  log?.debug?.("REQUEST", `${provider.toUpperCase()} | ${model} | ${msgCount} msgs`);

-  // Create stream controller for disconnect detection
-  const streamController = createStreamController({ onDisconnect, log, provider, model });
-
  // Execute request using executor (handles URL building, headers, fallback, transform)
  let providerResponse;
  let providerUrl;
@@ -335,17 +471,7 @@ export async function handleChatCore({
  let finalBody;

  try {
-    const result = await withRateLimit(provider, connectionId, model, () =>
-      executor.execute({
-        model,
-        body: translatedBody,
-        stream,
-        credentials,
-        signal: streamController.signal,
-        log,
-        extendedContext,
-      })
-    );
+    const result = await executeProviderRequest(effectiveModel, true);

    providerResponse = result.response;
    providerUrl = result.url;
@@ -392,6 +518,7 @@ export async function handleChatCore({
      streamController.handleError(error);
      return createErrorResult(499, "Request aborted");
    }
+    persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, error?.name || "upstream_error");
    const errMsg = formatProviderError(error, provider, model, HTTP_STATUS.BAD_GATEWAY);
    console.log(`${COLORS.red}[ERROR] ${errMsg}${COLORS.reset}`);
    return createErrorResult(HTTP_STATUS.BAD_GATEWAY, errMsg);
@@ -428,7 +555,7 @@ export async function handleChatCore({
          model,
          body: translatedBody,
          stream,
-          credentials,
+          credentials: getExecutionCredentials(),
          signal: streamController.signal,
          log,
          extendedContext,
@@ -501,17 +628,7 @@ export async function handleChatCore({
        log?.info?.("MODEL_FALLBACK", `${model} unavailable (${statusCode}) → trying ${nextModel}`);
        // Re-execute with the fallback model
        try {
-          const fallbackResult = await withRateLimit(provider, connectionId, nextModel, () =>
-            executor.execute({
-              model: nextModel,
-              body: translatedBody,
-              stream,
-              credentials,
-              signal: streamController.signal,
-              log,
-              extendedContext,
-            })
-          );
+          const fallbackResult = await executeProviderRequest(nextModel, false);
          if (fallbackResult.response.ok) {
            providerResponse = fallbackResult.response;
            providerUrl = fallbackResult.url;
@@ -523,18 +640,79 @@ export async function handleChatCore({
            // We fall through by NOT returning here
          } else {
            // Fallback also failed — return original error
+            persistFailureUsage(statusCode, "model_unavailable");
            return createErrorResult(statusCode, errMsg, retryAfterMs);
          }
        } catch {
+          persistFailureUsage(statusCode, "model_unavailable");
          return createErrorResult(statusCode, errMsg, retryAfterMs);
        }
      } else {
+        persistFailureUsage(statusCode, "model_unavailable");
        return createErrorResult(statusCode, errMsg, retryAfterMs);
      }
    } else {
+      persistFailureUsage(statusCode, `upstream_${statusCode}`);
      return createErrorResult(statusCode, errMsg, retryAfterMs);
    }
    // ── End T5 ───────────────────────────────────────────────────────────────
+
+    // ── Emergency Fallback (ClawRouter Feature #09/017) ────────────────────
+    // When a non-streaming request fails with a budget-related error (402 or
+    // budget keywords), redirect to nvidia/gpt-oss-120b ($0.00/M) before
+    // returning the error to the combo router. This gives one last free-tier
+    // attempt so the user's session stays alive.
+    const requestHasTools = Array.isArray(translatedBody.tools) && translatedBody.tools.length > 0;
+    if (!stream) {
+      const fbDecision = shouldUseFallback(
+        statusCode,
+        message,
+        requestHasTools,
+        EMERGENCY_FALLBACK_CONFIG
+      );
+      if (isFallbackDecision(fbDecision)) {
+        log?.info?.("EMERGENCY_FALLBACK", fbDecision.reason);
+        try {
+          // Build a minimal fallback request using the original body but with
+          // the NVIDIA free-tier model and max_tokens capped to avoid overuse.
+          const fbExecutor = getExecutor(fbDecision.provider);
+          const fbResult = await fbExecutor.execute({
+            model: fbDecision.model,
+            body: {
+              ...translatedBody,
+              model: fbDecision.model,
+              max_tokens: Math.min(
+                typeof translatedBody.max_tokens === "number"
+                  ? translatedBody.max_tokens
+                  : fbDecision.maxOutputTokens,
+                fbDecision.maxOutputTokens
+              ),
+            },
+            stream: false,
+            credentials: credentials,
+            signal: streamController.signal,
+            log,
+            extendedContext,
+          });
+          if (fbResult.response.ok) {
+            providerResponse = fbResult.response;
+            log?.info?.(
+              "EMERGENCY_FALLBACK",
+              `Serving ${fbDecision.provider}/${fbDecision.model} as budget fallback for ${provider}/${model}`
+            );
+            // Fall through to non-streaming handler — providerResponse is now OK
+          } else {
+            log?.warn?.(
+              "EMERGENCY_FALLBACK",
+              `Emergency fallback also failed (${fbResult.response.status})`
+            );
+          }
+        } catch (fbErr) {
+          log?.warn?.("EMERGENCY_FALLBACK", `Emergency fallback error: ${fbErr?.message}`);
+        }
+      }
+    }
+    // ── End Emergency Fallback ────────────────────────────────────────────
  }

  // Non-streaming response
@@ -560,6 +738,7 @@ export async function handleChatCore({
          connectionId,
          status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
        }).catch(() => {});
+        persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_sse_payload");
        return createErrorResult(
          HTTP_STATUS.BAD_GATEWAY,
          "Invalid SSE response for non-streaming request"
@@ -577,6 +756,7 @@ export async function handleChatCore({
          connectionId,
          status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
        }).catch(() => {});
+        persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_json_payload");
        return createErrorResult(HTTP_STATUS.BAD_GATEWAY, "Invalid JSON response from provider");
      }
    }
@@ -619,6 +799,11 @@ export async function handleChatCore({
        provider: provider || "unknown",
        model: model || "unknown",
        tokens: usage,
+        status: "200",
+        success: true,
+        latencyMs: Date.now() - startTime,
+        timeToFirstTokenMs: Date.now() - startTime,
+        errorCode: null,
        timestamp: new Date().toISOString(),
        connectionId: connectionId || undefined,
        apiKeyId: apiKeyInfo?.id || undefined,
@@ -695,8 +880,12 @@ export async function handleChatCore({
  // Create transform stream with logger for streaming response
  let transformStream;

-  // Callback to save call log when stream completes (streaming calls were never logged before!)
-  const onStreamComplete = ({ status: streamStatus, usage: streamUsage }) => {
+  // Callback to save call log when stream completes (include responseBody when provided by stream)
+  const onStreamComplete = ({
+    status: streamStatus,
+    usage: streamUsage,
+    responseBody: streamResponseBody,
+  }) => {
    saveCallLog({
      method: "POST",
      path: clientRawRequest?.endpoint || "/v1/chat/completions",
@@ -707,6 +896,7 @@ export async function handleChatCore({
      duration: Date.now() - startTime,
      tokens: streamUsage || {},
      requestBody: body,
+      responseBody: streamResponseBody ?? undefined,
      sourceFormat,
      targetFormat,
      comboName,
@@ -13,18 +13,48 @@
 * }
 */

-import { getEmbeddingProvider, parseEmbeddingModel } from "../config/embeddingRegistry.ts";
+import {
+  getEmbeddingProvider,
+  parseEmbeddingModel,
+  type EmbeddingProvider,
+} from "../config/embeddingRegistry.ts";
 import { saveCallLog } from "@/lib/usageDb";

 /**
- * Handle embedding request
- * @param {object} options
- * @param {object} options.body - Request body
- * @param {object} options.credentials - Provider credentials { apiKey, accessToken }
- * @param {object} options.log - Logger
+ * Handle embedding request.
+ * Supports both hardcoded cloud providers and dynamic local provider_nodes.
+ * When resolvedProvider is passed, uses it directly (injection pattern from route handler).
+ * Falls back to hardcoded registry lookup for backward compatibility.
 */
-export async function handleEmbedding({ body, credentials, log }) {
-  const { provider, model } = parseEmbeddingModel(body.model);
+export async function handleEmbedding({
+  body,
+  credentials,
+  log,
+  resolvedProvider = null,
+  resolvedModel = null,
+}: {
+  body: Record<string, unknown>;
+  credentials: { apiKey?: string; accessToken?: string } | null;
+  log?: { info: (...args: unknown[]) => void; error: (...args: unknown[]) => void };
+  resolvedProvider?: EmbeddingProvider | null;
+  resolvedModel?: string | null;
+}) {
+  // Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
+  let provider: string | null;
+  let model: string | null;
+  let providerConfig: EmbeddingProvider | null;
+
+  if (resolvedProvider) {
+    provider = resolvedProvider.id;
+    model = resolvedModel;
+    providerConfig = resolvedProvider;
+  } else {
+    const parsed = parseEmbeddingModel(body.model as string);
+    provider = parsed.provider;
+    model = parsed.model;
+    providerConfig = provider ? getEmbeddingProvider(provider) : null;
+  }
+
  const startTime = Date.now();

  // Summarized request body for call log (avoid storing large embedding input arrays)
@@ -42,7 +72,6 @@ export async function handleEmbedding({ body, credentials, log }) {
    };
  }

-  const providerConfig = getEmbeddingProvider(provider);
  if (!providerConfig) {
    return {
      success: false,
@@ -66,11 +95,15 @@ export async function handleEmbedding({ body, credentials, log }) {
    "Content-Type": "application/json",
  };

-  const token = credentials.apiKey || credentials.accessToken;
-  if (providerConfig.authHeader === "bearer") {
-    headers["Authorization"] = `Bearer ${token}`;
-  } else if (providerConfig.authHeader === "x-api-key") {
-    headers["x-api-key"] = token;
+  // Skip credential injection for local providers (authType: "none")
+  const token =
+    providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
+  if (token) {
+    if (providerConfig.authHeader === "bearer") {
+      headers["Authorization"] = `Bearer ${token}`;
+    } else if (providerConfig.authHeader === "x-api-key") {
+      headers["x-api-key"] = token;
+    }
  }

  if (log) {
@@ -0,0 +1,680 @@
+/**
+ * Search Handler
+ *
+ * Handles POST /v1/search requests.
+ * Routes to 5 search providers with automatic failover:
+ *   serper-search, brave-search, perplexity-search, exa-search, tavily-search
+ *
+ * Request format:
+ * {
+ *   "query": "search query",
+ *   "provider": "serper-search" | "brave-search" | ... // optional, auto-selects cheapest
+ *   "max_results": 5,
+ *   "search_type": "web" | "news"
+ * }
+ */
+
+import { getSearchProvider, type SearchProviderConfig } from "../config/searchRegistry.ts";
+import { saveCallLog } from "@/lib/usageDb";
+
+// ── Types ────────────────────────────────────────────────────────────────
+
+export interface SearchResult {
+  title: string;
+  url: string;
+  display_url?: string;
+  snippet: string;
+  position: number;
+  score: number | null;
+  published_at: string | null;
+  favicon_url: string | null;
+  content: { format: string; text: string; length: number } | null;
+  metadata: {
+    author: string | null;
+    language: string | null;
+    source_type: string | null;
+    image_url: string | null;
+  } | null;
+  citation: {
+    provider: string;
+    retrieved_at: string;
+    rank: number;
+  };
+  provider_raw: Record<string, unknown> | null;
+}
+
+export interface SearchResponse {
+  provider: string;
+  query: string;
+  results: SearchResult[];
+  answer: { source: string; text: string | null; model: string | null } | null;
+  usage: { queries_used: number; search_cost_usd: number; llm_tokens?: number };
+  metrics: {
+    response_time_ms: number;
+    upstream_latency_ms: number;
+    gateway_latency_ms?: number;
+    total_results_available: number | null;
+  };
+  errors: Array<{ provider: string; code: string; message: string }>;
+}
+
+interface SearchHandlerResult {
+  success: boolean;
+  status?: number;
+  error?: string;
+  data?: SearchResponse;
+}
+
+interface SearchHandlerOptions {
+  query: string;
+  provider: string;
+  maxResults: number;
+  searchType: string;
+  country?: string;
+  language?: string;
+  timeRange?: string;
+  offset?: number;
+  domainFilter?: string[];
+  contentOptions?: {
+    snippet?: boolean;
+    full_page?: boolean;
+    format?: string;
+    max_characters?: number;
+  };
+  strictFilters?: boolean;
+  providerOptions?: Record<string, unknown>;
+  credentials: Record<string, any>;
+  alternateProvider?: string;
+  alternateCredentials?: Record<string, any> | null;
+  log?: any;
+}
+
+// ── Constants ────────────────────────────────────────────────────────────
+
+const GLOBAL_TIMEOUT_MS = 15_000;
+
+// Non-retriable HTTP status codes — fail immediately, don't try alternate
+const NON_RETRIABLE = new Set([400, 401, 403, 404]);
+
+// ── Input Sanitization ──────────────────────────────────────────────────
+
+// Control characters that should never appear in search queries
+const CONTROL_CHAR_RE = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;
+
+function sanitizeQuery(query: string): { clean: string; error?: string } {
+  if (CONTROL_CHAR_RE.test(query)) {
+    return { clean: "", error: "Query contains invalid control characters" };
+  }
+  const clean = query.normalize("NFKC").trim().replace(/\s+/g, " ");
+  if (clean.length === 0) {
+    return { clean: "", error: "Query is empty after normalization" };
+  }
+  return { clean };
+}
+
+// ── Response Normalizers ────────────────────────────────────────────────
+
+function makeResult(
+  providerId: string,
+  item: {
+    title?: string;
+    url?: string;
+    snippet?: string;
+    score?: number;
+    published_at?: string;
+    favicon_url?: string;
+    author?: string;
+    source_type?: string;
+    image_url?: string;
+    full_text?: string;
+    text_format?: string;
+  },
+  idx: number,
+  now: string
+): SearchResult {
+  const url = item.url || "";
+  return {
+    title: item.title || "",
+    url,
+    display_url: url ? url.replace(/^https?:\/\/(www\.)?/, "").split("?")[0] : undefined,
+    snippet: item.snippet || "",
+    position: idx + 1,
+    score: typeof item.score === "number" ? Math.min(1, Math.max(0, item.score)) : null,
+    published_at: item.published_at || null,
+    favicon_url: item.favicon_url || null,
+    content: item.full_text
+      ? { format: item.text_format || "text", text: item.full_text, length: item.full_text.length }
+      : null,
+    metadata: {
+      author: item.author || null,
+      language: null,
+      source_type: item.source_type || null,
+      image_url: item.image_url || null,
+    },
+    citation: { provider: providerId, retrieved_at: now, rank: idx + 1 },
+    provider_raw: null,
+  };
+}
+
+function normalizeSerperResponse(
+  data: any,
+  _query: string,
+  searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  const now = new Date().toISOString();
+  const items = searchType === "news" ? data.news : data.organic;
+  if (!Array.isArray(items)) return { results: [], totalResults: null };
+
+  const results = items.map((item: any, idx: number) =>
+    makeResult(
+      "serper-search",
+      {
+        title: item.title,
+        url: item.link,
+        snippet: item.snippet || item.description,
+        published_at: item.date,
+      },
+      idx,
+      now
+    )
+  );
+
+  return {
+    results,
+    totalResults:
+      typeof data.searchParameters?.totalResults === "number"
+        ? data.searchParameters.totalResults
+        : null,
+  };
+}
+
+function normalizeBraveResponse(
+  data: any,
+  _query: string,
+  searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  const now = new Date().toISOString();
+  // Brave news endpoint returns { results: [...] } directly,
+  // while web endpoint returns { web: { results: [...] } }
+  const container = searchType === "news" ? data.news || data : data.web;
+  const items = container?.results;
+  if (!Array.isArray(items)) return { results: [], totalResults: null };
+
+  const results = items.map((item: any, idx: number) =>
+    makeResult(
+      "brave-search",
+      {
+        title: item.title,
+        url: item.url,
+        snippet: item.description,
+        published_at: item.page_age || item.age,
+        favicon_url: item.meta_url?.favicon || item.favicon,
+      },
+      idx,
+      now
+    )
+  );
+
+  return { results, totalResults: container?.totalCount ?? null };
+}
+
+// ── Helpers ─────────────────────────────────────────────────────────────
+
+function parseDomainFilter(domainFilter?: string[]): {
+  includes: string[];
+  excludes: string[];
+} {
+  if (!domainFilter?.length) return { includes: [], excludes: [] };
+  const includes = domainFilter.filter((d) => !d.startsWith("-"));
+  const excludes = domainFilter.filter((d) => d.startsWith("-")).map((d) => d.slice(1));
+  return { includes, excludes };
+}
+
+// ── Provider Request Builders ───────────────────────────────────────────
+
+interface SearchRequestParams {
+  query: string;
+  searchType: string;
+  maxResults: number;
+  token: string;
+  country?: string;
+  language?: string;
+  domainFilter?: string[];
+}
+
+function buildSerperRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  const endpoint = params.searchType === "news" ? "/news" : "/search";
+  const body: Record<string, unknown> = { q: params.query, num: params.maxResults };
+  if (params.country) body.gl = params.country.toLowerCase();
+  if (params.language) body.hl = params.language;
+  return {
+    url: `${config.baseUrl}${endpoint}`,
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json", "X-API-Key": params.token },
+      body: JSON.stringify(body),
+    },
+  };
+}
+
+function buildBraveRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  const endpoint = params.searchType === "news" ? "/news/search" : "/web/search";
+  const qp = new URLSearchParams({ q: params.query, count: String(params.maxResults) });
+  if (params.country) qp.set("country", params.country);
+  if (params.language) qp.set("search_lang", params.language);
+  return {
+    url: `${config.baseUrl}${endpoint}?${qp}`,
+    init: {
+      method: "GET",
+      headers: { Accept: "application/json", "X-Subscription-Token": params.token },
+    },
+  };
+}
+
+function buildPerplexityRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  const body: Record<string, unknown> = { query: params.query, max_results: params.maxResults };
+  if (params.country) body.country = params.country;
+  if (params.language) body.search_language_filter = [params.language];
+  if (params.domainFilter?.length) body.search_domain_filter = params.domainFilter;
+  return {
+    url: config.baseUrl,
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
+      body: JSON.stringify(body),
+    },
+  };
+}
+
+function buildExaRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  const { includes, excludes } = parseDomainFilter(params.domainFilter);
+  const body: Record<string, unknown> = {
+    query: params.query,
+    numResults: params.maxResults,
+    type: "auto",
+    text: true,
+    highlights: true,
+  };
+  if (includes.length) body.includeDomains = includes;
+  if (excludes.length) body.excludeDomains = excludes;
+  if (params.searchType === "news") body.category = "news";
+  return {
+    url: config.baseUrl,
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json", "x-api-key": params.token },
+      body: JSON.stringify(body),
+    },
+  };
+}
+
+function buildTavilyRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  const { includes, excludes } = parseDomainFilter(params.domainFilter);
+  const body: Record<string, unknown> = {
+    query: params.query,
+    max_results: params.maxResults,
+    topic: params.searchType === "news" ? "news" : "general",
+  };
+  if (includes.length) body.include_domains = includes;
+  if (excludes.length) body.exclude_domains = excludes;
+  if (params.country) body.country = params.country;
+  return {
+    url: config.baseUrl,
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
+      body: JSON.stringify(body),
+    },
+  };
+}
+
+function buildRequest(
+  config: SearchProviderConfig,
+  params: SearchRequestParams
+): { url: string; init: RequestInit } {
+  if (config.id === "serper-search") return buildSerperRequest(config, params);
+  if (config.id === "brave-search") return buildBraveRequest(config, params);
+  if (config.id === "perplexity-search") return buildPerplexityRequest(config, params);
+  if (config.id === "exa-search") return buildExaRequest(config, params);
+  if (config.id === "tavily-search") return buildTavilyRequest(config, params);
+  // Fallback for future providers: POST with bearer auth
+  return {
+    url: config.baseUrl,
+    init: {
+      method: config.method,
+      headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
+      body: JSON.stringify({
+        query: params.query,
+        max_results: params.maxResults,
+        search_type: params.searchType,
+      }),
+    },
+  };
+}
+
+function normalizePerplexityResponse(
+  data: any,
+  _query: string,
+  _searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  const now = new Date().toISOString();
+  const items = data.results;
+  if (!Array.isArray(items)) return { results: [], totalResults: null };
+
+  const results = items.map((item: any, idx: number) =>
+    makeResult(
+      "perplexity-search",
+      {
+        title: item.title,
+        url: item.url,
+        snippet: item.snippet,
+        published_at: item.date || item.last_updated,
+      },
+      idx,
+      now
+    )
+  );
+  return { results, totalResults: results.length };
+}
+
+function normalizeExaResponse(
+  data: any,
+  _query: string,
+  _searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  const now = new Date().toISOString();
+  const items = data.results;
+  if (!Array.isArray(items)) return { results: [], totalResults: null };
+
+  const results = items.map((item: any, idx: number) =>
+    makeResult(
+      "exa-search",
+      {
+        title: item.title,
+        url: item.url,
+        snippet: item.highlights?.[0] || item.text?.slice(0, 300) || "",
+        score: item.score,
+        published_at: item.publishedDate,
+        favicon_url: item.favicon,
+        author: item.author,
+        image_url: item.image,
+        full_text: item.text,
+        text_format: "text",
+      },
+      idx,
+      now
+    )
+  );
+  return { results, totalResults: results.length };
+}
+
+function normalizeTavilyResponse(
+  data: any,
+  _query: string,
+  _searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  const now = new Date().toISOString();
+  const items = data.results;
+  if (!Array.isArray(items)) return { results: [], totalResults: null };
+
+  const results = items.map((item: any, idx: number) =>
+    makeResult(
+      "tavily-search",
+      {
+        title: item.title,
+        url: item.url,
+        snippet: item.content || "",
+        score: item.score,
+        published_at: item.published_date,
+        full_text: item.raw_content,
+        text_format: "text",
+      },
+      idx,
+      now
+    )
+  );
+  return { results, totalResults: results.length };
+}
+
+function normalizeResponse(
+  providerId: string,
+  data: any,
+  query: string,
+  searchType: string
+): { results: SearchResult[]; totalResults: number | null } {
+  if (providerId === "serper-search") return normalizeSerperResponse(data, query, searchType);
+  if (providerId === "brave-search") return normalizeBraveResponse(data, query, searchType);
+  if (providerId === "perplexity-search")
+    return normalizePerplexityResponse(data, query, searchType);
+  if (providerId === "exa-search") return normalizeExaResponse(data, query, searchType);
+  if (providerId === "tavily-search") return normalizeTavilyResponse(data, query, searchType);
+  return { results: [], totalResults: null };
+}
+
+// ── Main Handler ────────────────────────────────────────────────────────
+
+export async function handleSearch(options: SearchHandlerOptions): Promise<SearchHandlerResult> {
+  const {
+    query,
+    provider: providerId,
+    maxResults,
+    searchType,
+    country,
+    language,
+    domainFilter,
+    credentials,
+    alternateProvider,
+    alternateCredentials,
+    log,
+  } = options;
+  const startTime = Date.now();
+
+  // 1. Sanitize input
+  const { clean: cleanQuery, error: sanitizeError } = sanitizeQuery(query);
+  if (sanitizeError) {
+    return { success: false, status: 400, error: sanitizeError };
+  }
+
+  // 2. Use resolved provider from route (no re-resolution)
+  const primaryConfig = getSearchProvider(providerId);
+  if (!primaryConfig) {
+    return {
+      success: false,
+      status: 400,
+      error: `Unknown search provider: ${providerId}`,
+    };
+  }
+
+  // 3. Get alternate config for failover (pre-resolved by route)
+  const alternateConfig = alternateProvider ? getSearchProvider(alternateProvider) : null;
+
+  const requestParams = {
+    query: cleanQuery,
+    searchType,
+    maxResults,
+    country,
+    language,
+    domainFilter,
+  };
+
+  // 4. Try primary provider
+  const result = await tryProvider(primaryConfig, requestParams, credentials, startTime, log);
+
+  if (result.success) return result;
+
+  // 5. Failover to alternate (only for retriable errors and auto-select mode)
+  if (
+    alternateConfig &&
+    alternateCredentials &&
+    !NON_RETRIABLE.has(result.status || 0) &&
+    Date.now() - startTime < GLOBAL_TIMEOUT_MS
+  ) {
+    if (log) {
+      log.warn(
+        "SEARCH",
+        `${primaryConfig.id} failed (${result.status}), trying ${alternateConfig.id}`
+      );
+    }
+
+    const fallbackResult = await tryProvider(
+      alternateConfig,
+      requestParams,
+      alternateCredentials,
+      startTime,
+      log
+    );
+
+    if (fallbackResult.success) return fallbackResult;
+  }
+
+  return result;
+}
+
+async function tryProvider(
+  config: SearchProviderConfig,
+  params: Omit<SearchRequestParams, "token">,
+  credentials: Record<string, any>,
+  globalStartTime: number,
+  log?: any
+): Promise<SearchHandlerResult> {
+  const startTime = Date.now();
+  const token = credentials.apiKey || credentials.accessToken;
+
+  if (!token) {
+    return {
+      success: false,
+      status: 401,
+      error: `No credentials for search provider: ${config.id}`,
+    };
+  }
+
+  const { query, searchType, maxResults } = params;
+  const { url, init } = buildRequest(config, { ...params, token });
+
+  // Timeout: min of provider timeout and remaining global timeout
+  const remainingGlobal = GLOBAL_TIMEOUT_MS - (Date.now() - globalStartTime);
+  const timeout = Math.min(config.timeoutMs, Math.max(remainingGlobal, 1000));
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeout);
+
+  if (log) {
+    log.info("SEARCH", `${config.id} | query: "${query.slice(0, 80)}" | type: ${searchType}`);
+  }
+
+  try {
+    const response = await fetch(url, { ...init, signal: controller.signal });
+    clearTimeout(timer);
+
+    if (!response.ok) {
+      const errorText = await response.text();
+      if (log) {
+        log.error("SEARCH", `${config.id} error ${response.status}: ${errorText.slice(0, 200)}`);
+      }
+
+      saveCallLog({
+        method: config.method,
+        path: "/v1/search",
+        status: response.status,
+        model: config.id,
+        provider: config.id,
+        duration: Date.now() - startTime,
+        requestType: "search",
+        error: errorText.slice(0, 500),
+        requestBody: {
+          query: query.slice(0, 200),
+          search_type: searchType,
+          max_results: maxResults,
+        },
+      }).catch(() => {
+        /* non-critical — logging must not block search response */
+      });
+
+      return {
+        success: false,
+        status: response.status,
+        error: `Search provider ${config.id} returned ${response.status}`,
+      };
+    }
+
+    const data = await response.json();
+    const normalized = normalizeResponse(config.id, data, query, searchType);
+    // Enforce max_results — some providers return more than requested
+    const results = normalized.results.slice(0, maxResults);
+    const totalResults = normalized.totalResults;
+    const duration = Date.now() - startTime;
+
+    saveCallLog({
+      method: config.method,
+      path: "/v1/search",
+      status: 200,
+      model: config.id,
+      provider: config.id,
+      duration,
+      requestType: "search",
+      tokens: { prompt_tokens: 0, completion_tokens: 0 },
+      requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
+      responseBody: { results_count: results.length, cached: false },
+    }).catch(() => {
+      /* non-critical — logging must not block search response */
+    });
+
+    return {
+      success: true,
+      data: {
+        provider: config.id,
+        query,
+        results,
+        answer: null,
+        usage: { queries_used: 1, search_cost_usd: config.costPerQuery },
+        metrics: {
+          response_time_ms: duration,
+          upstream_latency_ms: duration,
+          total_results_available: totalResults,
+        },
+        errors: [],
+      },
+    };
+  } catch (err: any) {
+    clearTimeout(timer);
+
+    const isTimeout = err.name === "AbortError";
+    if (log) {
+      log.error("SEARCH", `${config.id} ${isTimeout ? "timeout" : "fetch error"}: ${err.message}`);
+    }
+
+    saveCallLog({
+      method: config.method,
+      path: "/v1/search",
+      status: isTimeout ? 504 : 502,
+      model: config.id,
+      provider: config.id,
+      duration: Date.now() - startTime,
+      requestType: "search",
+      error: err.message,
+      requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
+    }).catch(() => {
+      /* non-critical — logging must not block search response */
+    });
+
+    return {
+      success: false,
+      status: isTimeout ? 504 : 502,
+      error: `Search provider ${isTimeout ? "timeout" : "error"}: ${err.message}`,
+    };
+  }
+}
@@ -0,0 +1,48 @@
+import { describe, it, expect } from "vitest";
+import {
+  MCP_TOOLS,
+  MCP_TOOL_MAP,
+  setRoutingStrategyInput,
+  setRoutingStrategyTool,
+} from "../schemas/tools.ts";
+
+describe("omniroute_set_routing_strategy MCP tool schema", () => {
+  it("should be registered in MCP_TOOLS", () => {
+    const tool = MCP_TOOLS.find((t) => t.name === "omniroute_set_routing_strategy");
+    expect(tool).toBeDefined();
+    expect(tool?.phase).toBe(2);
+  });
+
+  it("should be available in MCP_TOOL_MAP", () => {
+    expect(MCP_TOOL_MAP["omniroute_set_routing_strategy"]).toBeDefined();
+  });
+
+  it("should require write:combos scope", () => {
+    expect(setRoutingStrategyTool.scopes).toContain("write:combos");
+  });
+
+  it("should validate a standard strategy payload", () => {
+    const result = setRoutingStrategyInput.safeParse({
+      comboId: "my-combo",
+      strategy: "cost-optimized",
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it("should validate auto strategy with autoRoutingStrategy", () => {
+    const result = setRoutingStrategyInput.safeParse({
+      comboId: "my-combo",
+      strategy: "auto",
+      autoRoutingStrategy: "latency",
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it("should reject unknown strategy", () => {
+    const result = setRoutingStrategyInput.safeParse({
+      comboId: "my-combo",
+      strategy: "unknown-strategy",
+    });
+    expect(result.success).toBe(false);
+  });
+});
@@ -107,6 +107,7 @@ export const listCombosOutput = z.object({
        "priority",
        "weighted",
        "round-robin",
+        "strict-random",
        "random",
        "least-used",
        "cost-optimized",
@@ -470,7 +471,53 @@ export const setBudgetGuardTool: McpToolDefinition<
  sourceEndpoints: ["/api/usage/budget"],
 };

-// --- Tool 11: omniroute_set_resilience_profile ---
+// --- Tool 11: omniroute_set_routing_strategy ---
+export const setRoutingStrategyInput = z.object({
+  comboId: z.string().describe("Combo ID or name to update"),
+  strategy: z
+    .enum([
+      "priority",
+      "weighted",
+      "round-robin",
+      "strict-random",
+      "random",
+      "least-used",
+      "cost-optimized",
+      "auto",
+    ])
+    .describe("Routing strategy to apply"),
+  autoRoutingStrategy: z
+    .enum(["rules", "cost", "eco", "latency", "fast"])
+    .optional()
+    .describe("Optional strategy used by auto mode (only used when strategy='auto')"),
+});
+
+export const setRoutingStrategyOutput = z.object({
+  success: z.boolean(),
+  combo: z.object({
+    id: z.string(),
+    name: z.string(),
+    strategy: z.string(),
+    autoRoutingStrategy: z.string().nullable(),
+  }),
+});
+
+export const setRoutingStrategyTool: McpToolDefinition<
+  typeof setRoutingStrategyInput,
+  typeof setRoutingStrategyOutput
+> = {
+  name: "omniroute_set_routing_strategy",
+  description:
+    "Updates a combo routing strategy (priority/weighted/auto/etc.) at runtime. Supports selecting the sub-strategy used by auto mode (rules/cost/latency).",
+  inputSchema: setRoutingStrategyInput,
+  outputSchema: setRoutingStrategyOutput,
+  scopes: ["write:combos"],
+  auditLevel: "full",
+  phase: 2,
+  sourceEndpoints: ["/api/combos", "/api/combos/{id}"],
+};
+
+// --- Tool 12: omniroute_set_resilience_profile ---
 export const setResilienceProfileInput = z.object({
  profile: z
    .enum(["aggressive", "balanced", "conservative"])
@@ -502,7 +549,7 @@ export const setResilienceProfileTool: McpToolDefinition<
  sourceEndpoints: ["/api/resilience"],
 };

-// --- Tool 12: omniroute_test_combo ---
+// --- Tool 13: omniroute_test_combo ---
 export const testComboInput = z.object({
  comboId: z.string().describe("ID of the combo to test"),
  testPrompt: z.string().max(500).describe("Short test prompt (max 500 chars)"),
@@ -540,7 +587,7 @@ export const testComboTool: McpToolDefinition<typeof testComboInput, typeof test
  sourceEndpoints: ["/api/combos/test", "/v1/chat/completions"],
 };

-// --- Tool 13: omniroute_get_provider_metrics ---
+// --- Tool 14: omniroute_get_provider_metrics ---
 export const getProviderMetricsInput = z.object({
  provider: z.string().describe("Provider name (e.g., 'claude', 'gemini-cli', 'codex')"),
 });
@@ -583,7 +630,7 @@ export const getProviderMetricsTool: McpToolDefinition<
  sourceEndpoints: ["/api/provider-metrics", "/api/resilience"],
 };

-// --- Tool 14: omniroute_best_combo_for_task ---
+// --- Tool 15: omniroute_best_combo_for_task ---
 export const bestComboForTaskInput = z.object({
  taskType: z
    .enum(["coding", "review", "planning", "analysis", "debugging", "documentation"])
@@ -628,7 +675,7 @@ export const bestComboForTaskTool: McpToolDefinition<
  sourceEndpoints: ["/api/combos", "/api/combos/metrics", "/api/monitoring/health"],
 };

-// --- Tool 15: omniroute_explain_route ---
+// --- Tool 16: omniroute_explain_route ---
 export const explainRouteInput = z.object({
  requestId: z.string().describe("Request ID from the X-Request-Id header"),
 });
@@ -674,7 +721,7 @@ export const explainRouteTool: McpToolDefinition<
  sourceEndpoints: [],
 };

-// --- Tool 16: omniroute_get_session_snapshot ---
+// --- Tool 17: omniroute_get_session_snapshot ---
 export const getSessionSnapshotInput = z.object({}).describe("No parameters required");

 export const getSessionSnapshotOutput = z.object({
@@ -723,7 +770,7 @@ export const getSessionSnapshotTool: McpToolDefinition<
  sourceEndpoints: ["/api/usage/analytics", "/api/telemetry/summary"],
 };

-// --- Tool 17: omniroute_sync_pricing ---
+// --- Tool 18: omniroute_sync_pricing ---
 export const syncPricingInput = z.object({
  sources: z
    .array(z.string())
@@ -775,6 +822,7 @@ export const MCP_TOOLS = [
  // Phase 2: Advanced
  simulateRouteTool,
  setBudgetGuardTool,
+  setRoutingStrategyTool,
  setResilienceProfileTool,
  testComboTool,
  getProviderMetricsTool,
@@ -25,6 +25,7 @@ import {
  listModelsCatalogInput,
  simulateRouteInput,
  setBudgetGuardInput,
+  setRoutingStrategyInput,
  setResilienceProfileInput,
  testComboInput,
  getProviderMetricsInput,
@@ -45,6 +46,7 @@ import {
 import {
  handleSimulateRoute,
  handleSetBudgetGuard,
+  handleSetRoutingStrategy,
  handleSetResilienceProfile,
  handleTestCombo,
  handleGetProviderMetrics,
@@ -593,6 +595,18 @@ export function createMcpServer(): McpServer {
    )
  );

+  server.registerTool(
+    "omniroute_set_routing_strategy",
+    {
+      description:
+        "Updates combo routing strategy at runtime (priority/weighted/round-robin/auto/etc.)",
+      inputSchema: setRoutingStrategyInput,
+    },
+    withScopeEnforcement("omniroute_set_routing_strategy", (args) =>
+      handleSetRoutingStrategy(setRoutingStrategyInput.parse(args))
+    )
+  );
+
  server.registerTool(
    "omniroute_set_resilience_profile",
    {
@@ -1,16 +1,18 @@
 /**
- * OmniRoute MCP Advanced Tools — 8 intelligence tools that differentiate
+ * OmniRoute MCP Advanced Tools — 10 intelligence tools that differentiate
 * OmniRoute from all other AI gateways.
 *
 * Tools:
 *   1. omniroute_simulate_route     — Dry-run routing simulation
 *   2. omniroute_set_budget_guard   — Session budget with degrade/block/alert
- *   3. omniroute_set_resilience_profile — Circuit breaker/retry profiles
- *   4. omniroute_test_combo         — Live test each provider in a combo
- *   5. omniroute_get_provider_metrics — Detailed per-provider metrics
- *   6. omniroute_best_combo_for_task — AI-powered combo recommendation
- *   7. omniroute_explain_route      — Post-hoc routing decision explainer
- *   8. omniroute_get_session_snapshot — Full session state snapshot
+ *   3. omniroute_set_routing_strategy — Runtime strategy switch for combos
+ *   4. omniroute_set_resilience_profile — Circuit breaker/retry profiles
+ *   5. omniroute_test_combo         — Live test each provider in a combo
+ *   6. omniroute_get_provider_metrics — Detailed per-provider metrics
+ *   7. omniroute_best_combo_for_task — AI-powered combo recommendation
+ *   8. omniroute_explain_route      — Post-hoc routing decision explainer
+ *   9. omniroute_get_session_snapshot — Full session state snapshot
+ *  10. omniroute_sync_pricing      — Sync provider pricing from external source
 */

 import { logToolCall } from "../audit.ts";
@@ -335,6 +337,108 @@ export async function handleSetBudgetGuard(args: {
  }
 }

+export async function handleSetRoutingStrategy(args: {
+  comboId: string;
+  strategy:
+    | "priority"
+    | "weighted"
+    | "round-robin"
+    | "strict-random"
+    | "random"
+    | "least-used"
+    | "cost-optimized"
+    | "auto";
+  autoRoutingStrategy?: "rules" | "cost" | "eco" | "latency" | "fast";
+}) {
+  const start = Date.now();
+  try {
+    const combos = normalizeCombosResponse(await apiFetch("/api/combos"));
+    const combo = combos.find(
+      (comboEntry) =>
+        toString(comboEntry.id) === args.comboId || toString(comboEntry.name) === args.comboId
+    );
+
+    if (!combo) {
+      const msg = `Combo '${args.comboId}' not found`;
+      await logToolCall(
+        "omniroute_set_routing_strategy",
+        args,
+        null,
+        Date.now() - start,
+        false,
+        msg
+      );
+      return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
+    }
+
+    const comboId = toString(combo.id);
+    if (!comboId) {
+      const msg = "Matched combo has no id";
+      await logToolCall(
+        "omniroute_set_routing_strategy",
+        args,
+        null,
+        Date.now() - start,
+        false,
+        msg
+      );
+      return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
+    }
+
+    const comboData = toRecord(combo.data);
+    const currentConfig = toRecord(
+      Object.keys(toRecord(combo.config)).length > 0 ? combo.config : comboData.config
+    );
+
+    let nextConfig: JsonRecord | undefined = undefined;
+    if (args.strategy === "auto" && args.autoRoutingStrategy) {
+      const currentAutoConfig = toRecord(currentConfig.auto);
+      nextConfig = {
+        ...currentConfig,
+        auto: {
+          ...currentAutoConfig,
+          routingStrategy: args.autoRoutingStrategy,
+        },
+      };
+    }
+
+    const payload: JsonRecord = { strategy: args.strategy };
+    if (nextConfig && Object.keys(nextConfig).length > 0) {
+      payload.config = nextConfig;
+    }
+
+    const updatedCombo = toRecord(
+      await apiFetch(`/api/combos/${encodeURIComponent(comboId)}`, {
+        method: "PUT",
+        body: JSON.stringify(payload),
+      })
+    );
+
+    const updatedConfig = toRecord(updatedCombo.config);
+    const resolvedAutoStrategy =
+      toString(toRecord(updatedConfig.auto).routingStrategy) ||
+      (args.strategy === "auto" ? (args.autoRoutingStrategy ?? "rules") : "");
+
+    const result = {
+      success: true,
+      combo: {
+        id: toString(updatedCombo.id, comboId),
+        name: toString(updatedCombo.name, toString(combo.name, comboId)),
+        strategy: toString(updatedCombo.strategy, args.strategy),
+        autoRoutingStrategy:
+          toString(updatedCombo.strategy, args.strategy) === "auto" ? resolvedAutoStrategy : null,
+      },
+    };
+
+    await logToolCall("omniroute_set_routing_strategy", args, result, Date.now() - start, true);
+    return { content: [{ type: "text" as const, text: JSON.stringify(result, null, 2) }] };
+  } catch (err) {
+    const msg = err instanceof Error ? err.message : String(err);
+    await logToolCall("omniroute_set_routing_strategy", args, null, Date.now() - start, false, msg);
+    return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
+  }
+}
+
 export async function handleSetResilienceProfile(args: {
  profile: "aggressive" | "balanced" | "conservative";
 }) {
@@ -20,6 +20,7 @@ import {
 import { getTaskFitness } from "./taskFitness";
 import { getModePack } from "./modePacks";
 import { getSelfHealingManager } from "./selfHealing";
+import { classifyPromptIntent } from "../intentClassifier";

 export interface AutoComboConfig {
  id: string;
@@ -30,6 +31,8 @@ export interface AutoComboConfig {
  modePack?: string;
  budgetCap?: number; // max cost per request in USD
  explorationRate: number; // 0.05 = 5% exploratory
+  /** If set, RouterStrategy name to use for selection ('rules' | 'cost' | 'latency') */
+  routerStrategy?: string;
 }

 export interface SelectionResult {
@@ -43,14 +46,44 @@ export interface SelectionResult {

 /**
 * Select the best provider from an auto-combo pool.
+ *
+ * @param config - AutoCombo configuration
+ * @param candidates - Provider candidates to score
+ * @param taskType - Task type hint. When "default" or omitted, the engine will attempt
+ *   to infer the intent from `promptMessages` using multilingual classification.
+ * @param promptMessages - Optional raw messages for intent classification
 */
 export function selectProvider(
  config: AutoComboConfig,
  candidates: ProviderCandidate[],
-  taskType: string = "default"
+  taskType: string = "default",
+  promptMessages?: Array<{ role: string; content: unknown }>
 ): SelectionResult {
  const healer = getSelfHealingManager();

+  // ── Intent classification (ClawRouter Feature #10/11) ────────────────────
+  // When taskType is generic ('default'), attempt to classify the prompt intent
+  // using the multilingual intentClassifier for better task fitness scoring.
+  let effectiveTaskType = taskType;
+  if ((taskType === "default" || taskType === "") && promptMessages?.length) {
+    // Extract text from last user message for classification
+    const lastUserMsg = [...promptMessages].reverse().find((m) => m.role === "user");
+    if (lastUserMsg) {
+      const text =
+        typeof lastUserMsg.content === "string"
+          ? lastUserMsg.content
+          : Array.isArray(lastUserMsg.content)
+            ? (lastUserMsg.content as Array<{ type: string; text?: string }>)
+                .filter((b) => b.type === "text")
+                .map((b) => b.text || "")
+                .join(" ")
+            : "";
+      if (text.length > 10) {
+        const intent = classifyPromptIntent(text);
+        effectiveTaskType = intent; // 'code' | 'reasoning' | 'simple' | 'medium'
+      }
+    }
+  }
  // Resolve weights from mode pack or config
  let weights = config.weights;
  if (config.modePack) {
@@ -80,8 +113,8 @@ export function selectProvider(
    excluded.length = 0;
  }

-  // Score all providers
-  const scored = scorePool(pool, taskType, weights, getTaskFitness);
+  // Score all providers (using classified intent if available)
+  const scored = scorePool(pool, effectiveTaskType, weights, getTaskFitness);

  // Apply self-healing re-evaluation with actual scores
  const finalCandidates = scored.filter((s) => {
@@ -0,0 +1,159 @@
+/**
+ * RouterStrategy — Pluggable Routing Strategy System
+ *
+ * Inspired by ClawRouter commit 14c83c258 "refactor: extract routing into pluggable RouterStrategy system".
+ * Provides a RouterStrategy interface and two built-in implementations:
+ *   - RulesStrategy (default): wraps the existing 6-factor scoring engine
+ *   - CostStrategy: always picks cheapest available model
+ */
+
+import type { ProviderCandidate, ScoredProvider } from "./scoring.ts";
+import { scorePool } from "./scoring.ts";
+import { getTaskFitness } from "./taskFitness.ts";
+
+export interface RoutingContext {
+  taskType: string;
+  requestHasTools?: boolean;
+  requestHasVision?: boolean;
+  estimatedInputTokens?: number;
+}
+
+export interface RoutingDecision {
+  provider: string;
+  model: string;
+  strategy: string;
+  reason: string;
+  candidatesConsidered: number;
+  finalScore: number;
+}
+
+export interface RouterStrategy {
+  readonly name: string;
+  readonly description: string;
+  select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision;
+}
+
+// ── RulesStrategy: wraps 6-factor scoring engine ────────────────────────────
+
+class RulesStrategyImpl implements RouterStrategy {
+  readonly name = "rules";
+  readonly description =
+    "6-factor weighted scoring: quota, health, cost, latency, taskFit, stability";
+
+  select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
+    const eligible = pool.filter((c) => c.circuitBreakerState !== "OPEN");
+    const ranked: ScoredProvider[] = scorePool(
+      eligible.length > 0 ? eligible : pool,
+      context.taskType,
+      undefined,
+      getTaskFitness
+    );
+    const best = ranked[0];
+    if (!best) throw new Error("[RulesStrategy] No candidates to score");
+    return {
+      provider: best.provider,
+      model: best.model,
+      strategy: this.name,
+      reason: `RulesStrategy: score=${best.score.toFixed(3)} (quota=${best.factors.quota.toFixed(2)}, health=${best.factors.health.toFixed(2)}, cost=${best.factors.costInv.toFixed(2)}, taskFit=${best.factors.taskFit.toFixed(2)})`,
+      candidatesConsidered: ranked.length,
+      finalScore: best.score,
+    };
+  }
+}
+
+// ── CostStrategy: always picks cheapest healthy provider ─────────────────────
+
+class CostStrategyImpl implements RouterStrategy {
+  readonly name = "cost";
+  readonly description = "Always selects cheapest available provider (by costPer1MTokens)";
+
+  select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
+    const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
+    const candidates = healthy.length > 0 ? healthy : pool;
+    const sorted = [...candidates].sort((a, b) => a.costPer1MTokens - b.costPer1MTokens);
+    const best = sorted[0];
+    if (!best) throw new Error("[CostStrategy] No candidates available");
+    return {
+      provider: best.provider,
+      model: best.model,
+      strategy: this.name,
+      reason: `CostStrategy: cheapest at $${best.costPer1MTokens.toFixed(3)}/1M tokens`,
+      candidatesConsidered: candidates.length,
+      finalScore: best.costPer1MTokens === 0 ? 1.0 : 1 / best.costPer1MTokens,
+    };
+  }
+}
+
+// ── LatencyStrategy: prioritize low latency + reliability ───────────────────
+
+class LatencyStrategyImpl implements RouterStrategy {
+  readonly name = "latency";
+  readonly description = "Prioritizes lowest p95 latency with reliability weighting";
+
+  select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
+    const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
+    const candidates = healthy.length > 0 ? healthy : pool;
+    const sorted = [...candidates].sort((a, b) => {
+      const aPenalty = a.errorRate * 1000;
+      const bPenalty = b.errorRate * 1000;
+      return a.p95LatencyMs + aPenalty - (b.p95LatencyMs + bPenalty);
+    });
+    const best = sorted[0];
+    if (!best) throw new Error("[LatencyStrategy] No candidates available");
+
+    const latencyScore = best.p95LatencyMs > 0 ? Math.max(0.001, 10_000 / best.p95LatencyMs) : 1;
+    const reliability = Math.max(0, 1 - best.errorRate);
+    const finalScore = latencyScore * 0.7 + reliability * 0.3;
+
+    return {
+      provider: best.provider,
+      model: best.model,
+      strategy: this.name,
+      reason: `LatencyStrategy: p95=${best.p95LatencyMs}ms, errorRate=${(best.errorRate * 100).toFixed(2)}%`,
+      candidatesConsidered: candidates.length,
+      finalScore,
+    };
+  }
+}
+
+// ── Registry ──────────────────────────────────────────────────────────────────
+
+const strategyRegistry = new Map<string, RouterStrategy>();
+
+const rulesStrategy = new RulesStrategyImpl();
+const costStrategy = new CostStrategyImpl();
+const latencyStrategy = new LatencyStrategyImpl();
+
+strategyRegistry.set("rules", rulesStrategy);
+strategyRegistry.set("cost", costStrategy);
+strategyRegistry.set("eco", costStrategy); // alias
+strategyRegistry.set("latency", latencyStrategy);
+strategyRegistry.set("fast", latencyStrategy); // alias
+
+export function getStrategy(name: string): RouterStrategy {
+  const strategy = strategyRegistry.get(name);
+  if (!strategy) {
+    console.warn(`[RouterStrategy] Strategy '${name}' not found, falling back to 'rules'`);
+    return rulesStrategy;
+  }
+  return strategy;
+}
+
+export function registerStrategy(name: string, strategy: RouterStrategy): void {
+  if (strategyRegistry.has(name)) {
+    console.warn(`[RouterStrategy] Overwriting strategy '${name}'`);
+  }
+  strategyRegistry.set(name, strategy);
+}
+
+export function listStrategies(): Array<{ name: string; description: string }> {
+  return [...strategyRegistry.entries()].map(([name, s]) => ({ name, description: s.description }));
+}
+
+export function selectWithStrategy(
+  pool: ProviderCandidate[],
+  context: RoutingContext,
+  strategyName = "rules"
+): RoutingDecision {
+  return getStrategy(strategyName).select(pool, context);
+}
@@ -74,7 +74,8 @@ export function calculateScore(factors: ScoringFactors, weights: ScoringWeights)
    weights.costInv * factors.costInv +
    weights.latencyInv * factors.latencyInv +
    weights.taskFit * factors.taskFit +
-    weights.stability * factors.stability
+    weights.stability * factors.stability +
+    weights.tierPriority * factors.tierPriority
  );
 }

@@ -24,10 +24,23 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
    "deepseek-coder": 0.9,
    "deepseek-v3": 0.85,
    "deepseek-r1": 0.88,
+    "deepseek-chat": 0.84, // DeepSeek V3.2 Chat — strong code performance
+    "deepseek-v3.2": 0.86, // Explicit V3.2 alias
    qwen: 0.78,
    llama: 0.72,
    mistral: 0.75,
    mixtral: 0.77,
+    // Grok-4 fast — good code, ultra-low latency (1143ms P50)
+    "grok-4-fast": 0.8,
+    "grok-4": 0.82,
+    "grok-3": 0.8,
+    // Kimi K2.5 — agentic with tool calling, good at code tasks
+    "kimi-k2": 0.82,
+    // GLM-5 — Z.AI model with 128k output
+    "glm-5": 0.78,
+    // MiniMax M2.5 — reasoning support helps complex code
+    "minimax-m2.5": 0.75,
+    "minimax-m2": 0.72,
  },
  review: {
    "claude-sonnet": 0.92,
@@ -58,10 +71,15 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
    "claude-sonnet": 0.92,
    "gemini-2.5-pro": 0.95,
    "gemini-pro": 0.88,
+    "gemini-3.1-pro": 0.95, // Gemini 3.1 Pro — 1M context, ideal for long analysis
    "gpt-4o": 0.85,
    o1: 0.9,
    o3: 0.93,
    "deepseek-r1": 0.88,
+    "deepseek-chat": 0.8,
+    "kimi-k2": 0.82, // Kimi K2.5 agentic — good for analysis
+    "glm-5": 0.78, // GLM-5 with 128k output for long analysis
+    "minimax-m2.5": 0.76,
  },
  debugging: {
    "claude-sonnet": 0.93,
@@ -87,8 +105,17 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
    "claude-opus": 0.85,
    "gpt-4o": 0.85,
    "gemini-pro": 0.8,
+    "gemini-3.1-pro": 0.85,
    "deepseek-v3": 0.75,
+    "deepseek-chat": 0.74,
    "gemini-flash": 0.72,
+    // New models from ClawRouter analysis (2026-03-17):
+    "grok-4-fast": 0.72, // ultra-fast, suitable for all tasks
+    "grok-4": 0.74,
+    "grok-3": 0.73,
+    "kimi-k2": 0.76, // agentic multi-step tasks
+    "glm-5": 0.7,
+    "minimax-m2.5": 0.7,
  },
 };

@@ -5,18 +5,37 @@

 import { checkFallbackError, formatRetryAfter, getProviderProfile } from "./accountFallback.ts";
 import { unavailableResponse } from "../utils/error.ts";
-import { recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
+import { recordComboIntent, recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
 import { resolveComboConfig, getDefaultComboConfig } from "./comboConfig.ts";
 import * as semaphore from "./rateLimitSemaphore.ts";
 import { getCircuitBreaker } from "../../src/shared/utils/circuitBreaker";
 import { fisherYatesShuffle, getNextFromDeck } from "../../src/shared/utils/shuffleDeck";
 import { parseModel } from "./model.ts";
+import { applyComboAgentMiddleware, injectModelTag } from "./comboAgentMiddleware.ts";
+import { classifyWithConfig, DEFAULT_INTENT_CONFIG } from "./intentClassifier.ts";
+import { selectProvider as selectAutoProvider } from "./autoCombo/engine.ts";
+import { selectWithStrategy } from "./autoCombo/routerStrategy.ts";
+import { DEFAULT_WEIGHTS, scorePool } from "./autoCombo/scoring.ts";
+import { supportsToolCalling } from "./modelCapabilities.ts";

 // Status codes that should mark semaphore + record circuit breaker failures
 const TRANSIENT_FOR_BREAKER = [429, 502, 503, 504];

 const MAX_COMBO_DEPTH = 3;

+// Bootstrap defaults from ClawRouter benchmark (used when no local latency history exists yet)
+const DEFAULT_MODEL_P95_MS = {
+  "grok-4-fast-non-reasoning": 1143,
+  "grok-4-1-fast-non-reasoning": 1244,
+  "gemini-2.5-flash": 1238,
+  "kimi-k2.5": 1646,
+  "gpt-4o-mini": 2764,
+  "claude-sonnet-4.6": 4000,
+  "claude-opus-4.6": 6000,
+  "deepseek-chat": 2000,
+};
+const MIN_HISTORY_SAMPLES = 10;
+
 // In-memory atomic counter per combo for round-robin distribution
 // Resets on server restart (by design — no stale state)
 const rrCounters = new Map();
@@ -201,6 +220,193 @@ function sortModelsByUsage(models, comboName) {
  return withUsage.map((e) => e.modelStr);
 }

+function toTextContent(content) {
+  if (typeof content === "string") return content;
+  if (!Array.isArray(content)) return "";
+  return content
+    .map((part) => {
+      if (!part || typeof part !== "object") return "";
+      if (typeof part.text === "string") return part.text;
+      return "";
+    })
+    .join("\n");
+}
+
+function extractPromptForIntent(body) {
+  if (!body || typeof body !== "object") return "";
+
+  const fromMessages = Array.isArray(body.messages)
+    ? [...body.messages].reverse().find((m) => m && typeof m === "object" && m.role === "user")
+    : null;
+  if (fromMessages) return toTextContent(fromMessages.content);
+
+  if (typeof body.input === "string") return body.input;
+  if (Array.isArray(body.input)) {
+    const text = body.input
+      .map((item) => {
+        if (!item || typeof item !== "object") return "";
+        if (typeof item.content === "string") return item.content;
+        if (typeof item.text === "string") return item.text;
+        return "";
+      })
+      .filter(Boolean)
+      .join("\n");
+    if (text) return text;
+  }
+
+  if (typeof body.prompt === "string") return body.prompt;
+  return "";
+}
+
+function mapIntentToTaskType(intent) {
+  switch (intent) {
+    case "code":
+      return "coding";
+    case "reasoning":
+      return "analysis";
+    case "simple":
+      return "default";
+    case "medium":
+    default:
+      return "default";
+  }
+}
+
+function toStringArray(input) {
+  if (Array.isArray(input)) {
+    return input.map((v) => (typeof v === "string" ? v.trim() : "")).filter(Boolean);
+  }
+  if (typeof input === "string") {
+    return input
+      .split(",")
+      .map((v) => v.trim())
+      .filter(Boolean);
+  }
+  return [];
+}
+
+function getIntentConfig(settings, combo) {
+  const comboIntentConfig =
+    combo?.autoConfig?.intentConfig ||
+    combo?.config?.auto?.intentConfig ||
+    combo?.config?.intentConfig ||
+    {};
+
+  return {
+    ...DEFAULT_INTENT_CONFIG,
+    ...comboIntentConfig,
+    ...(typeof settings?.intentDetectionEnabled === "boolean"
+      ? { enabled: settings.intentDetectionEnabled }
+      : {}),
+    ...(Number.isFinite(Number(settings?.intentSimpleMaxWords))
+      ? { simpleMaxWords: Number(settings.intentSimpleMaxWords) }
+      : {}),
+    ...(toStringArray(settings?.intentExtraCodeKeywords).length > 0
+      ? { extraCodeKeywords: toStringArray(settings.intentExtraCodeKeywords) }
+      : {}),
+    ...(toStringArray(settings?.intentExtraReasoningKeywords).length > 0
+      ? { extraReasoningKeywords: toStringArray(settings.intentExtraReasoningKeywords) }
+      : {}),
+    ...(toStringArray(settings?.intentExtraSimpleKeywords).length > 0
+      ? { extraSimpleKeywords: toStringArray(settings.intentExtraSimpleKeywords) }
+      : {}),
+  };
+}
+
+function getBootstrapLatencyMs(modelId) {
+  const normalized = String(modelId || "").toLowerCase();
+  return DEFAULT_MODEL_P95_MS[normalized] ?? 1500;
+}
+
+async function buildAutoCandidates(modelStrings, comboName) {
+  const metrics = getComboMetrics(comboName);
+  const { getPricingForModel } = await import("../../src/lib/localDb");
+  let historicalLatencyStats = {};
+  try {
+    const { getModelLatencyStats } = await import("../../src/lib/usageDb");
+    historicalLatencyStats = await getModelLatencyStats({
+      windowHours: 24,
+      minSamples: 3,
+      maxRows: 10000,
+    });
+  } catch {
+    // keep empty stats — auto-combo will use runtime + bootstrap signals
+  }
+
+  const candidates = await Promise.all(
+    modelStrings.map(async (modelStr) => {
+      const parsed = parseModel(modelStr);
+      const provider = parsed.provider || parsed.providerAlias || "unknown";
+      const model = parsed.model || modelStr;
+      const historicalKey = `${provider}/${model}`;
+      const historicalModelMetric = historicalLatencyStats[historicalKey] || null;
+      const historicalTotal = Number(historicalModelMetric?.totalRequests);
+      const hasHistoricalSignal =
+        Number.isFinite(historicalTotal) && historicalTotal >= MIN_HISTORY_SAMPLES;
+
+      let costPer1MTokens = 1;
+      try {
+        const pricing = await getPricingForModel(provider, model);
+        const inputPrice = Number(pricing?.input);
+        if (Number.isFinite(inputPrice) && inputPrice >= 0) {
+          costPer1MTokens = inputPrice;
+        }
+      } catch {
+        // keep default cost
+      }
+
+      const modelMetric = metrics?.byModel?.[modelStr] || null;
+      const avgLatency = Number(modelMetric?.avgLatencyMs);
+      const successRate = Number(modelMetric?.successRate);
+      const historicalP95Latency = Number(historicalModelMetric?.p95LatencyMs);
+      const historicalStdDev = Number(historicalModelMetric?.latencyStdDev);
+      const historicalSuccessRate = Number(historicalModelMetric?.successRate); // 0..1
+
+      const p95LatencyMs = hasHistoricalSignal
+        ? Number.isFinite(historicalP95Latency) && historicalP95Latency > 0
+          ? historicalP95Latency
+          : getBootstrapLatencyMs(model)
+        : Number.isFinite(avgLatency) && avgLatency > 0
+          ? avgLatency
+          : getBootstrapLatencyMs(model);
+
+      const errorRate = hasHistoricalSignal
+        ? Number.isFinite(historicalSuccessRate) &&
+          historicalSuccessRate >= 0 &&
+          historicalSuccessRate <= 1
+          ? 1 - historicalSuccessRate
+          : 0.05
+        : Number.isFinite(successRate) && successRate >= 0 && successRate <= 100
+          ? 1 - successRate / 100
+          : 0.05;
+      const latencyStdDev =
+        hasHistoricalSignal && Number.isFinite(historicalStdDev) && historicalStdDev > 0
+          ? Math.max(10, historicalStdDev)
+          : Math.max(10, p95LatencyMs * 0.1);
+
+      const breakerStateRaw = getCircuitBreaker(`combo:${modelStr}`)?.getStatus?.()?.state;
+      const circuitBreakerState =
+        breakerStateRaw === "OPEN" || breakerStateRaw === "HALF_OPEN" ? breakerStateRaw : "CLOSED";
+
+      return {
+        provider,
+        model,
+        quotaRemaining: 100,
+        quotaTotal: 100,
+        circuitBreakerState,
+        costPer1MTokens,
+        p95LatencyMs,
+        latencyStdDev,
+        errorRate,
+        accountTier: "standard",
+        quotaResetIntervalSecs: 86400,
+      };
+    })
+  );
+
+  return candidates;
+}
+
 /**
 * Handle combo chat with fallback
 * Supports all 6 strategies: priority, weighted, round-robin, random, least-used, cost-optimized
@@ -225,12 +431,49 @@ export async function handleComboChat({
  const strategy = combo.strategy || "priority";
  const models = combo.models || [];

+  // ── Combo Agent Middleware (#399 + #401) ────────────────────────────────
+  // Apply system_message override, tool_filter_regex, and extract pinned model
+  // from context caching tag. These are all opt-in per combo config.
+  const { body: agentBody, pinnedModel } = applyComboAgentMiddleware(
+    body,
+    combo,
+    "" // provider/model not yet known — resolved per-model in loop
+  );
+  body = agentBody;
+  if (pinnedModel) {
+    log.info("COMBO", `[#401] Context caching: pinned model=${pinnedModel}`);
+  }
+  // Wrap handleSingleModel to inject context caching tag on response (#401)
+  const handleSingleModelWrapped = combo.context_cache_protection
+    ? async (b, modelStr) => {
+        const res = await handleSingleModel(b, modelStr);
+        // Inject tag only on success and only for non-streaming non-binary responses
+        if (res.ok && !b.stream) {
+          try {
+            const json = await res.clone().json();
+            const msgs = Array.isArray(json?.messages) ? json.messages : [];
+            if (msgs.length > 0) {
+              const tagged = injectModelTag(msgs, modelStr);
+              return new Response(JSON.stringify({ ...json, messages: tagged }), {
+                status: res.status,
+                headers: res.headers,
+              });
+            }
+          } catch {
+            /* non-JSON or stream — skip tagging */
+          }
+        }
+        return res;
+      }
+    : handleSingleModel;
+  // ─────────────────────────────────────────────────────────────────────────
+
  // Route to round-robin handler if strategy matches
  if (strategy === "round-robin") {
    return handleRoundRobinCombo({
      body,
      combo,
-      handleSingleModel,
+      handleSingleModel: handleSingleModelWrapped,
      isModelAvailable,
      log,
      settings,
@@ -278,7 +521,131 @@ export async function handleComboChat({
  }

  // Apply strategy-specific ordering
-  if (strategy === "strict-random") {
+  if (strategy === "auto") {
+    const requestHasTools = Array.isArray(body?.tools) && body.tools.length > 0;
+    let eligibleModels = [...orderedModels];
+
+    if (requestHasTools) {
+      const filtered = eligibleModels.filter((m) => supportsToolCalling(m));
+      if (filtered.length > 0) {
+        eligibleModels = filtered;
+      } else {
+        log.warn(
+          "COMBO",
+          "Auto strategy: all candidates filtered by tool-calling policy, falling back to full pool"
+        );
+      }
+    }
+
+    const prompt = extractPromptForIntent(body);
+    const systemPrompt =
+      typeof combo?.system_message === "string" ? combo.system_message : undefined;
+    const intentConfig = getIntentConfig(settings, combo);
+    const intent = classifyWithConfig(prompt, intentConfig, systemPrompt);
+    recordComboIntent(combo.name, intent);
+    const taskType = mapIntentToTaskType(intent);
+
+    const autoConfigSource = combo?.autoConfig || combo?.config?.auto || combo?.config || {};
+    const routingStrategy =
+      typeof autoConfigSource.routingStrategy === "string"
+        ? autoConfigSource.routingStrategy
+        : typeof autoConfigSource.strategyName === "string"
+          ? autoConfigSource.strategyName
+          : "rules";
+
+    const candidatePool = Array.isArray(autoConfigSource.candidatePool)
+      ? autoConfigSource.candidatePool
+      : [
+          ...new Set(
+            eligibleModels.map((m) => {
+              const parsed = parseModel(m);
+              return parsed.provider || parsed.providerAlias || "unknown";
+            })
+          ),
+        ];
+
+    const weights =
+      autoConfigSource.weights && typeof autoConfigSource.weights === "object"
+        ? autoConfigSource.weights
+        : DEFAULT_WEIGHTS;
+    const explorationRate = Number.isFinite(Number(autoConfigSource.explorationRate))
+      ? Number(autoConfigSource.explorationRate)
+      : 0.05;
+    const budgetCap = Number.isFinite(Number(autoConfigSource.budgetCap))
+      ? Number(autoConfigSource.budgetCap)
+      : undefined;
+    const modePack =
+      typeof autoConfigSource.modePack === "string" ? autoConfigSource.modePack : undefined;
+
+    const candidates = await buildAutoCandidates(eligibleModels, combo.name);
+    if (candidates.length > 0) {
+      let selectedProvider = null;
+      let selectedModel = null;
+      let selectionReason = "";
+
+      if (routingStrategy !== "rules") {
+        try {
+          const decision = selectWithStrategy(
+            candidates,
+            { taskType, requestHasTools },
+            routingStrategy
+          );
+          selectedProvider = decision.provider;
+          selectedModel = decision.model;
+          selectionReason = decision.reason;
+        } catch (err) {
+          log.warn(
+            "COMBO",
+            `Auto strategy '${routingStrategy}' failed (${err?.message || "unknown"}), falling back to rules`
+          );
+        }
+      }
+
+      if (!selectedProvider || !selectedModel) {
+        const selection = selectAutoProvider(
+          {
+            id: combo.id || combo.name,
+            name: combo.name,
+            type: "auto",
+            candidatePool,
+            weights,
+            modePack,
+            budgetCap,
+            explorationRate,
+          },
+          candidates,
+          taskType
+        );
+        selectedProvider = selection.provider;
+        selectedModel = selection.model;
+        selectionReason = `score=${selection.score.toFixed(3)}${selection.isExploration ? " (exploration)" : ""}`;
+      }
+
+      const modelLookup = new Map();
+      for (const modelStr of eligibleModels) {
+        const parsed = parseModel(modelStr);
+        const provider = parsed.provider || parsed.providerAlias || "unknown";
+        const modelId = parsed.model || modelStr;
+        modelLookup.set(`${provider}/${modelId}`, modelStr);
+      }
+
+      const ranked = scorePool(candidates, taskType, weights)
+        .map((r) => modelLookup.get(`${r.provider}/${r.model}`) || `${r.provider}/${r.model}`)
+        .filter(Boolean);
+
+      const selectedModelStr =
+        modelLookup.get(`${selectedProvider}/${selectedModel}`) ||
+        `${selectedProvider}/${selectedModel}`;
+      orderedModels = [...new Set([selectedModelStr, ...ranked, ...eligibleModels])];
+
+      log.info(
+        "COMBO",
+        `Auto selection: ${selectedModelStr} | intent=${intent} task=${taskType} | strategy=${routingStrategy} | ${selectionReason}`
+      );
+    } else {
+      log.warn("COMBO", "Auto strategy has no candidates, keeping default ordering");
+    }
+  } else if (strategy === "strict-random") {
    const selectedId = await getNextFromDeck(`combo:${combo.name}`, orderedModels);
    // Put selected model first so the fallback loop tries it first
    const rest = orderedModels.filter((m) => m !== selectedId);
@@ -348,7 +715,7 @@ export async function handleComboChat({
        `Trying model ${i + 1}/${orderedModels.length}: ${modelStr}${retry > 0 ? ` (retry ${retry})` : ""}`
      );

-      const result = await handleSingleModel(body, modelStr);
+      const result = await handleSingleModelWrapped(body, modelStr);

      // Success — return response
      if (result.ok) {
@@ -0,0 +1,188 @@
+/**
+ * comboAgentMiddleware.ts — Combo Agent Features
+ *
+ * Implements the "combo as agent" features from issues #399 and #401:
+ *
+ * 1. **System Message Override** (#399): If the combo defines a `system_message`,
+ *    it is injected as the first system message, replacing any existing system message.
+ *
+ * 2. **Tool Filter Regex** (#399): If the combo defines a `tool_filter_regex`,
+ *    only tools whose name matches the pattern are forwarded to the provider.
+ *
+ * 3. **Context Caching Protection** (#401): If the combo enables
+ *    `context_cache_protection`, the proxy:
+ *    a. On response: injects `<omniModel>provider/model</omniModel>` tag into
+ *       the first assistant message content string.
+ *    b. On request: scans the message history for the tag, and if found,
+ *       overrides the requested model with the pinned one.
+ *
+ * All features are opt-in per combo and backward compatible with existing setups.
+ */
+
+interface ComboConfig {
+  system_message?: string | null;
+  tool_filter_regex?: string | null;
+  context_cache_protection?: number | boolean;
+  [key: string]: unknown;
+}
+
+interface Message {
+  role?: string;
+  content?: unknown;
+  [key: string]: unknown;
+}
+
+// ── Context Caching Tag ─────────────────────────────────────────────────────
+
+const CACHE_TAG_PATTERN = /<omniModel>([^<]+)<\/omniModel>/;
+
+/**
+ * Inject the model tag into the last assistant message (or append a new one).
+ * Only modifies string content — does not touch array content to avoid breaking
+ * Claude/Gemini multi-part message formats.
+ */
+export function injectModelTag(messages: Message[], providerModel: string): Message[] {
+  // Remove any existing tag first to avoid duplication on context compaction
+  const cleaned = messages.map((msg) => {
+    if (msg.role === "assistant" && typeof msg.content === "string") {
+      return { ...msg, content: msg.content.replace(CACHE_TAG_PATTERN, "").trimEnd() };
+    }
+    return msg;
+  });
+
+  // Find last assistant message with string content
+  const lastAssistantIdx = cleaned.map((m) => m.role).lastIndexOf("assistant");
+  if (lastAssistantIdx === -1) return cleaned;
+
+  const msg = cleaned[lastAssistantIdx];
+  if (typeof msg.content !== "string") return cleaned;
+
+  const tagged = [...cleaned];
+  tagged[lastAssistantIdx] = {
+    ...msg,
+    content: `${msg.content}\n<omniModel>${providerModel}</omniModel>`,
+  };
+  return tagged;
+}
+
+/**
+ * Scan message history for the model tag injected by a previous response.
+ * Returns the pinned "provider/model" string, or null if not found.
+ */
+export function extractPinnedModel(messages: Message[]): string | null {
+  // Scan from newest to oldest for efficiency
+  for (let i = messages.length - 1; i >= 0; i--) {
+    const msg = messages[i];
+    if (msg.role === "assistant" && typeof msg.content === "string") {
+      const match = CACHE_TAG_PATTERN.exec(msg.content);
+      if (match) return match[1];
+    }
+  }
+  return null;
+}
+
+// ── System Message Override ──────────────────────────────────────────────────
+
+/**
+ * Replace or inject a system message at the beginning of the messages array.
+ * Existing system messages are removed if a combo override is set.
+ */
+export function applySystemMessageOverride(messages: Message[], systemMessage: string): Message[] {
+  // Remove all existing system messages
+  const filtered = messages.filter((m) => m.role !== "system");
+  // Inject combo system message at start
+  return [{ role: "system", content: systemMessage }, ...filtered];
+}
+
+// ── Tool Filter Regex ────────────────────────────────────────────────────────
+
+/**
+ * Filter the tools array, keeping only tools whose name matches the regex.
+ * Returns the original array unchanged if pattern is null/empty.
+ */
+export function applyToolFilter(
+  tools: unknown[] | undefined,
+  pattern: string | null | undefined
+): unknown[] | undefined {
+  if (!tools || !pattern) return tools;
+
+  let regex: RegExp;
+  try {
+    regex = new RegExp(pattern);
+  } catch {
+    // Invalid regex — return tools unchanged rather than crashing
+    console.warn(`[ComboAgent] Invalid tool_filter_regex: "${pattern}"`);
+    return tools;
+  }
+
+  return tools.filter((tool) => {
+    const t = tool as Record<string, unknown>;
+    // Support both OpenAI format ({ function: { name } }) and Anthropic ({ name })
+    const name = (t.function as Record<string, unknown> | undefined)?.name ?? t.name ?? "";
+    return regex.test(String(name));
+  });
+}
+
+/**
+ * Strip all <omniModel> tags from message content before forwarding to the provider.
+ * The tag is an internal OmniRoute marker; providers must never see it or their
+ * cache will treat every tagged request as a new session (#454).
+ */
+export function stripModelTags(messages: Message[]): Message[] {
+  return messages.map((msg) => {
+    if (typeof msg.content === "string" && CACHE_TAG_PATTERN.test(msg.content)) {
+      return { ...msg, content: msg.content.replace(CACHE_TAG_PATTERN, "").trimEnd() };
+    }
+    return msg;
+  });
+}
+
+// ── Main Middleware ──────────────────────────────────────────────────────────
+
+/**
+ * Apply all combo agent features to the request body.
+ * Safe to call with null/undefined comboConfig — returns body unchanged.
+ */
+export function applyComboAgentMiddleware(
+  body: Record<string, unknown>,
+  comboConfig: ComboConfig | null | undefined,
+  providerModel: string // "provider/model" string for context caching
+): { body: Record<string, unknown>; pinnedModel: string | null } {
+  if (!comboConfig) return { body, pinnedModel: null };
+
+  let messages: Message[] = Array.isArray(body.messages) ? [...body.messages] : [];
+  let pinnedModel: string | null = null;
+
+  // 1. Context caching: check for pinned model in history
+  if (comboConfig.context_cache_protection) {
+    pinnedModel = extractPinnedModel(messages);
+    if (pinnedModel) {
+      // Model is pinned — caller should override model selection
+    }
+  }
+
+  // 2. System message override
+  if (comboConfig.system_message && comboConfig.system_message.trim()) {
+    messages = applySystemMessageOverride(messages, comboConfig.system_message);
+  }
+
+  // 3. Tool filter
+  const filteredTools = applyToolFilter(
+    body.tools as unknown[] | undefined,
+    comboConfig.tool_filter_regex
+  );
+
+  // 4. Strip internal <omniModel> tags before forwarding to provider (#454)
+  //    These tags are OmniRoute-internal markers and must never reach the provider
+  //    since providers would treat each tagged request as a new cache session.
+  messages = stripModelTags(messages);
+
+  return {
+    body: {
+      ...body,
+      messages,
+      ...(filteredTools !== body.tools && { tools: filteredTools }),
+    },
+    pinnedModel,
+  };
+}
@@ -21,6 +21,7 @@ interface ComboMetricsEntry {
  totalLatencyMs: number;
  strategy: string;
  lastUsedAt: string | null;
+  intentCounts: Record<string, number>;
  byModel: Record<string, ModelMetrics>;
 }

@@ -69,6 +70,7 @@ export function recordComboRequest(
      totalLatencyMs: 0,
      strategy,
      lastUsedAt: null,
+      intentCounts: {},
      byModel: {},
    });
  }
@@ -131,6 +133,7 @@ export function getComboMetrics(comboName: string): ComboMetricsView | null {
      combo.totalRequests > 0 ? Math.round((combo.totalSuccesses / combo.totalRequests) * 100) : 0,
    fallbackRate:
      combo.totalRequests > 0 ? Math.round((combo.totalFallbacks / combo.totalRequests) * 100) : 0,
+    intentCounts: { ...combo.intentCounts },
    byModel: Object.fromEntries(
      Object.entries(combo.byModel).map(([model, m]) => [
        model,
@@ -156,6 +159,30 @@ export function getAllComboMetrics(): Record<string, ComboMetricsView | null> {
  return result;
 }

+/**
+ * Record detected prompt intent for a combo (used by multilingual routing analytics).
+ */
+export function recordComboIntent(comboName: string, intent: string): void {
+  if (!metrics.has(comboName)) {
+    metrics.set(comboName, {
+      totalRequests: 0,
+      totalSuccesses: 0,
+      totalFailures: 0,
+      totalFallbacks: 0,
+      totalLatencyMs: 0,
+      strategy: "priority",
+      lastUsedAt: null,
+      intentCounts: {},
+      byModel: {},
+    });
+  }
+
+  const combo = metrics.get(comboName);
+  if (!combo) return;
+  const key = String(intent || "unknown");
+  combo.intentCounts[key] = (combo.intentCounts[key] || 0) + 1;
+}
+
 /**
 * Reset metrics for a specific combo
 */
@@ -0,0 +1,103 @@
+/**
+ * Emergency Fallback — Budget Exhaustion Redirect
+ *
+ * When a request fails due to budget exhaustion (HTTP 402 or budget keywords
+ * in the error body), optionally redirect to a free-tier model
+ * (default provider/model: nvidia + openai/gpt-oss-120b at $0.00/M tokens).
+ *
+ * Inspired by ClawRouter: "gpt-oss-120b costs nothing and serves as
+ * automatic fallback when wallet is empty."
+ */
+
+export interface EmergencyFallbackConfig {
+  enabled: boolean;
+  provider: string;
+  model: string;
+  triggerOn402: boolean;
+  triggerOnBudgetKeywords: boolean;
+  budgetKeywords: string[];
+  /** Skip fallback for tool requests (gpt-oss-120b may not support structured tool calling) */
+  skipForToolRequests: boolean;
+  maxOutputTokens: number;
+}
+
+export const EMERGENCY_FALLBACK_CONFIG: EmergencyFallbackConfig = {
+  enabled: true,
+  provider: "nvidia",
+  model: "openai/gpt-oss-120b",
+  triggerOn402: true,
+  triggerOnBudgetKeywords: true,
+  budgetKeywords: [
+    "insufficient funds",
+    "insufficient_funds",
+    "budget exceeded",
+    "budget_exceeded",
+    "quota exceeded",
+    "quota_exceeded",
+    "billing",
+    "payment required",
+    "out of credits",
+    "no credits",
+    "credit limit",
+    "spending limit",
+    "saldo insuficiente",
+    "limite de gastos",
+    "cota excedida",
+  ],
+  skipForToolRequests: true,
+  maxOutputTokens: 4096,
+};
+
+export interface FallbackDecision {
+  shouldFallback: true;
+  reason: string;
+  provider: string;
+  model: string;
+  maxOutputTokens: number;
+}
+
+export interface NoFallbackDecision {
+  shouldFallback: false;
+  reason: string;
+}
+
+export type FallbackResult = FallbackDecision | NoFallbackDecision;
+
+export function shouldUseFallback(
+  status: number,
+  errorBody: string,
+  requestHasTools: boolean,
+  config: EmergencyFallbackConfig = EMERGENCY_FALLBACK_CONFIG
+): FallbackResult {
+  if (!config.enabled) return { shouldFallback: false, reason: "emergency fallback disabled" };
+  if (config.skipForToolRequests && requestHasTools) {
+    return { shouldFallback: false, reason: "skipped: request has tools" };
+  }
+  if (config.triggerOn402 && status === 402) {
+    return {
+      shouldFallback: true,
+      reason: `HTTP 402 → emergency fallback to ${config.provider}/${config.model}`,
+      provider: config.provider,
+      model: config.model,
+      maxOutputTokens: config.maxOutputTokens,
+    };
+  }
+  if (config.triggerOnBudgetKeywords && errorBody) {
+    const lowerBody = errorBody.toLowerCase();
+    const matched = config.budgetKeywords.find((kw) => lowerBody.includes(kw.toLowerCase()));
+    if (matched) {
+      return {
+        shouldFallback: true,
+        reason: `Budget error detected ('${matched}') → emergency fallback to ${config.provider}/${config.model}`,
+        provider: config.provider,
+        model: config.model,
+        maxOutputTokens: config.maxOutputTokens,
+      };
+    }
+  }
+  return { shouldFallback: false, reason: "no budget error detected" };
+}
+
+export function isFallbackDecision(result: FallbackResult): result is FallbackDecision {
+  return result.shouldFallback === true;
+}
@@ -0,0 +1,375 @@
+/**
+ * Multilingual Intent Detection for AutoCombo
+ *
+ * Classifies prompts as: code | reasoning | simple | medium
+ * using keywords in 9 languages (EN, PT-BR, ES, ZH, JA, RU, DE, KO, AR).
+ *
+ * Inspired by ClawRouter (BlockRunAI) multilingual routing system.
+ * Execution: purely synchronous, <1ms, no I/O.
+ */
+
+export type IntentType = "code" | "reasoning" | "simple" | "medium";
+
+export const CODE_KEYWORDS: readonly string[] = [
+  // English
+  "function",
+  "class",
+  "import",
+  "def",
+  "SELECT",
+  "async",
+  "await",
+  "const",
+  "let",
+  "var",
+  "return",
+  "```",
+  "algorithm",
+  "compile",
+  "debug",
+  "refactor",
+  "typescript",
+  "python",
+  "javascript",
+  "code",
+  "implement",
+  "write a",
+  "create a component",
+  "endpoint",
+  "repository",
+  "deploy",
+  "install",
+  "script",
+  "api",
+  "database",
+  "query",
+  "schema",
+  "interface",
+  "generic",
+  "enum",
+  "module",
+  "package",
+  "dependency",
+  // Português (PT-BR)
+  "função",
+  "classe",
+  "importar",
+  "definir",
+  "consulta",
+  "assíncrono",
+  "aguardar",
+  "constante",
+  "variável",
+  "retornar",
+  "algoritmo",
+  "compilar",
+  "depurar",
+  "refatorar",
+  "código",
+  "implementar",
+  "criar um",
+  "componente",
+  "como fazer",
+  "repositório",
+  "configurar",
+  "instalar",
+  "banco de dados",
+  "escrever uma função",
+  "criar uma classe",
+  // Español
+  "función",
+  "clase",
+  "importar",
+  "definir",
+  "consulta",
+  "asíncrono",
+  "esperar",
+  "constante",
+  "variable",
+  "retornar",
+  "algoritmo",
+  "compilar",
+  "depurar",
+  "refactorizar",
+  "código",
+  "implementar",
+  // 中文
+  "函数",
+  "类",
+  "导入",
+  "定义",
+  "查询",
+  "异步",
+  "等待",
+  "常量",
+  "变量",
+  "返回",
+  "算法",
+  "编译",
+  "调试",
+  "代码",
+  // 日本語
+  "関数",
+  "クラス",
+  "インポート",
+  "非同期",
+  "定数",
+  "変数",
+  "コード",
+  "アルゴリズム",
+  // Русский
+  "функция",
+  "класс",
+  "импорт",
+  "запрос",
+  "асинхронный",
+  "константа",
+  "переменная",
+  "алгоритм",
+  "код",
+  // Deutsch
+  "funktion",
+  "klasse",
+  "importieren",
+  "abfrage",
+  "asynchron",
+  "konstante",
+  "variable",
+  "algorithmus",
+  "code",
+  // 한국어
+  "함수",
+  "클래스",
+  "가져오기",
+  "정의",
+  "쿼리",
+  "비동기",
+  "대기",
+  "상수",
+  "변수",
+  "반환",
+  "코드",
+  // العربية
+  "دالة",
+  "فئة",
+  "استيراد",
+  "استعلام",
+  "غير متزامن",
+  "ثابت",
+  "متغير",
+  "كود",
+  "خوارزمية",
+];
+
+export const REASONING_KEYWORDS: readonly string[] = [
+  // English
+  "prove",
+  "theorem",
+  "derive",
+  "step by step",
+  "chain of thought",
+  "formally",
+  "mathematical",
+  "proof",
+  "logically",
+  "analyze",
+  "reasoning",
+  "deduce",
+  "infer",
+  "hypothesis",
+  "convergence",
+  // Português (PT-BR)
+  "provar",
+  "teorema",
+  "derivar",
+  "passo a passo",
+  "cadeia de pensamento",
+  "formalmente",
+  "matemático",
+  "prova",
+  "logicamente",
+  "analisar",
+  "raciocínio",
+  "deduzir",
+  "inferir",
+  "hipótese",
+  "demonstrar",
+  "cálculo",
+  "equação diferencial",
+  "integral",
+  "otimização",
+  // Español
+  "demostrar",
+  "teorema",
+  "derivar",
+  "paso a paso",
+  "formalmente",
+  "matemático",
+  "lógicamente",
+  // 中文
+  "证明",
+  "定理",
+  "推导",
+  "逐步",
+  "思维链",
+  "数学",
+  "逻辑",
+  "分析",
+  // 日本語
+  "証明",
+  "定理",
+  "導出",
+  "論理的",
+  "分析",
+  // Русский
+  "доказать",
+  "теорема",
+  "шаг за шагом",
+  "математически",
+  "логически",
+  // Deutsch
+  "beweisen",
+  "theorem",
+  "schritt für schritt",
+  "mathematisch",
+  "logisch",
+  // 한국어
+  "증명",
+  "정리",
+  "단계별",
+  "수학적",
+  "논리적",
+  // العربية
+  "إثبات",
+  "نظرية",
+  "خطوة بخطوة",
+  "رياضي",
+  "منطقياً",
+];
+
+export const SIMPLE_KEYWORDS: readonly string[] = [
+  // English
+  "what is",
+  "define",
+  "translate",
+  "hello",
+  "yes or no",
+  "summarize",
+  "list",
+  "tell me",
+  "who is",
+  // Português (PT-BR)
+  "o que é",
+  "definir",
+  "traduzir",
+  "olá",
+  "oi",
+  "sim ou não",
+  "resumir",
+  "listar",
+  "me diga",
+  "quem é",
+  "quando foi",
+  "onde fica",
+  "explique brevemente",
+  "de forma simples",
+  // Español
+  "qué es",
+  "definir",
+  "traducir",
+  "hola",
+  "resumir",
+  "listar",
+  // 中文
+  "什么是",
+  "定义",
+  "翻译",
+  "你好",
+  "总结",
+  "列出",
+  // Русский
+  "что такое",
+  "определить",
+  "перевести",
+  "привет",
+  "резюмировать",
+  // Deutsch
+  "was ist",
+  "definieren",
+  "übersetzen",
+  "hallo",
+  "zusammenfassen",
+  // 한국어
+  "이란",
+  "정의",
+  "번역",
+  "안녕",
+  "요약",
+  // العربية
+  "ما هو",
+  "تعريف",
+  "ترجمة",
+  "مرحبا",
+  "ملخص",
+];
+
+/**
+ * Classify a prompt's intent using multilingual keyword matching.
+ * Priority: code > reasoning > simple > medium (default)
+ */
+export function classifyPromptIntent(prompt: string, systemPrompt?: string): IntentType {
+  const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
+  const wordCount = prompt.trim().split(/\s+/).length;
+
+  for (const kw of CODE_KEYWORDS) {
+    if (fullText.includes(kw.toLowerCase())) return "code";
+  }
+  for (const kw of REASONING_KEYWORDS) {
+    if (fullText.includes(kw.toLowerCase())) return "reasoning";
+  }
+  if (wordCount < 60) {
+    for (const kw of SIMPLE_KEYWORDS) {
+      if (fullText.includes(kw.toLowerCase())) return "simple";
+    }
+  }
+  return "medium";
+}
+
+export interface IntentClassifierConfig {
+  enabled: boolean;
+  extraCodeKeywords?: string[];
+  extraReasoningKeywords?: string[];
+  extraSimpleKeywords?: string[];
+  simpleMaxWords?: number;
+}
+
+export const DEFAULT_INTENT_CONFIG: IntentClassifierConfig = {
+  enabled: true,
+  simpleMaxWords: 60,
+};
+
+export function classifyWithConfig(
+  prompt: string,
+  config: IntentClassifierConfig,
+  systemPrompt?: string
+): IntentType {
+  if (!config.enabled) return "medium";
+  const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
+  const wordCount = prompt.trim().split(/\s+/).length;
+  const maxSimpleWords = config.simpleMaxWords ?? 60;
+  const codeKws = [...CODE_KEYWORDS, ...(config.extraCodeKeywords ?? [])];
+  const reasoningKws = [...REASONING_KEYWORDS, ...(config.extraReasoningKeywords ?? [])];
+  const simpleKws = [...SIMPLE_KEYWORDS, ...(config.extraSimpleKeywords ?? [])];
+  for (const kw of codeKws) {
+    if (fullText.includes(kw.toLowerCase())) return "code";
+  }
+  for (const kw of reasoningKws) {
+    if (fullText.includes(kw.toLowerCase())) return "reasoning";
+  }
+  if (wordCount < maxSimpleWords) {
+    for (const kw of simpleKws) {
+      if (fullText.includes(kw.toLowerCase())) return "simple";
+    }
+  }
+  return "medium";
+}
@@ -23,6 +23,18 @@ const PROVIDER_MODEL_ALIASES = {
    "gemini-3-flash": "gemini-3-flash-preview",
    "raptor-mini": "oswe-vscode-prime",
  },
+  gemini: {
+    "gemini-3.1-pro-preview": "gemini-3.1-pro",
+    "gemini-3-1-pro": "gemini-3.1-pro",
+  },
+  "gemini-cli": {
+    "gemini-3.1-pro-preview": "gemini-3.1-pro",
+    "gemini-3-1-pro": "gemini-3.1-pro",
+  },
+  nvidia: {
+    "gpt-oss-120b": "openai/gpt-oss-120b",
+    "nvidia/gpt-oss-120b": "openai/gpt-oss-120b",
+  },
  antigravity: {},
 };

@@ -0,0 +1,50 @@
+import { PROVIDER_ID_TO_ALIAS, PROVIDER_MODELS } from "../config/providerModels.ts";
+import { parseModel } from "./model.ts";
+
+// Conservative denylist fallback used when registry metadata is absent.
+// Keep small and explicit to avoid false negatives.
+const TOOL_CALLING_UNSUPPORTED_PATTERNS = [
+  "gpt-oss-120b",
+  "deepseek-reasoner",
+  "glm-4.7",
+  "glm4.7",
+];
+
+function getRegistryToolCallingFlag(providerIdOrAlias: string, modelId: string): boolean | null {
+  const providerAlias = PROVIDER_ID_TO_ALIAS[providerIdOrAlias] || providerIdOrAlias;
+  const models = PROVIDER_MODELS[providerAlias];
+  if (!Array.isArray(models)) return null;
+  const found = models.find((m) => m?.id === modelId);
+  if (!found) return null;
+  return typeof found.toolCalling === "boolean" ? found.toolCalling : null;
+}
+
+/**
+ * Returns whether a model should be considered safe for structured function/tool calling.
+ *
+ * Decision order:
+ * 1) Provider registry metadata (toolCalling flag) when available.
+ * 2) Conservative denylist fallback for known problematic model families.
+ * 3) Default true.
+ */
+export function supportsToolCalling(modelStr: string): boolean {
+  const parsed = parseModel(modelStr);
+  const provider = parsed.provider || parsed.providerAlias || "";
+  const model = parsed.model || modelStr;
+
+  if (provider) {
+    const fromRegistry = getRegistryToolCallingFlag(provider, model);
+    if (fromRegistry !== null) return fromRegistry;
+  }
+
+  const normalized = String(modelStr || "").toLowerCase();
+  if (!normalized) return false;
+
+  const blocked = TOOL_CALLING_UNSUPPORTED_PATTERNS.some((pattern) => {
+    if (normalized === pattern) return true;
+    if (normalized.endsWith(`/${pattern}`)) return true;
+    return normalized.includes(pattern);
+  });
+
+  return !blocked;
+}
@@ -0,0 +1,120 @@
+/**
+ * Request Deduplication Service
+ *
+ * Deduplicates **concurrent** identical requests to the same upstream.
+ * Inspired by ClawRouter's dedup.ts (BlockRunAI / github.com/BlockRunAI/ClawRouter).
+ *
+ * IMPORTANT: In-memory only — does NOT persist across restarts and does NOT
+ * work across multiple process instances (no cross-instance dedup).
+ */
+
+import { createHash } from "node:crypto";
+
+export interface DedupConfig {
+  enabled: boolean;
+  maxTemperatureForDedup: number;
+  timeoutMs: number;
+}
+
+export const DEFAULT_DEDUP_CONFIG: DedupConfig = {
+  enabled: true,
+  maxTemperatureForDedup: 0.1,
+  timeoutMs: 60_000,
+};
+
+export interface DedupResult<T> {
+  result: T;
+  wasDeduplicated: boolean;
+  hash: string;
+}
+
+const inflight = new Map<string, Promise<unknown>>();
+
+/**
+ * Compute a deterministic hash for a request body.
+ * Includes: model, messages, temperature, tools, tool_choice, max_tokens, response_format
+ * Excludes: stream, user, metadata (don't affect LLM output)
+ */
+export function computeRequestHash(requestBody: unknown): string {
+  const body = requestBody as Record<string, unknown>;
+  const canonical = {
+    model: body.model ?? null,
+    messages: body.messages ?? null,
+    temperature: typeof body.temperature === "number" ? body.temperature : 1.0,
+    tools: body.tools ?? null,
+    tool_choice: body.tool_choice ?? null,
+    max_tokens: body.max_tokens ?? null,
+    response_format: body.response_format ?? null,
+    top_p: body.top_p ?? null,
+    frequency_penalty: body.frequency_penalty ?? null,
+    presence_penalty: body.presence_penalty ?? null,
+  };
+  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex").slice(0, 16);
+}
+
+/** Determine whether a request should be deduplicated */
+export function shouldDeduplicate(
+  requestBody: unknown,
+  config: DedupConfig = DEFAULT_DEDUP_CONFIG
+): boolean {
+  if (!config.enabled) return false;
+  const body = requestBody as Record<string, unknown>;
+  if (body.stream === true) return false;
+  const temperature = typeof body.temperature === "number" ? body.temperature : 1.0;
+  if (temperature > config.maxTemperatureForDedup) return false;
+  return true;
+}
+
+/**
+ * Execute a request with deduplication.
+ * Concurrent identical requests share one upstream call.
+ */
+export async function deduplicate<T>(
+  hash: string,
+  fn: () => Promise<T>,
+  config: DedupConfig = DEFAULT_DEDUP_CONFIG
+): Promise<DedupResult<T>> {
+  if (!config.enabled) {
+    return { result: await fn(), wasDeduplicated: false, hash };
+  }
+
+  const existing = inflight.get(hash);
+  if (existing) {
+    const result = (await existing) as T;
+    return { result, wasDeduplicated: true, hash };
+  }
+
+  let resolve!: (value: T) => void;
+  let reject!: (reason: unknown) => void;
+  const sharedPromise = new Promise<T>((res, rej) => {
+    resolve = res;
+    reject = rej;
+  });
+  inflight.set(hash, sharedPromise as Promise<unknown>);
+
+  const timer = setTimeout(() => {
+    if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
+  }, config.timeoutMs);
+
+  try {
+    const result = await fn();
+    resolve(result);
+    return { result, wasDeduplicated: false, hash };
+  } catch (err) {
+    reject(err);
+    throw err;
+  } finally {
+    clearTimeout(timer);
+    if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
+  }
+}
+
+export function getInflightCount(): number {
+  return inflight.size;
+}
+export function getInflightHashes(): string[] {
+  return [...inflight.keys()];
+}
+export function clearInflight(): void {
+  inflight.clear();
+}
@@ -0,0 +1,142 @@
+/**
+ * Search Cache — in-memory TTL cache with request coalescing
+ *
+ * Bounded at MAX_CACHE_ENTRIES to prevent OOM.
+ * Request coalescing deduplicates concurrent identical queries
+ * to prevent cache stampede (critical for agentic tools).
+ */
+
+import { createHash } from "crypto";
+
+const MAX_CACHE_ENTRIES = 5000;
+const DEFAULT_TTL_MS = parseInt(process.env.SEARCH_CACHE_TTL_MS || String(5 * 60 * 1000), 10);
+
+interface CacheEntry<T> {
+  data: T;
+  expiresAt: number;
+}
+
+const cache = new Map<string, CacheEntry<unknown>>();
+const inflight = new Map<string, Promise<unknown>>();
+
+let hits = 0;
+let misses = 0;
+
+/**
+ * Normalize a query for cache key computation.
+ * NFKC normalization, lowercase, trim, collapse whitespace.
+ */
+function normalizeQuery(query: string): string {
+  return query.normalize("NFKC").toLowerCase().trim().replace(/\s+/g, " ");
+}
+
+/**
+ * Compute a deterministic cache key from search parameters.
+ */
+export function computeCacheKey(
+  query: string,
+  provider: string,
+  searchType: string,
+  maxResults: number,
+  country?: string,
+  language?: string,
+  filters?: unknown
+): string {
+  const normalized = normalizeQuery(query);
+  const payload = JSON.stringify({
+    q: normalized,
+    p: provider,
+    t: searchType,
+    n: maxResults,
+    c: country || null,
+    l: language || null,
+    f: filters || null,
+  });
+  return createHash("sha256").update(payload).digest("hex");
+}
+
+/**
+ * Evict expired entries and enforce size bound.
+ * Called lazily on writes. O(n) worst case but amortized O(1).
+ */
+function evictIfNeeded(): void {
+  const now = Date.now();
+
+  // Remove expired entries first
+  for (const [key, entry] of cache) {
+    if (entry.expiresAt <= now) {
+      cache.delete(key);
+    }
+  }
+
+  // FIFO eviction if still over limit
+  while (cache.size >= MAX_CACHE_ENTRIES) {
+    const firstKey = cache.keys().next().value;
+    if (firstKey !== undefined) {
+      cache.delete(firstKey);
+    } else {
+      break;
+    }
+  }
+}
+
+/**
+ * Get or coalesce: return cached data, join an inflight request,
+ * or execute the fetch function and cache the result.
+ *
+ * @param key - Cache key from computeCacheKey()
+ * @param ttlMs - TTL in milliseconds (0 to bypass cache)
+ * @param fetchFn - Function to execute on cache miss
+ * @returns The cached or freshly fetched data
+ */
+export async function getOrCoalesce<T>(
+  key: string,
+  ttlMs: number,
+  fetchFn: () => Promise<T>
+): Promise<{ data: T; cached: boolean }> {
+  // 1. Check cache
+  const cached = cache.get(key) as CacheEntry<T> | undefined;
+  if (cached && cached.expiresAt > Date.now()) {
+    hits++;
+    return { data: cached.data, cached: true };
+  }
+
+  // 2. Join inflight request if one exists (request coalescing)
+  const existing = inflight.get(key) as Promise<T> | undefined;
+  if (existing) {
+    hits++;
+    const data = await existing;
+    return { data, cached: true };
+  }
+
+  // 3. Cache miss — execute fetch
+  misses++;
+  const promise = fetchFn();
+  inflight.set(key, promise);
+
+  try {
+    const data = await promise;
+
+    // Store in cache
+    if (ttlMs > 0) {
+      evictIfNeeded();
+      cache.set(key, { data, expiresAt: Date.now() + ttlMs });
+    }
+
+    return { data, cached: false };
+  } finally {
+    inflight.delete(key);
+  }
+}
+
+/**
+ * Get cache statistics for monitoring.
+ */
+export function getCacheStats(): { size: number; hits: number; misses: number } {
+  return { size: cache.size, hits, misses };
+}
+
+/**
+ * Default TTL for search cache entries.
+ */
+export const SEARCH_CACHE_DEFAULT_TTL_MS = DEFAULT_TTL_MS;
@@ -75,6 +75,30 @@ function getFieldValue(source: unknown, snakeKey: string, camelKey: string): unk
  return obj[snakeKey] ?? obj[camelKey] ?? null;
 }

+function clampPercentage(value: number): number {
+  return Math.max(0, Math.min(100, value));
+}
+
+function toDisplayLabel(value: string): string {
+  return value
+    .replace(/^copilot[_\s-]*/i, "")
+    .split(/[\s_-]+/)
+    .filter(Boolean)
+    .map((part) => {
+      if (/^pro\+$/i.test(part)) return "Pro+";
+      if (/^[a-z]{2,}$/.test(part)) return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
+      return part;
+    })
+    .join(" ")
+    .trim();
+}
+
+function shouldDisplayGitHubQuota(quota: UsageQuota | null): quota is UsageQuota {
+  if (!quota) return false;
+  if (quota.unlimited && quota.total <= 0) return false;
+  return quota.total > 0 || quota.remainingPercentage !== undefined;
+}
+
 /**
 * Get usage data for a provider connection
 * @param {Object} connection - Provider connection with accessToken
@@ -170,48 +194,65 @@ async function getGitHubUsage(accessToken, providerSpecificData) {
    }

    const data = await response.json();
+    const dataRecord = toRecord(data);

    // Handle different response formats (paid vs free)
-    if (data.quota_snapshots) {
+    if (dataRecord.quota_snapshots) {
      // Paid plan format
-      const snapshots = data.quota_snapshots;
-      const resetAt = parseResetTime(data.quota_reset_date);
+      const snapshots = toRecord(dataRecord.quota_snapshots);
+      const resetAt = parseResetTime(getFieldValue(dataRecord, "quota_reset_date", "quotaResetDate"));
+      const premiumQuota = formatGitHubQuotaSnapshot(snapshots.premium_interactions, resetAt);
+      const chatQuota = formatGitHubQuotaSnapshot(snapshots.chat, resetAt);
+      const completionsQuota = formatGitHubQuotaSnapshot(snapshots.completions, resetAt);
+      const quotas: Record<string, UsageQuota> = {};
+
+      if (shouldDisplayGitHubQuota(premiumQuota)) {
+        quotas.premium_interactions = premiumQuota;
+      }
+      if (shouldDisplayGitHubQuota(chatQuota)) {
+        quotas.chat = chatQuota;
+      }
+      if (shouldDisplayGitHubQuota(completionsQuota)) {
+        quotas.completions = completionsQuota;
+      }

      return {
-        plan: data.copilot_plan,
-        resetDate: data.quota_reset_date,
-        quotas: {
-          chat: { ...formatGitHubQuotaSnapshot(snapshots.chat), resetAt },
-          completions: { ...formatGitHubQuotaSnapshot(snapshots.completions), resetAt },
-          premium_interactions: {
-            ...formatGitHubQuotaSnapshot(snapshots.premium_interactions),
-            resetAt,
-          },
-        },
+        plan: inferGitHubPlanName(dataRecord, premiumQuota),
+        resetDate: getFieldValue(dataRecord, "quota_reset_date", "quotaResetDate"),
+        quotas,
      };
-    } else if (data.monthly_quotas || data.limited_user_quotas) {
+    } else if (dataRecord.monthly_quotas || dataRecord.limited_user_quotas) {
      // Free/limited plan format
-      const monthlyQuotas = data.monthly_quotas || {};
-      const usedQuotas = data.limited_user_quotas || {};
-      const resetAt = parseResetTime(data.limited_user_reset_date);
+      const monthlyQuotas = toRecord(dataRecord.monthly_quotas);
+      const usedQuotas = toRecord(dataRecord.limited_user_quotas);
+      const resetDate = getFieldValue(dataRecord, "limited_user_reset_date", "limitedUserResetDate");
+      const resetAt = parseResetTime(resetDate);
+      const quotas: Record<string, UsageQuota> = {};
+
+      const addLimitedQuota = (name: string) => {
+        const total = toNumber(getFieldValue(monthlyQuotas, name, name), 0);
+        const used = Math.max(0, toNumber(getFieldValue(usedQuotas, name, name), 0));
+        if (total <= 0) return null;
+        const clampedUsed = Math.min(used, total);
+        quotas[name] = {
+          used: clampedUsed,
+          total,
+          remaining: Math.max(total - clampedUsed, 0),
+          remainingPercentage: clampPercentage(((total - clampedUsed) / total) * 100),
+          unlimited: false,
+          resetAt,
+        };
+        return quotas[name];
+      };
+
+      const premiumQuota = addLimitedQuota("premium_interactions");
+      addLimitedQuota("chat");
+      addLimitedQuota("completions");

      return {
-        plan: data.copilot_plan || data.access_type_sku,
-        resetDate: data.limited_user_reset_date,
-        quotas: {
-          chat: {
-            used: usedQuotas.chat || 0,
-            total: monthlyQuotas.chat || 0,
-            unlimited: false,
-            resetAt,
-          },
-          completions: {
-            used: usedQuotas.completions || 0,
-            total: monthlyQuotas.completions || 0,
-            unlimited: false,
-            resetAt,
-          },
-        },
+        plan: inferGitHubPlanName(dataRecord, premiumQuota),
+        resetDate,
+        quotas,
      };
    }

@@ -221,17 +262,103 @@ async function getGitHubUsage(accessToken, providerSpecificData) {
  }
 }

-function formatGitHubQuotaSnapshot(quota) {
-  if (!quota) return { used: 0, total: 0, unlimited: true };
+function formatGitHubQuotaSnapshot(quota, resetAt: string | null = null): UsageQuota | null {
+  const source = toRecord(quota);
+  if (Object.keys(source).length === 0) return null;
+
+  const unlimited = source.unlimited === true;
+  const entitlement = toNumber(source.entitlement, Number.NaN);
+  const totalValue = toNumber(source.total, Number.NaN);
+  const remainingValue = toNumber(source.remaining, Number.NaN);
+  const usedValue = toNumber(source.used, Number.NaN);
+  const percentRemainingValue = toNumber(
+    getFieldValue(source, "percent_remaining", "percentRemaining"),
+    Number.NaN
+  );
+
+  let total = Number.isFinite(totalValue)
+    ? Math.max(0, totalValue)
+    : Number.isFinite(entitlement)
+      ? Math.max(0, entitlement)
+      : 0;
+  let remaining = Number.isFinite(remainingValue) ? Math.max(0, remainingValue) : undefined;
+  let used = Number.isFinite(usedValue) ? Math.max(0, usedValue) : undefined;
+  let remainingPercentage = Number.isFinite(percentRemainingValue)
+    ? clampPercentage(percentRemainingValue)
+    : undefined;
+
+  if (used === undefined && total > 0 && remaining !== undefined) {
+    used = Math.max(total - remaining, 0);
+  }
+
+  if (remaining === undefined && total > 0 && used !== undefined) {
+    remaining = Math.max(total - used, 0);
+  }
+
+  if (remainingPercentage === undefined && total > 0 && remaining !== undefined) {
+    remainingPercentage = clampPercentage((remaining / total) * 100);
+  }
+
+  if (total <= 0 && remainingPercentage !== undefined) {
+    total = 100;
+    used = 100 - remainingPercentage;
+    remaining = remainingPercentage;
+  }

  return {
-    used: quota.entitlement - quota.remaining,
-    total: quota.entitlement,
-    remaining: quota.remaining,
-    unlimited: quota.unlimited || false,
+    used: Math.max(0, used ?? 0),
+    total,
+    remaining,
+    remainingPercentage,
+    resetAt,
+    unlimited,
  };
 }

+function inferGitHubPlanName(data: JsonRecord, premiumQuota: UsageQuota | null): string {
+  const rawPlan = getFieldValue(data, "copilot_plan", "copilotPlan");
+  const rawSku = getFieldValue(data, "access_type_sku", "accessTypeSku");
+  const planText = typeof rawPlan === "string" ? rawPlan.trim() : "";
+  const skuText = typeof rawSku === "string" ? rawSku.trim() : "";
+  const combined = `${skuText} ${planText}`.trim().toUpperCase();
+  const monthlyQuotas = toRecord(getFieldValue(data, "monthly_quotas", "monthlyQuotas"));
+  const premiumTotal =
+    premiumQuota?.total ||
+    toNumber(getFieldValue(monthlyQuotas, "premium_interactions", "premiumInteractions"), 0);
+  const chatTotal = toNumber(getFieldValue(monthlyQuotas, "chat", "chat"), 0);
+
+  if (
+    combined.includes("PRO+") ||
+    combined.includes("PRO_PLUS") ||
+    combined.includes("PROPLUS")
+  ) {
+    return "Copilot Pro+";
+  }
+  if (combined.includes("ENTERPRISE")) return "Copilot Enterprise";
+  if (combined.includes("BUSINESS")) return "Copilot Business";
+  if (combined.includes("STUDENT")) return "Copilot Student";
+  if (combined.includes("FREE")) return "Copilot Free";
+  if (combined.includes("PRO")) return "Copilot Pro";
+
+  if (premiumTotal >= 1400) return "Copilot Pro+";
+  if (premiumTotal >= 900) return "Copilot Enterprise";
+  if (premiumTotal >= 250) {
+    if (combined.includes("INDIVIDUAL")) return "Copilot Pro";
+    return "Copilot Business";
+  }
+  if (premiumTotal > 0 || chatTotal === 50) return "Copilot Free";
+
+  if (skuText) {
+    const label = toDisplayLabel(skuText);
+    return label ? `Copilot ${label}` : "GitHub Copilot";
+  }
+  if (planText) {
+    const label = toDisplayLabel(planText);
+    return label ? `Copilot ${label}` : "GitHub Copilot";
+  }
+  return "GitHub Copilot";
+}
+
 /**
 * Gemini CLI Usage (Google Cloud)
 */
@@ -91,6 +91,10 @@ export function filterToOpenAIFormat(body) {
    delete body.tools;
  }

+  // Strip Claude-specific fields that OpenAI-compatible providers reject
+  delete body.metadata;
+  delete body.anthropic_version;
+
  // Normalize tools to OpenAI format (from Claude, Gemini, etc.)
  if (body.tools && Array.isArray(body.tools) && body.tools.length > 0) {
    body.tools = body.tools
@@ -1,26 +1,69 @@
 // Tool call helper functions for translator

-// Generate unique tool call ID
+const ALPHANUM9 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
+
+// Generate unique tool call ID (default long form)
 export function generateToolCallId() {
  return `call_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 9)}`;
 }

-// Ensure all tool_calls have id field and arguments is string (some providers require it)
-export function ensureToolCallIds(body) {
+// Generate 9-char [a-zA-Z0-9] id for providers that require it (e.g. Mistral)
+function generateToolCallId9(): string {
+  let s = "";
+  for (let i = 0; i < 9; i++) {
+    s += ALPHANUM9[Math.floor(Math.random() * ALPHANUM9.length)];
+  }
+  return s;
+}
+
+/** @param options.use9CharId - When true, normalize ids to 9-char [a-zA-Z0-9] (e.g. Mistral); when false, only fix type/arguments, leave ids as-is */
+export function ensureToolCallIds(body, options?: { use9CharId?: boolean }) {
  if (!body.messages || !Array.isArray(body.messages)) return body;

-  for (const msg of body.messages) {
-    if (msg.role === "assistant" && msg.tool_calls && Array.isArray(msg.tool_calls)) {
-      for (const tc of msg.tool_calls) {
-        if (!tc.id) {
-          tc.id = generateToolCallId();
-        }
-        if (!tc.type) {
-          tc.type = "function";
-        }
-        // Ensure arguments is JSON string, not object
-        if (tc.function?.arguments && typeof tc.function.arguments !== "string") {
-          tc.function.arguments = JSON.stringify(tc.function.arguments);
+  const use9CharId = options?.use9CharId === true;
+
+  for (let i = 0; i < body.messages.length; i++) {
+    const msg = body.messages[i];
+    if (msg.role !== "assistant" || !msg.tool_calls || !Array.isArray(msg.tool_calls)) continue;
+
+    const used9 = new Set<string>();
+    const newIdsInOrder: string[] = [];
+
+    for (const tc of msg.tool_calls) {
+      if (!tc.type) {
+        tc.type = "function";
+      }
+      if (tc.function?.arguments && typeof tc.function.arguments !== "string") {
+        tc.function.arguments = JSON.stringify(tc.function.arguments);
+      }
+      if (use9CharId) {
+        let newId: string;
+        do {
+          newId = generateToolCallId9();
+        } while (used9.has(newId));
+        used9.add(newId);
+        newIdsInOrder.push(newId);
+        tc.id = newId;
+      } else {
+        // Leave id as-is, only ensure it exists for later tool message matching
+        const id =
+          tc.id != null && String(tc.id).trim() !== "" ? String(tc.id) : generateToolCallId();
+        tc.id = id;
+        newIdsInOrder.push(id);
+      }
+    }
+
+    // Tool responses (role "tool") follow in same order as tool_calls; set tool_call_id by index.
+    // Stop when we hit another assistant so we only link tool messages that immediately follow this one.
+    if (newIdsInOrder.length > 0) {
+      let idx = 0;
+      for (let j = i + 1; j < body.messages.length; j++) {
+        const later = body.messages[j];
+        if (later.role === "assistant") break;
+        if (later.role !== "tool") continue;
+        if (idx < newIdsInOrder.length) {
+          later.tool_call_id = newIdsInOrder[idx];
+          idx++;
        }
      }
    }
@@ -66,6 +66,7 @@ function normalizeOpenAIResponsesRequest(body) {
  return normalized;
 }

+/** @param options.normalizeToolCallId - When true, use 9-char tool call ids (e.g. Mistral); when false, leave ids as-is */
 // Translate request: source -> openai -> target
 export function translateRequest(
  sourceFormat,
@@ -75,9 +76,11 @@ export function translateRequest(
  stream = true,
  credentials = null,
  provider = null,
-  reqLogger = null
+  reqLogger = null,
+  options?: { normalizeToolCallId?: boolean }
 ) {
  let result = body;
+  const use9CharId = options?.normalizeToolCallId === true;

  // Phase 2: Apply thinking budget control before normalization
  result = applyThinkingBudget(result);
@@ -85,8 +88,8 @@ export function translateRequest(
  // Normalize thinking config: remove if lastMessage is not user
  normalizeThinkingConfig(result);

-  // Always ensure tool_calls have id (some providers require it)
-  ensureToolCallIds(result);
+  // Ensure tool_calls have id; optionally normalize to 9-char for providers like Mistral
+  ensureToolCallIds(result, { use9CharId });

  // Fix missing tool responses (insert empty tool_result if needed)
  fixMissingToolResponses(result);
@@ -131,7 +134,7 @@ export function translateRequest(
  }

  // Final step: prepare request for Claude format endpoints
-  if (targetFormat === FORMATS.CLAUDE) {
+  if (targetFormat === FORMATS.CLAUDE && sourceFormat !== FORMATS.CLAUDE) {
    result = prepareClaudeRequest(result, provider);
  }

@@ -140,6 +143,10 @@ export function translateRequest(
    result = normalizeOpenAIResponsesRequest(result);
  }

+  // Ensure unique tool_call ids on final payload (translators may have introduced duplicates)
+  ensureToolCallIds(result, { use9CharId });
+  fixMissingToolResponses(result);
+
  return result;
 }

@@ -6,6 +6,7 @@
 */
 import { register } from "../registry.ts";
 import { FORMATS } from "../formats.ts";
+import { generateToolCallId } from "../helpers/toolCallHelper.ts";

 type JsonRecord = Record<string, unknown>;

@@ -120,6 +121,12 @@ export function openaiResponsesToOpenAIRequest(
    }

    if (itemType === "function_call") {
+      // Skip tool calls with empty names to avoid infinite placeholder_tool loops
+      const fnName = toString(item.name).trim();
+      if (!fnName) {
+        continue;
+      }
+
      // Start or append assistant message with tool_calls
      if (!currentAssistantMsg) {
        currentAssistantMsg = {
@@ -136,7 +143,7 @@ export function openaiResponsesToOpenAIRequest(
        id: toString(item.call_id),
        type: "function",
        function: {
-          name: toString(item.name),
+          name: fnName,
          arguments: item.arguments,
        },
      });
@@ -201,6 +208,24 @@ export function openaiResponsesToOpenAIRequest(
    });
  }

+  // Filter orphaned tool results (no matching tool_call in assistant messages)
+  const allToolCallIds = new Set<string>();
+  for (const m of messages) {
+    const rec = toRecord(m);
+    if (Array.isArray(rec.tool_calls)) {
+      for (const tc of rec.tool_calls as { id?: string }[]) {
+        if (tc.id) allToolCallIds.add(String(tc.id));
+      }
+    }
+  }
+  result.messages = messages.filter((m) => {
+    const rec = toRecord(m);
+    if (rec.role === "tool" && rec.tool_call_id) {
+      return allToolCallIds.has(String(rec.tool_call_id));
+    }
+    return true;
+  });
+
  // Cleanup Responses API specific fields
  delete result.input;
  delete result.instructions;
@@ -319,10 +344,15 @@ export function openaiToOpenAIResponsesRequest(
        for (const toolCallValue of msg.tool_calls) {
          const toolCall = toRecord(toolCallValue);
          const fn = toRecord(toolCall.function);
+          // Skip tool calls with empty names to avoid infinite placeholder_tool loops
+          const fnName = toString(fn.name).trim();
+          if (!fnName) {
+            continue;
+          }
          input.push({
            type: "function_call",
-            call_id: toString(toolCall.id),
-            name: toString(fn.name),
+            call_id: toString(toolCall.id).trim() || generateToolCallId(),
+            name: fnName,
            arguments: toString(fn.arguments, "{}"),
          });
        }
@@ -339,6 +369,22 @@ export function openaiToOpenAIResponsesRequest(
    }
  }

+  // Filter orphaned function_call_output items (no matching function_call)
+  // This happens when Claude Code compaction removes messages but leaves tool results
+  const knownCallIds = new Set(
+    input
+      .filter(
+        (item: { type?: string; call_id?: string }) => item.type === "function_call" && item.call_id
+      )
+      .map((item: { type?: string; call_id?: string }) => item.call_id)
+  );
+  result.input = input.filter((item: { type?: string; call_id?: string }) => {
+    if (item.type === "function_call_output" && item.call_id) {
+      return knownCallIds.has(item.call_id);
+    }
+    return true;
+  });
+
  // If no system message, keep empty instructions
  if (!hasSystemMessage) {
    result.instructions = "";
@@ -123,6 +123,43 @@ export function openaiToClaudeRequest(model, body, stream) {

    flushCurrentMessage();

+    // Remove assistant messages with empty content (can happen when all tool_use blocks were skipped)
+    result.messages = result.messages.filter((msg) => {
+      if (msg.role === "assistant" && Array.isArray(msg.content) && msg.content.length === 0) {
+        return false;
+      }
+      return true;
+    });
+
+    // Filter orphaned tool_result blocks whose tool_use_id has no matching tool_use
+    const allToolUseIds = new Set<string>();
+    for (const msg of result.messages) {
+      if (msg.role === "assistant" && Array.isArray(msg.content)) {
+        for (const block of msg.content) {
+          if (block.type === "tool_use" && block.id) {
+            allToolUseIds.add(String(block.id));
+          }
+        }
+      }
+    }
+    for (const msg of result.messages) {
+      if (msg.role === "user" && Array.isArray(msg.content)) {
+        msg.content = msg.content.filter((block) => {
+          if (block.type === "tool_result" && block.tool_use_id) {
+            return allToolUseIds.has(String(block.tool_use_id));
+          }
+          return true;
+        });
+      }
+    }
+    // Remove user messages that became empty after orphan filtering
+    result.messages = result.messages.filter((msg) => {
+      if (msg.role === "user" && Array.isArray(msg.content) && msg.content.length === 0) {
+        return false;
+      }
+      return true;
+    });
+
    // Add cache_control to last assistant message
    for (let i = result.messages.length - 1; i >= 0; i--) {
      const message = result.messages[i];
@@ -30,6 +30,8 @@ type StreamLogger = {
 type StreamCompletePayload = {
  status: number;
  usage: unknown;
+  /** Minimal response body for call log (streaming: usage + note; non-streaming not used) */
+  responseBody?: unknown;
 };

 type StreamOptions = {
@@ -51,6 +53,8 @@ type TranslateState = ReturnType<typeof initState> & {
  toolNameMap?: unknown;
  usage?: unknown;
  finishReason?: unknown;
+  /** Accumulated message content for call log response body */
+  accumulatedContent?: string;
 };

 function getOpenAIIntermediateChunks(value: unknown): unknown[] {
@@ -106,14 +110,21 @@ export function createSSEStream(options: StreamOptions = {}) {
  let buffer = "";
  let usage = null;

-  // State for translate mode
+  // State for translate mode (accumulatedContent for call log response body)
  const state: TranslateState | null =
    mode === STREAM_MODE.TRANSLATE
-      ? { ...(initState(sourceFormat) as TranslateState), provider, toolNameMap }
+      ? {
+          ...(initState(sourceFormat) as TranslateState),
+          provider,
+          toolNameMap,
+          accumulatedContent: "",
+        }
      : null;

  // Track content length for usage estimation (both modes)
  let totalContentLength = 0;
+  // Passthrough: accumulate content for call log response body
+  let passthroughAccumulatedContent = "";

  // Guard against duplicate [DONE] events — ensures exactly one per stream
  let doneSent = false;
@@ -184,15 +195,52 @@ export function createSSEStream(options: StreamOptions = {}) {
                  typeof parsed.type === "string" &&
                  parsed.type.startsWith("response.");

+                // Detect Claude SSE payloads. Includes "ping" and "error" to ensure
+                // they bypass the Chat Completions sanitization path which would
+                // incorrectly process or drop them.
+                const isClaudeSSE =
+                  parsed.type &&
+                  typeof parsed.type === "string" &&
+                  (parsed.type.startsWith("message") ||
+                    parsed.type.startsWith("content_block") ||
+                    parsed.type === "ping" ||
+                    parsed.type === "error");
+
                if (isResponsesSSE) {
                  // Responses SSE: only extract usage, forward payload as-is
                  const extracted = extractUsage(parsed);
                  if (extracted) {
                    usage = extracted;
                  }
-                  // Track content length from Responses format
+                  // Track content length and accumulate for call log
                  if (parsed.delta && typeof parsed.delta === "string") {
                    totalContentLength += parsed.delta.length;
+                    passthroughAccumulatedContent += parsed.delta;
+                  }
+                } else if (isClaudeSSE) {
+                  // Claude SSE: extract usage, track content, forward as-is
+                  const extracted = extractUsage(parsed);
+                  if (extracted) {
+                    // Non-destructive merge: never overwrite a positive value with 0
+                    // message_start carries input_tokens, message_delta carries output_tokens
+                    if (!usage) usage = {};
+                    if (extracted.prompt_tokens > 0) usage.prompt_tokens = extracted.prompt_tokens;
+                    if (extracted.completion_tokens > 0)
+                      usage.completion_tokens = extracted.completion_tokens;
+                    if (extracted.total_tokens > 0) usage.total_tokens = extracted.total_tokens;
+                    if (extracted.cache_read_input_tokens)
+                      usage.cache_read_input_tokens = extracted.cache_read_input_tokens;
+                    if (extracted.cache_creation_input_tokens)
+                      usage.cache_creation_input_tokens = extracted.cache_creation_input_tokens;
+                  }
+                  // Track content length and accumulate from Claude format
+                  if (parsed.delta?.text) {
+                    totalContentLength += parsed.delta.text.length;
+                    passthroughAccumulatedContent += parsed.delta.text;
+                  }
+                  if (parsed.delta?.thinking) {
+                    totalContentLength += parsed.delta.thinking.length;
+                    passthroughAccumulatedContent += parsed.delta.thinking;
                  }
                } else {
                  // Chat Completions: full sanitization pipeline
@@ -219,6 +267,10 @@ export function createSSEStream(options: StreamOptions = {}) {
                  if (content && typeof content === "string") {
                    totalContentLength += content.length;
                  }
+                  if (typeof delta?.content === "string")
+                    passthroughAccumulatedContent += delta.content;
+                  if (typeof delta?.reasoning_content === "string")
+                    passthroughAccumulatedContent += delta.reasoning_content;

                  const extracted = extractUsage(parsed);
                  if (extracted) {
@@ -274,23 +326,45 @@ export function createSSEStream(options: StreamOptions = {}) {
            continue;
          }

-          // Track content length for estimation (from various formats)
-          // Include both regular content and reasoning/thinking content
+          // Track content length and accumulate for call log (from raw provider chunk, so content is never missed)
+          // Do this before translation so we capture content regardless of translator output shape

          // Claude format
          if (parsed.delta?.text) {
-            totalContentLength += parsed.delta.text.length;
+            const t = parsed.delta.text;
+            totalContentLength += t.length;
+            if (state?.accumulatedContent !== undefined && typeof t === "string")
+              state.accumulatedContent += t;
          }
          if (parsed.delta?.thinking) {
-            totalContentLength += parsed.delta.thinking.length;
+            const t = parsed.delta.thinking;
+            totalContentLength += t.length;
+            if (state?.accumulatedContent !== undefined && typeof t === "string")
+              state.accumulatedContent += t;
          }

          // OpenAI format
          if (parsed.choices?.[0]?.delta?.content) {
-            totalContentLength += parsed.choices[0].delta.content.length;
+            const c = parsed.choices[0].delta.content;
+            if (typeof c === "string") {
+              totalContentLength += c.length;
+              if (state?.accumulatedContent !== undefined) state.accumulatedContent += c;
+            } else if (Array.isArray(c)) {
+              for (const part of c) {
+                if (part?.text && typeof part.text === "string") {
+                  totalContentLength += part.text.length;
+                  if (state?.accumulatedContent !== undefined)
+                    state.accumulatedContent += part.text;
+                }
+              }
+            }
          }
          if (parsed.choices?.[0]?.delta?.reasoning_content) {
-            totalContentLength += parsed.choices[0].delta.reasoning_content.length;
+            const r = parsed.choices[0].delta.reasoning_content;
+            if (typeof r === "string") {
+              totalContentLength += r.length;
+              if (state?.accumulatedContent !== undefined) state.accumulatedContent += r;
+            }
          }

          // Gemini format - may have multiple parts
@@ -298,10 +372,30 @@ export function createSSEStream(options: StreamOptions = {}) {
            for (const part of parsed.candidates[0].content.parts) {
              if (part.text && typeof part.text === "string") {
                totalContentLength += part.text.length;
+                if (state?.accumulatedContent !== undefined) state.accumulatedContent += part.text;
              }
            }
          }

+          // Generic fallback: delta string, top-level content/text (e.g. some SSE payloads)
+          if (state?.accumulatedContent !== undefined) {
+            if (typeof (parsed as JsonRecord).delta === "string") {
+              const d = (parsed as JsonRecord).delta as string;
+              state.accumulatedContent += d;
+              totalContentLength += d.length;
+            }
+            if (typeof (parsed as JsonRecord).content === "string") {
+              const c = (parsed as JsonRecord).content as string;
+              state.accumulatedContent += c;
+              totalContentLength += c.length;
+            }
+            if (typeof (parsed as JsonRecord).text === "string") {
+              const t = (parsed as JsonRecord).text as string;
+              state.accumulatedContent += t;
+              totalContentLength += t.length;
+            }
+          }
+
          // Extract usage
          const extracted = extractUsage(parsed);
          if (extracted) state.usage = extracted; // Keep original usage for logging
@@ -317,6 +411,9 @@ export function createSSEStream(options: StreamOptions = {}) {

          if (translated?.length > 0) {
            for (const item of translated) {
+              // Content for call log is accumulated only from parsed (above) to avoid double-counting;
+              // do not add again from item here.
+
              // Filter empty chunks
              if (!hasValuableContent(item, sourceFormat)) {
                continue; // Skip this empty chunk
@@ -372,9 +469,9 @@ export function createSSEStream(options: StreamOptions = {}) {
              controller.enqueue(encoder.encode(output));
            }

-            // Estimate usage if provider didn't return valid usage (PASSTHROUGH is always OpenAI format)
+            // Estimate usage if provider didn't return valid usage
            if (!hasValidUsage(usage) && totalContentLength > 0) {
-              usage = estimateUsage(body, totalContentLength, FORMATS.OPENAI);
+              usage = estimateUsage(body, totalContentLength, sourceFormat || FORMATS.OPENAI);
            }

            if (hasValidUsage(usage)) {
@@ -388,10 +485,30 @@ export function createSSEStream(options: StreamOptions = {}) {
                status: "200 OK",
              }).catch(() => {});
            }
-            // Notify caller for call log persistence
+            // Notify caller for call log persistence (include full response body with accumulated content)
            if (onComplete) {
              try {
-                onComplete({ status: 200, usage });
+                const u = usage as Record<string, unknown> | null;
+                const prompt = Number(u?.prompt_tokens ?? u?.input_tokens ?? 0);
+                const completion = Number(u?.completion_tokens ?? u?.output_tokens ?? 0);
+                const content = passthroughAccumulatedContent.trim() || "";
+                const responseBody = {
+                  choices: [
+                    {
+                      message: {
+                        role: "assistant",
+                        content,
+                      },
+                    },
+                  ],
+                  usage: {
+                    prompt_tokens: prompt,
+                    completion_tokens: completion,
+                    total_tokens: prompt + completion,
+                  },
+                  _streamed: true,
+                };
+                onComplete({ status: 200, usage, responseBody });
              } catch {}
            }
            return;
@@ -401,6 +518,33 @@ export function createSSEStream(options: StreamOptions = {}) {
          if (buffer.trim()) {
            const parsed = parseSSELine(buffer.trim());
            if (parsed && !parsed.done) {
+              // Extract usage from remaining buffer — if the usage-bearing event
+              // (e.g. response.completed) is the last SSE line, it ends up here
+              // in the flush handler where extractUsage was not called.
+              // Non-destructive merge: some providers send usage across multiple
+              // events (e.g. prompt_tokens in message_start, completion_tokens
+              // in message_delta). Direct assignment would lose earlier data.
+              const extracted = extractUsage(parsed);
+              if (extracted) {
+                if (!state.usage) {
+                  state.usage = extracted;
+                } else {
+                  if (extracted.prompt_tokens > 0)
+                    state.usage.prompt_tokens = extracted.prompt_tokens;
+                  if (extracted.completion_tokens > 0)
+                    state.usage.completion_tokens = extracted.completion_tokens;
+                  if (extracted.total_tokens > 0) state.usage.total_tokens = extracted.total_tokens;
+                  if (extracted.cache_read_input_tokens > 0)
+                    state.usage.cache_read_input_tokens = extracted.cache_read_input_tokens;
+                  if (extracted.cache_creation_input_tokens > 0)
+                    state.usage.cache_creation_input_tokens = extracted.cache_creation_input_tokens;
+                  if (extracted.cached_tokens > 0)
+                    state.usage.cached_tokens = extracted.cached_tokens;
+                  if (extracted.reasoning_tokens > 0)
+                    state.usage.reasoning_tokens = extracted.reasoning_tokens;
+                }
+              }
+
              const translated = translateResponse(targetFormat, sourceFormat, parsed, state);

              // Log OpenAI intermediate chunks
@@ -470,10 +614,30 @@ export function createSSEStream(options: StreamOptions = {}) {
              status: "200 OK",
            }).catch(() => {});
          }
-          // Notify caller for call log persistence
+          // Notify caller for call log persistence (include full response body with accumulated content)
          if (onComplete) {
            try {
-              onComplete({ status: 200, usage: state?.usage });
+              const u = state?.usage as Record<string, unknown> | null | undefined;
+              const prompt = Number(u?.prompt_tokens ?? u?.input_tokens ?? 0);
+              const completion = Number(u?.completion_tokens ?? u?.output_tokens ?? 0);
+              const content = (state?.accumulatedContent ?? "").trim() || "";
+              const responseBody = {
+                choices: [
+                  {
+                    message: {
+                      role: "assistant",
+                      content,
+                    },
+                  },
+                ],
+                usage: {
+                  prompt_tokens: prompt,
+                  completion_tokens: completion,
+                  total_tokens: prompt + completion,
+                },
+                _streamed: true,
+              };
+              onComplete({ status: 200, usage: state?.usage, responseBody });
            } catch {}
          }
        } catch (error) {
@@ -400,8 +400,10 @@ export function logUsage(provider, usage, model = null, connectionId = null, api
  console.log(msg);

  // Save to usage DB
+  // input = total input tokens (non-cached + cache_read + cache_creation)
+  // This ensures analytics show correct totals for heavily-cached requests
  const tokens = {
-    input: inTokens,
+    input: inTokens + (cacheRead || 0) + (cacheCreation || 0),
    output: outTokens,
    cacheRead: cacheRead || 0,
    cacheCreation: cacheCreation || 0,
@@ -1,6 +1,6 @@
 {
  "name": "omniroute",
-  "version": "2.6.6",
+  "version": "2.8.2",
  "description": "Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.",
  "type": "module",
  "bin": {
@@ -0,0 +1 @@
+<svg width="56" height="64" viewBox="0 0 56 64" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M53.292 15.321l1.5-3.676s-1.909-2.043-4.227-4.358c-2.317-2.315-7.225-.953-7.225-.953L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953C2.988 9.602 1.08 11.645 1.08 11.645l1.5 3.676-1.91 5.447s5.614 21.236 6.272 23.83c1.295 5.106 2.181 7.08 5.862 9.668 3.68 2.587 10.36 7.08 11.45 7.762 1.091.68 2.455 1.84 3.682 1.84 1.227 0 2.59-1.16 3.68-1.84 1.091-.681 7.77-5.175 11.452-7.762 3.68-2.587 4.567-4.562 5.862-9.668.657-2.594 6.27-23.83 6.27-23.83l-1.908-5.447z" fill="url(#paint0_linear)"/><path fill-rule="evenodd" clip-rule="evenodd" d="M34.888 11.508c.818 0 6.885-1.157 6.885-1.157s7.189 8.68 7.189 10.536c0 1.534-.619 2.134-1.347 2.842-.152.148-.31.3-.467.468l-5.39 5.717a9.42 9.42 0 01-.176.18c-.538.54-1.33 1.336-.772 2.658l.115.269c.613 1.432 1.37 3.2.407 4.99-1.025 1.906-2.78 3.178-3.905 2.967-1.124-.21-3.766-1.589-4.737-2.218-.971-.63-4.05-3.166-4.05-4.137 0-.809 2.214-2.155 3.29-2.81.214-.13.383-.232.48-.298.111-.075.297-.19.526-.332.981-.61 2.754-1.71 2.799-2.197.055-.602.034-.778-.758-2.264-.168-.316-.365-.654-.568-1.004-.754-1.295-1.598-2.745-1.41-3.784.21-1.173 2.05-1.845 3.608-2.415.194-.07.385-.14.567-.209l1.623-.609c1.556-.582 3.284-1.229 3.57-1.36.394-.181.292-.355-.903-.468a54.655 54.655 0 01-.58-.06c-1.48-.157-4.209-.446-5.535-.077-.261.073-.553.152-.86.235-1.49.403-3.317.897-3.493 1.182-.03.05-.06.093-.089.133-.168.238-.277.394-.091 1.406.055.302.169.895.31 1.629.41 2.148 1.053 5.498 1.134 6.25.011.106.024.207.036.305.103.84.171 1.399-.805 1.622l-.255.058c-1.102.252-2.717.623-3.3.623-.584 0-2.2-.37-3.302-.623l-.254-.058c-.976-.223-.907-.782-.804-1.622.012-.098.024-.2.035-.305.081-.753.725-4.112 1.137-6.259.14-.73.253-1.32.308-1.62.185-1.012.076-1.168-.092-1.406a3.743 3.743 0 01-.09-.133c-.174-.285-2-.779-3.491-1.182-.307-.083-.6-.162-.86-.235-1.327-.37-4.055-.08-5.535.077-.226.024-.422.045-.58.06-1.196.113-1.297.287-.903.468.285.131 2.013.778 3.568 1.36.597.223 1.17.437 1.624.609.183.069.373.138.568.21 1.558.57 3.398 1.241 3.608 2.414.187 1.039-.657 2.489-1.41 3.784-.204.35-.4.688-.569 1.004-.791 1.486-.812 1.662-.757 2.264.044.488 1.816 1.587 2.798 2.197.229.142.415.257.526.332.098.066.266.168.48.298 1.076.654 3.29 2 3.29 2.81 0 .97-3.078 3.507-4.05 4.137-.97.63-3.612 2.008-4.737 2.218-1.124.21-2.88-1.061-3.904-2.966-.963-1.791-.207-3.559.406-4.99l.115-.27c.559-1.322-.233-2.118-.772-2.658a9.377 9.377 0 01-.175-.18l-5.39-5.717c-.158-.167-.316-.32-.468-.468-.728-.707-1.346-1.308-1.346-2.842 0-1.855 7.189-10.536 7.189-10.536s6.066 1.157 6.884 1.157c.653 0 1.913-.433 3.227-.885.333-.114.669-.23 1-.34 1.635-.545 2.726-.549 2.726-.549s1.09.004 2.726.549c.33.11.667.226 1 .34 1.313.452 2.574.885 3.226.885zm-1.041 30.706c1.282.66 2.192 1.128 2.536 1.343.445.278.174.803-.232 1.09-.405.285-5.853 4.499-6.381 4.965l-.215.191c-.509.459-1.159 1.044-1.62 1.044-.46 0-1.11-.586-1.62-1.044l-.213-.191c-.53-.466-5.977-4.68-6.382-4.966-.405-.286-.677-.81-.232-1.09.344-.214 1.255-.683 2.539-1.344l1.22-.629c1.92-.992 4.315-1.837 4.689-1.837.373 0 2.767.844 4.689 1.837.436.226.845.437 1.222.63z" fill="#fff"/><path fill-rule="evenodd" clip-rule="evenodd" d="M43.34 6.334L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953c0 0 6.544-.59 8.793 3.064 0 0 6.066 1.157 6.884 1.157.818 0 2.59-.68 4.226-1.225 1.636-.545 2.727-.549 2.727-.549s1.09.004 2.726.549 3.408 1.225 4.226 1.225c.818 0 6.885-1.157 6.885-1.157 2.249-3.654 8.792-3.064 8.792-3.064-2.317-2.315-7.225-.953-7.225-.953z" fill="url(#paint1_linear)"/><defs><linearGradient id="paint0_linear" x1=".671" y1="64.319" x2="55.2" y2="64.319" gradientUnits="userSpaceOnUse"><stop stop-color="#F50"/><stop offset=".41" stop-color="#F50"/><stop offset=".582" stop-color="#FF2000"/><stop offset="1" stop-color="#FF2000"/></linearGradient><linearGradient id="paint1_linear" x1="6.278" y1="11.466" x2="50.565" y2="11.466" gradientUnits="userSpaceOnUse"><stop stop-color="#FF452A"/><stop offset="1" stop-color="#FF2000"/></linearGradient></defs></svg>
@@ -0,0 +1,4 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
+  <rect width="48" height="48" rx="8" fill="#1E40AF"/>
+  <text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
+</svg>
@@ -0,0 +1,4 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
+  <rect width="48" height="48" rx="8" fill="#1E40AF"/>
+  <text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
+</svg>
@@ -14,6 +14,7 @@
 *
 * Fixes: https://github.com/diegosouzapw/OmniRoute/issues/129
 * Fixes: https://github.com/diegosouzapw/OmniRoute/issues/321
+ * Fixes: https://github.com/diegosouzapw/OmniRoute/issues/426
 */

 import { existsSync, copyFileSync, mkdirSync } from "node:fs";
@@ -80,8 +81,54 @@ if (existsSync(rootBinary)) {
  }
 }

+// Strategy 1.5: Use node-pre-gyp to download the correct prebuilt binary
+// This works on Windows without requiring node-gyp, Python, or MSVC.
+// better-sqlite3 ships prebuilts for win32-x64, win32-arm64, darwin-x64/arm64.
+console.log("  📥  Attempting to download prebuilt binary via node-pre-gyp...");
+try {
+  const { execSync } = await import("node:child_process");
+  // better-sqlite3 bundles @mapbox/node-pre-gyp — use it directly
+  const preGypBin = join(
+    ROOT,
+    "app",
+    "node_modules",
+    ".bin",
+    process.platform === "win32" ? "node-pre-gyp.cmd" : "node-pre-gyp"
+  );
+  const preGypFallback = join(
+    ROOT,
+    "app",
+    "node_modules",
+    "@mapbox",
+    "node-pre-gyp",
+    "bin",
+    "node-pre-gyp"
+  );
+  const preGypCmd = existsSync(preGypBin) ? preGypBin : preGypFallback;
+
+  if (existsSync(preGypCmd)) {
+    execSync(`"${process.execPath}" "${preGypCmd}" install --fallback-to-build=false`, {
+      cwd: join(ROOT, "app", "node_modules", "better-sqlite3"),
+      stdio: "inherit",
+      timeout: 60_000,
+    });
+    mkdirSync(dirname(appBinary), { recursive: true });
+    try {
+      process.dlopen({ exports: {} }, appBinary);
+      console.log("  ✅ Prebuilt binary downloaded and loaded successfully!\n");
+      process.exit(0);
+    } catch (loadErr) {
+      console.warn(`  ⚠️  Downloaded binary failed to load: ${loadErr.message}`);
+    }
+  } else {
+    console.warn("  ⚠️  node-pre-gyp not found, skipping prebuilt download.");
+  }
+} catch (err) {
+  console.warn(`  ⚠️  node-pre-gyp download failed: ${err.message.split("\n")[0]}`);
+}
+
 // Strategy 2: Fall back to npm rebuild (may work if build tools are available)
-console.log("  ⚠️  Root binary not available or incompatible, attempting npm rebuild...");
+console.log("  ⚠️  Attempting npm rebuild (requires build tools)...");

 try {
  const { execSync } = await import("node:child_process");
@@ -103,14 +150,23 @@ try {
  }
 }

-// If nothing worked, warn but don't fail the install — let the package stay
-// installed so users can fix manually or use the pre-flight check in the CLI
-console.warn("  ⚠️  Could not fix better-sqlite3 native module automatically.");
+// If nothing worked, warn but don't fail the install
+console.warn("\n  ⚠️  Could not fix better-sqlite3 native module automatically.");
 console.warn("     The server may not start correctly.");
-console.warn("     Try manually:");
-console.warn(`     cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
-if (process.platform === "darwin") {
+console.warn("     Manual fix options:");
+if (process.platform === "win32") {
+  console.warn("     Option A (easiest — no build tools needed):");
+  console.warn(`       cd "${join(ROOT, "app", "node_modules", "better-sqlite3")}"`);
+  console.warn("       npx @mapbox/node-pre-gyp install --fallback-to-build=false");
+  console.warn("     Option B (requires Build Tools for Visual Studio):");
+  console.warn(`       cd "${join(ROOT, "app")}" && npm rebuild better-sqlite3`);
+  console.warn("       Install from: https://visualstudio.microsoft.com/visual-cpp-build-tools/");
+  console.warn("       Also ensure Python is installed: https://python.org");
+} else if (process.platform === "darwin") {
+  console.warn(`     cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
  console.warn("     If build tools are missing: xcode-select --install");
+} else {
+  console.warn(`     cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
 }
 console.warn("");

@@ -278,6 +278,19 @@ if (existsSync(swcHelpersSrc) && !existsSync(swcHelpersDst)) {
  console.log("  ✅ @swc/helpers included in standalone build.");
 }

+// ── Step 10.6: Remove large binaries from standalone build ──
+// These directories contain platform-native binaries (.node, .asar) that
+// trigger Z_DATA_ERROR during npm pack. They are not needed in the npm package.
+const binaryDirsToRemove = ["vscode-extension", "electron"];
+for (const dir of binaryDirsToRemove) {
+  const targetDir = join(APP_DIR, dir);
+  if (existsSync(targetDir)) {
+    console.log(`  🧹 Removing app/${dir}/ (not needed in npm package)...`);
+    rmSync(targetDir, { recursive: true, force: true });
+    console.log(`  ✅ app/${dir}/ removed.`);
+  }
+}
+
 // ── Done ───────────────────────────────────────────────────
 const appPkg = join(APP_DIR, "package.json");
 if (existsSync(appPkg)) {
@@ -1181,6 +1181,12 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
  const [config, setConfig] = useState(combo?.config || {});
  const [showStrategyNudge, setShowStrategyNudge] = useState(false);
  const strategyChangeMountedRef = useRef(false);
+  // Agent features (#399 / #401 / #454)
+  const [agentSystemMessage, setAgentSystemMessage] = useState<string>(combo?.system_message || "");
+  const [agentToolFilter, setAgentToolFilter] = useState<string>(combo?.tool_filter_regex || "");
+  const [agentContextCache, setAgentContextCache] = useState<boolean>(
+    !!combo?.context_cache_protection
+  );

  // DnD state
  const hasPricingForModel = useCallback(
@@ -1532,6 +1538,14 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
      saveData.config = configToSave;
    }

+    // Agent features (#399 / #401 / #454)
+    if (agentSystemMessage.trim()) saveData.system_message = agentSystemMessage.trim();
+    else delete saveData.system_message;
+    if (agentToolFilter.trim()) saveData.tool_filter_regex = agentToolFilter.trim();
+    else delete saveData.tool_filter_regex;
+    if (agentContextCache) saveData.context_cache_protection = true;
+    else delete saveData.context_cache_protection;
+
    await onSave(saveData);
    setSaving(false);
  };
@@ -2052,6 +2066,72 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
            </div>
          )}

+          {/* Agent Features (#399 / #401 / #454) */}
+          <div className="flex flex-col gap-2 p-3 bg-black/[0.02] dark:bg-white/[0.02] rounded-lg border border-black/5 dark:border-white/5">
+            <div className="flex items-center gap-1.5 mb-1">
+              <span className="material-symbols-outlined text-[14px] text-primary">smart_toy</span>
+              <p className="text-xs font-medium">Agent Features</p>
+              <span className="text-[10px] text-text-muted">
+                — optional, for agent/tool workflows
+              </span>
+            </div>
+
+            {/* System Message Override */}
+            <div>
+              <label className="text-[11px] font-medium text-text-muted block mb-0.5">
+                System Message Override
+              </label>
+              <textarea
+                rows={2}
+                value={agentSystemMessage}
+                onChange={(e) => setAgentSystemMessage(e.target.value)}
+                placeholder="Override the system prompt for all requests routed through this combo…"
+                className="w-full text-xs py-1.5 px-2 rounded border border-black/10 dark:border-white/10 bg-transparent focus:border-primary focus:outline-none resize-none"
+              />
+              <p className="text-[10px] text-text-muted mt-0.5">
+                Replaces any system message sent by the client. Leave empty to pass through client
+                system messages.
+              </p>
+            </div>
+
+            {/* Tool Filter Regex */}
+            <div>
+              <label className="text-[11px] font-medium text-text-muted block mb-0.5">
+                Tool Filter Regex
+              </label>
+              <input
+                type="text"
+                value={agentToolFilter}
+                onChange={(e) => setAgentToolFilter(e.target.value)}
+                placeholder="e.g. ^(bash|computer)$"
+                className="w-full text-xs py-1.5 px-2 rounded border border-black/10 dark:border-white/10 bg-transparent focus:border-primary focus:outline-none font-mono"
+              />
+              <p className="text-[10px] text-text-muted mt-0.5">
+                Only tools whose name matches this regex are forwarded to the provider. Leave empty
+                to forward all tools.
+              </p>
+            </div>
+
+            {/* Context Cache Protection */}
+            <div className="flex items-center justify-between gap-2">
+              <div>
+                <label className="text-[11px] font-medium text-text-muted block">
+                  Context Cache Protection
+                </label>
+                <p className="text-[10px] text-text-muted">
+                  Pins the provider/model across turns to preserve cache sessions. Internal tags are
+                  stripped before forwarding to the provider.
+                </p>
+              </div>
+              <input
+                type="checkbox"
+                checked={agentContextCache}
+                onChange={(e) => setAgentContextCache(e.target.checked)}
+                className="accent-primary shrink-0"
+              />
+            </div>
+          </div>
+
          {/* Actions */}
          <div className="flex gap-2 pt-1">
            <Button onClick={onClose} variant="ghost" fullWidth size="sm">
@@ -33,11 +33,29 @@ export default function APIPageClient({ machineId }) {
  const [viewTab, setViewTab] = useState("api");
  const [mcpStatus, setMcpStatus] = useState<any>(null);
  const [a2aStatus, setA2aStatus] = useState<any>(null);
+  const [searchProviders, setSearchProviders] = useState<any[]>([]);

  const { copied, copy } = useCopyToClipboard();

+  const fetchSearchProviders = async () => {
+    try {
+      const res = await fetch("/api/search/providers");
+      if (res.ok) {
+        const data = await res.json();
+        setSearchProviders(data.providers || []);
+      }
+    } catch {
+      // Search endpoint may not be available
+    }
+  };
+
  useEffect(() => {
-    Promise.allSettled([loadCloudSettings(), fetchModels(), fetchProtocolStatus()]).finally(() => {
+    Promise.allSettled([
+      loadCloudSettings(),
+      fetchModels(),
+      fetchProtocolStatus(),
+      fetchSearchProviders(),
+    ]).finally(() => {
      setLoading(false);
    });
  }, []);
@@ -575,6 +593,47 @@ export default function APIPageClient({ machineId }) {
            </div>
          </div>

+          {/* Search & Discovery */}
+          {searchProviders.length > 0 && (
+            <div className="mb-6">
+              <div className="flex items-center gap-2 mb-3">
+                <span className="material-symbols-outlined text-sm text-cyan-400">
+                  travel_explore
+                </span>
+                <h3 className="text-xs font-semibold text-text-muted uppercase tracking-wider">
+                  {t("categorySearch") || "Search & Discovery"}
+                </h3>
+                <div className="flex-1 h-px bg-border/50" />
+              </div>
+              <div className="flex flex-col gap-3">
+                <EndpointSection
+                  icon="search"
+                  iconColor="text-cyan-500"
+                  iconBg="bg-cyan-500/10"
+                  title={t("webSearch") || "Web Search"}
+                  path="/v1/search"
+                  description={
+                    t("webSearchDesc") ||
+                    "Unified web search across multiple providers with automatic failover and caching"
+                  }
+                  models={searchProviders.map((p) => ({
+                    id: p.id,
+                    name: p.name,
+                    owned_by: p.id,
+                    type: "search",
+                  }))}
+                  expanded={expandedEndpoint === "search"}
+                  onToggle={() =>
+                    setExpandedEndpoint(expandedEndpoint === "search" ? null : "search")
+                  }
+                  copy={copy}
+                  copied={copied}
+                  baseUrl={currentEndpoint}
+                />
+              </div>
+            </div>
+          )}
+
          {/* Utility & Management */}
          <div>
            <div className="flex items-center gap-2 mb-3">
@@ -1,27 +1,135 @@
 "use client";

-import { useState } from "react";
+import { useState, useRef, useEffect } from "react";
 import { RequestLoggerV2, ProxyLogger, SegmentedControl } from "@/shared/components";
 import ConsoleLogViewer from "@/shared/components/ConsoleLogViewer";
 import AuditLogTab from "./AuditLogTab";
 import { useTranslations } from "next-intl";

+const TIME_RANGES = [
+  { label: "1h", hours: 1 },
+  { label: "6h", hours: 6 },
+  { label: "12h", hours: 12 },
+  { label: "24h", hours: 24 },
+];
+
+const TAB_TO_LOG_TYPE: Record<string, string> = {
+  "request-logs": "request-logs",
+  "proxy-logs": "proxy-logs",
+  "audit-logs": "call-logs",
+  console: "call-logs",
+};
+
 export default function LogsPage() {
  const [activeTab, setActiveTab] = useState("request-logs");
+  const [showExport, setShowExport] = useState(false);
+  const [exporting, setExporting] = useState(false);
+  const dropdownRef = useRef<HTMLDivElement>(null);
  const t = useTranslations("logs");

+  useEffect(() => {
+    function handleClickOutside(e: MouseEvent) {
+      if (dropdownRef.current && !dropdownRef.current.contains(e.target as Node)) {
+        setShowExport(false);
+      }
+    }
+    document.addEventListener("mousedown", handleClickOutside);
+    return () => document.removeEventListener("mousedown", handleClickOutside);
+  }, []);
+
+  async function handleExport(hours: number) {
+    setExporting(true);
+    setShowExport(false);
+    try {
+      const logType = TAB_TO_LOG_TYPE[activeTab] || "call-logs";
+      const res = await fetch(`/api/logs/export?hours=${hours}&type=${logType}`);
+      if (!res.ok) throw new Error("Export failed");
+      const blob = await res.blob();
+      const url = URL.createObjectURL(blob);
+      const a = document.createElement("a");
+      a.href = url;
+      a.download = `omniroute-${logType}-${hours}h-${new Date().toISOString().slice(0, 10)}.json`;
+      document.body.appendChild(a);
+      a.click();
+      document.body.removeChild(a);
+      URL.revokeObjectURL(url);
+    } catch (err) {
+      console.error("Export failed:", err);
+    } finally {
+      setExporting(false);
+    }
+  }
+
  return (
    <div className="flex flex-col gap-6">
-      <SegmentedControl
-        options={[
-          { value: "request-logs", label: t("requestLogs") },
-          { value: "proxy-logs", label: t("proxyLogs") },
-          { value: "audit-logs", label: t("auditLog") },
-          { value: "console", label: t("console") },
-        ]}
-        value={activeTab}
-        onChange={setActiveTab}
-      />
+      <div className="flex items-center justify-between gap-4 flex-wrap">
+        <SegmentedControl
+          options={[
+            { value: "request-logs", label: t("requestLogs") },
+            { value: "proxy-logs", label: t("proxyLogs") },
+            { value: "audit-logs", label: t("auditLog") },
+            { value: "console", label: t("console") },
+          ]}
+          value={activeTab}
+          onChange={setActiveTab}
+        />
+
+        <div className="relative" ref={dropdownRef}>
+          <button
+            id="export-logs-btn"
+            onClick={() => setShowExport(!showExport)}
+            disabled={exporting}
+            className="flex items-center gap-2 px-4 py-2 text-sm font-medium rounded-lg
+              bg-[var(--card-bg,#1e1e2e)] border border-[var(--border,#333)]
+              text-[var(--text-secondary,#aaa)] hover:text-[var(--text-primary,#fff)]
+              hover:border-[var(--accent,#7c3aed)] transition-all duration-200
+              disabled:opacity-50 disabled:cursor-not-allowed"
+          >
+            <svg
+              width="16"
+              height="16"
+              viewBox="0 0 16 16"
+              fill="none"
+              stroke="currentColor"
+              strokeWidth="1.5"
+            >
+              <path
+                d="M8 2v8m0 0l-3-3m3 3l3-3M3 12h10"
+                strokeLinecap="round"
+                strokeLinejoin="round"
+              />
+            </svg>
+            {exporting ? "Exporting..." : "Export"}
+          </button>
+
+          {showExport && (
+            <div
+              className="absolute right-0 top-full mt-1 z-50 min-w-[140px] rounded-lg
+                bg-[var(--card-bg,#1e1e2e)] border border-[var(--border,#333)]
+                shadow-xl overflow-hidden animate-in fade-in"
+            >
+              <div className="px-3 py-2 text-xs text-[var(--text-muted,#666)] border-b border-[var(--border,#333)] font-medium">
+                Time Range
+              </div>
+              {TIME_RANGES.map((range) => (
+                <button
+                  key={range.hours}
+                  id={`export-${range.hours}h-btn`}
+                  onClick={() => handleExport(range.hours)}
+                  className="w-full px-3 py-2 text-sm text-left hover:bg-[var(--hover-bg,#2a2a3e)]
+                    text-[var(--text-secondary,#aaa)] hover:text-[var(--text-primary,#fff)]
+                    transition-colors flex items-center justify-between"
+                >
+                  <span>Last {range.label}</span>
+                  <span className="text-xs text-[var(--text-muted,#666)]">
+                    {range.hours === 24 ? "default" : ""}
+                  </span>
+                </button>
+              ))}
+            </div>
+          )}
+        </div>
+      </div>

      {/* Content */}
      {activeTab === "request-logs" && <RequestLoggerV2 />}
@@ -0,0 +1,406 @@
+"use client";
+
+import { useState, useEffect, useRef } from "react";
+import dynamic from "next/dynamic";
+import { useTranslations } from "next-intl";
+import { Card, Button, Select, Badge } from "@/shared/components";
+
+const Editor = dynamic(() => import("@monaco-editor/react"), { ssr: false });
+
+interface SearchProvider {
+  id: string;
+  name: string;
+  status: "active" | "no_credentials";
+  cost_per_query: number;
+}
+
+interface SearchResult {
+  title: string;
+  url: string;
+  snippet: string;
+  score?: number;
+  date?: string;
+}
+
+interface SearchResponse {
+  id: string;
+  provider: string;
+  results: SearchResult[];
+  query: string;
+  answer: string | null;
+  cached: boolean;
+  usage: {
+    queries_used: number;
+    search_cost_usd: number;
+  };
+  metrics: {
+    response_time_ms: number;
+    upstream_latency_ms: number;
+    total_results_available: number | null;
+  };
+}
+
+function formatBytes(bytes: number): string {
+  if (bytes < 1024) return `${bytes} B`;
+  return `${(bytes / 1024).toFixed(1)} KB`;
+}
+
+export default function SearchPlayground() {
+  const t = useTranslations("search");
+  const [providers, setProviders] = useState<SearchProvider[]>([]);
+  const [selectedProvider, setSelectedProvider] = useState("");
+  const [requestBody, setRequestBody] = useState(
+    JSON.stringify(
+      {
+        query: "latest AI developments",
+        max_results: 5,
+        search_type: "web",
+      },
+      null,
+      2
+    )
+  );
+  const [response, setResponse] = useState<SearchResponse | null>(null);
+  const [rawResponse, setRawResponse] = useState("");
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState("");
+  const [duration, setDuration] = useState(0);
+  const [statusCode, setStatusCode] = useState(0);
+  const [showJson, setShowJson] = useState(false);
+  const abortRef = useRef<AbortController | null>(null);
+
+  useEffect(() => {
+    fetch("/api/search/providers")
+      .then((res) => res.json())
+      .then((data) => {
+        const allProviders = data.providers || [];
+        setProviders(allProviders);
+        const firstActive = allProviders.find((p: SearchProvider) => p.status === "active");
+        if (firstActive) setSelectedProvider(firstActive.id);
+      })
+      .catch(() => {});
+  }, []);
+
+  const handleSend = async () => {
+    setLoading(true);
+    setError("");
+    setResponse(null);
+    setRawResponse("");
+    setStatusCode(0);
+
+    const controller = new AbortController();
+    abortRef.current = controller;
+    const timeout = setTimeout(() => controller.abort(), 15_000);
+    const start = Date.now();
+
+    try {
+      let body: any;
+      try {
+        body = JSON.parse(requestBody);
+      } catch {
+        setError("Invalid JSON in request body");
+        setLoading(false);
+        clearTimeout(timeout);
+        return;
+      }
+
+      if (selectedProvider) body.provider = selectedProvider;
+
+      const res = await fetch("/api/v1/search", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify(body),
+        signal: controller.signal,
+      });
+
+      setDuration(Date.now() - start);
+      setStatusCode(res.status);
+
+      const data = await res.json();
+      setRawResponse(JSON.stringify(data, null, 2));
+
+      if (res.ok) {
+        setResponse(data);
+      } else {
+        setError(data.error?.message || data.error || `Error ${res.status}`);
+      }
+    } catch (err: any) {
+      setDuration(Date.now() - start);
+      if (err.name === "AbortError") {
+        setError("Request timed out (15s)");
+      } else {
+        setError(err.message || "Network error");
+      }
+    } finally {
+      setLoading(false);
+      clearTimeout(timeout);
+    }
+  };
+
+  const handleCancel = () => {
+    abortRef.current?.abort();
+  };
+
+  const getScoreColor = (score: number) => {
+    if (score >= 0.9) return "text-success";
+    if (score >= 0.7) return "text-warning";
+    return "text-error";
+  };
+
+  const getScoreBg = (score: number) => {
+    if (score >= 0.9) return "bg-green-500/10";
+    if (score >= 0.7) return "bg-yellow-500/10";
+    return "bg-red-500/10";
+  };
+
+  const noProviders = providers.filter((p) => p.status === "active").length === 0;
+
+  const editorTheme =
+    typeof document !== "undefined" && document.documentElement.classList.contains("dark")
+      ? "vs-dark"
+      : "light";
+
+  return (
+    <div className="grid grid-cols-1 lg:grid-cols-2 gap-4">
+      {/* Request panel */}
+      <Card>
+        <div className="p-4 space-y-3">
+          <div className="flex items-center justify-between">
+            <div className="flex items-center gap-2">
+              <span className="material-symbols-outlined text-[18px] text-text-muted">upload</span>
+              <h3 className="text-sm font-semibold text-text-main">Request</h3>
+              <Badge variant="info" size="sm">
+                POST /v1/search
+              </Badge>
+            </div>
+            <div className="flex items-center gap-1">
+              <button
+                onClick={() => navigator.clipboard.writeText(requestBody)}
+                className="p-1.5 rounded hover:bg-black/5 dark:hover:bg-white/5 text-text-muted hover:text-text-main transition-colors"
+                title="Copy"
+              >
+                <span className="material-symbols-outlined text-[16px]">content_copy</span>
+              </button>
+              <button
+                onClick={() =>
+                  setRequestBody(
+                    JSON.stringify(
+                      {
+                        query: "latest AI developments",
+                        max_results: 5,
+                        search_type: "web",
+                      },
+                      null,
+                      2
+                    )
+                  )
+                }
+                className="p-1.5 rounded hover:bg-black/5 dark:hover:bg-white/5 text-text-muted hover:text-text-main transition-colors"
+                title="Reset to default"
+              >
+                <span className="material-symbols-outlined text-[16px]">restart_alt</span>
+              </button>
+            </div>
+          </div>
+          <div className="border border-border rounded-lg overflow-hidden">
+            <Editor
+              height="400px"
+              defaultLanguage="json"
+              value={requestBody}
+              onChange={(value: string | undefined) => setRequestBody(value || "")}
+              theme={editorTheme}
+              options={{
+                minimap: { enabled: false },
+                fontSize: 12,
+                lineNumbers: "on",
+                scrollBeyondLastLine: false,
+                wordWrap: "on",
+                automaticLayout: true,
+                formatOnPaste: true,
+              }}
+            />
+          </div>
+          <div className="flex items-center gap-3">
+            <div className="flex-1">
+              <Select
+                value={selectedProvider}
+                onChange={(e: any) => setSelectedProvider(e.target.value)}
+                options={providers.map((p) => ({
+                  value: p.id,
+                  label: `${p.name}${p.status === "no_credentials" ? " (no key)" : ""}`,
+                }))}
+                className="w-full"
+              />
+            </div>
+            {loading ? (
+              <Button icon="stop" variant="secondary" onClick={handleCancel}>
+                Cancel
+              </Button>
+            ) : (
+              <Button
+                icon="search"
+                onClick={handleSend}
+                disabled={noProviders || !requestBody.trim()}
+              >
+                {t("webSearch")}
+              </Button>
+            )}
+          </div>
+          {noProviders && <p className="text-xs text-text-muted">{t("noSearchProviders")}</p>}
+        </div>
+      </Card>
+
+      {/* Response panel */}
+      <Card>
+        <div className="p-4 space-y-3">
+          <div className="flex items-center justify-between">
+            <div className="flex items-center gap-2">
+              <span className="material-symbols-outlined text-[18px] text-text-muted">
+                download
+              </span>
+              <h3 className="text-sm font-semibold text-text-main">Response</h3>
+              {statusCode > 0 && (
+                <>
+                  <Badge variant={statusCode < 400 ? "success" : "error"} size="sm">
+                    {statusCode}
+                  </Badge>
+                  <span className="text-xs text-text-muted">{duration}ms</span>
+                </>
+              )}
+              {loading && (
+                <span className="material-symbols-outlined text-[14px] text-primary animate-spin">
+                  progress_activity
+                </span>
+              )}
+            </div>
+            {response && (
+              <div className="flex gap-1">
+                <button
+                  className={`text-xs px-3 py-1 rounded-md ${
+                    !showJson
+                      ? "bg-primary/15 text-primary font-medium"
+                      : "bg-black/5 dark:bg-white/5 text-text-muted"
+                  }`}
+                  onClick={() => setShowJson(false)}
+                >
+                  {t("formatted")}
+                </button>
+                <button
+                  className={`text-xs px-3 py-1 rounded-md ${
+                    showJson
+                      ? "bg-primary/15 text-primary font-medium"
+                      : "bg-black/5 dark:bg-white/5 text-text-muted"
+                  }`}
+                  onClick={() => setShowJson(true)}
+                >
+                  {t("rawJson")}
+                </button>
+              </div>
+            )}
+          </div>
+
+          <div className="border border-border rounded-lg overflow-hidden min-h-[400px]">
+            {loading && (
+              <div className="flex items-center justify-center h-[400px]">
+                <span className="material-symbols-outlined text-[24px] text-primary animate-spin">
+                  progress_activity
+                </span>
+              </div>
+            )}
+
+            {error && !loading && (
+              <div className="p-4">
+                <div className="text-error text-sm">{error}</div>
+              </div>
+            )}
+
+            {response && !showJson && !loading && (
+              <div className="p-4 space-y-3">
+                {/* Meta bar */}
+                <div className="flex justify-between items-center p-2 bg-bg-alt rounded-lg">
+                  <div className="flex items-center gap-3 text-xs text-text-muted">
+                    <span>
+                      {response.results.length} {t("searchResults").toLowerCase()}
+                    </span>
+                    <span className="flex items-center gap-1">
+                      <span className="w-1.5 h-1.5 rounded-full bg-primary" />
+                      {response.provider}
+                    </span>
+                    <span>${response.usage?.search_cost_usd?.toFixed(4)}</span>
+                    <span>{formatBytes(rawResponse.length)}</span>
+                  </div>
+                  <span
+                    className={`text-xs flex items-center gap-1 ${
+                      response.cached ? "text-success" : "text-warning"
+                    }`}
+                  >
+                    <span
+                      className={`w-1.5 h-1.5 rounded-full ${
+                        response.cached ? "bg-success" : "bg-warning"
+                      }`}
+                    />
+                    {response.cached ? t("cacheHit") : t("cacheMiss")}
+                  </span>
+                </div>
+
+                {/* Results */}
+                {response.results.map((r, i) => (
+                  <div
+                    key={i}
+                    className="border-l-[3px] border-l-primary p-3 bg-surface rounded-r-lg border border-border"
+                  >
+                    <div className="flex justify-between items-start">
+                      <span className="text-sm font-medium text-text-main">
+                        {i + 1}. {r.title}
+                      </span>
+                      {r.score != null && (
+                        <span
+                          className={`text-[10px] px-2 py-0.5 rounded-md ml-2 whitespace-nowrap ${getScoreBg(r.score)} ${getScoreColor(r.score)}`}
+                        >
+                          {r.score.toFixed(2)}
+                        </span>
+                      )}
+                    </div>
+                    <a
+                      href={r.url}
+                      target="_blank"
+                      rel="noopener noreferrer"
+                      className="text-accent text-[11px] block mt-0.5"
+                    >
+                      {r.url}
+                    </a>
+                    <p className="text-xs text-text-muted mt-1 leading-relaxed">{r.snippet}</p>
+                  </div>
+                ))}
+              </div>
+            )}
+
+            {response && showJson && !loading && (
+              <Editor
+                height="400px"
+                defaultLanguage="json"
+                value={rawResponse}
+                theme={editorTheme}
+                options={{
+                  readOnly: true,
+                  minimap: { enabled: false },
+                  fontSize: 12,
+                  lineNumbers: "on",
+                  scrollBeyondLastLine: false,
+                  wordWrap: "on",
+                  automaticLayout: true,
+                }}
+              />
+            )}
+
+            {!loading && !error && !response && (
+              <div className="flex items-center justify-center h-[400px] text-text-muted text-sm">
+                {t("emptyState")}
+              </div>
+            )}
+          </div>
+        </div>
+      </Card>
+    </div>
+  );
+}
--- a/Show More
+++ b/Show More
				`@@ -0,0 +1 @@`
				<svg width="56" height="64" viewBox="0 0 56 64" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M53.292 15.321l1.5-3.676s-1.909-2.043-4.227-4.358c-2.317-2.315-7.225-.953-7.225-.953L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953C2.988 9.602 1.08 11.645 1.08 11.645l1.5 3.676-1.91 5.447s5.614 21.236 6.272 23.83c1.295 5.106 2.181 7.08 5.862 9.668 3.68 2.587 10.36 7.08 11.45 7.762 1.091.68 2.455 1.84 3.682 1.84 1.227 0 2.59-1.16 3.68-1.84 1.091-.681 7.77-5.175 11.452-7.762 3.68-2.587 4.567-4.562 5.862-9.668.657-2.594 6.27-23.83 6.27-23.83l-1.908-5.447z" fill="url(#paint0_linear)"/><path fill-rule="evenodd" clip-rule="evenodd" d="M34.888 11.508c.818 0 6.885-1.157 6.885-1.157s7.189 8.68 7.189 10.536c0 1.534-.619 2.134-1.347 2.842-.152.148-.31.3-.467.468l-5.39 5.717a9.42 9.42 0 01-.176.18c-.538.54-1.33 1.336-.772 2.658l.115.269c.613 1.432 1.37 3.2.407 4.99-1.025 1.906-2.78 3.178-3.905 2.967-1.124-.21-3.766-1.589-4.737-2.218-.971-.63-4.05-3.166-4.05-4.137 0-.809 2.214-2.155 3.29-2.81.214-.13.383-.232.48-.298.111-.075.297-.19.526-.332.981-.61 2.754-1.71 2.799-2.197.055-.602.034-.778-.758-2.264-.168-.316-.365-.654-.568-1.004-.754-1.295-1.598-2.745-1.41-3.784.21-1.173 2.05-1.845 3.608-2.415.194-.07.385-.14.567-.209l1.623-.609c1.556-.582 3.284-1.229 3.57-1.36.394-.181.292-.355-.903-.468a54.655 54.655 0 01-.58-.06c-1.48-.157-4.209-.446-5.535-.077-.261.073-.553.152-.86.235-1.49.403-3.317.897-3.493 1.182-.03.05-.06.093-.089.133-.168.238-.277.394-.091 1.406.055.302.169.895.31 1.629.41 2.148 1.053 5.498 1.134 6.25.011.106.024.207.036.305.103.84.171 1.399-.805 1.622l-.255.058c-1.102.252-2.717.623-3.3.623-.584 0-2.2-.37-3.302-.623l-.254-.058c-.976-.223-.907-.782-.804-1.622.012-.098.024-.2.035-.305.081-.753.725-4.112 1.137-6.259.14-.73.253-1.32.308-1.62.185-1.012.076-1.168-.092-1.406a3.743 3.743 0 01-.09-.133c-.174-.285-2-.779-3.491-1.182-.307-.083-.6-.162-.86-.235-1.327-.37-4.055-.08-5.535.077-.226.024-.422.045-.58.06-1.196.113-1.297.287-.903.468.285.131 2.013.778 3.568 1.36.597.223 1.17.437 1.624.609.183.069.373.138.568.21 1.558.57 3.398 1.241 3.608 2.414.187 1.039-.657 2.489-1.41 3.784-.204.35-.4.688-.569 1.004-.791 1.486-.812 1.662-.757 2.264.044.488 1.816 1.587 2.798 2.197.229.142.415.257.526.332.098.066.266.168.48.298 1.076.654 3.29 2 3.29 2.81 0 .97-3.078 3.507-4.05 4.137-.97.63-3.612 2.008-4.737 2.218-1.124.21-2.88-1.061-3.904-2.966-.963-1.791-.207-3.559.406-4.99l.115-.27c.559-1.322-.233-2.118-.772-2.658a9.377 9.377 0 01-.175-.18l-5.39-5.717c-.158-.167-.316-.32-.468-.468-.728-.707-1.346-1.308-1.346-2.842 0-1.855 7.189-10.536 7.189-10.536s6.066 1.157 6.884 1.157c.653 0 1.913-.433 3.227-.885.333-.114.669-.23 1-.34 1.635-.545 2.726-.549 2.726-.549s1.09.004 2.726.549c.33.11.667.226 1 .34 1.313.452 2.574.885 3.226.885zm-1.041 30.706c1.282.66 2.192 1.128 2.536 1.343.445.278.174.803-.232 1.09-.405.285-5.853 4.499-6.381 4.965l-.215.191c-.509.459-1.159 1.044-1.62 1.044-.46 0-1.11-.586-1.62-1.044l-.213-.191c-.53-.466-5.977-4.68-6.382-4.966-.405-.286-.677-.81-.232-1.09.344-.214 1.255-.683 2.539-1.344l1.22-.629c1.92-.992 4.315-1.837 4.689-1.837.373 0 2.767.844 4.689 1.837.436.226.845.437 1.222.63z" fill="#fff"/><path fill-rule="evenodd" clip-rule="evenodd" d="M43.34 6.334L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953c0 0 6.544-.59 8.793 3.064 0 0 6.066 1.157 6.884 1.157.818 0 2.59-.68 4.226-1.225 1.636-.545 2.727-.549 2.727-.549s1.09.004 2.726.549 3.408 1.225 4.226 1.225c.818 0 6.885-1.157 6.885-1.157 2.249-3.654 8.792-3.064 8.792-3.064-2.317-2.315-7.225-.953-7.225-.953z" fill="url(#paint1_linear)"/><defs><linearGradient id="paint0_linear" x1=".671" y1="64.319" x2="55.2" y2="64.319" gradientUnits="userSpaceOnUse"><stop stop-color="#F50"/><stop offset=".41" stop-color="#F50"/><stop offset=".582" stop-color="#FF2000"/><stop offset="1" stop-color="#FF2000"/></linearGradient><linearGradient id="paint1_linear" x1="6.278" y1="11.466" x2="50.565" y2="11.466" gradientUnits="userSpaceOnUse"><stop stop-color="#FF452A"/><stop offset="1" stop-color="#FF2000"/></linearGradient></defs></svg>