Compare commits


23 Commits

Author SHA1 Message Date
diegosouzapw d3dfd9ce57 feat(release): v2.7.2 — fix light mode contrast in logs UI
Build Electron Desktop App / Validate version (push) Failing after 38s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
- fix(logs): text colors in filter buttons + combo badge now have dark: variants
- Bumped version to 2.7.2
- Updated CHANGELOG and openapi.yaml
2026-03-18 00:42:22 -03:00
Diego Rodrigues de Sa e Souza aa06d5d356 Merge pull request #433 from diegosouzapw/fix/issue-378-logs-light-mode-contrast
Merged fix for light mode contrast in filter buttons and combo badge. Thanks @rdself for the great bug report!
2026-03-18 00:41:28 -03:00
diegosouzapw 448c8a29e1 fix(logs): fix light mode contrast in filter buttons and combo badge (#378)
- text-red-400 → text-red-700 dark:text-red-400 (error filter, recording button)
- text-emerald-400 → text-emerald-700 dark:text-emerald-400 (ok filter)
- text-violet-300 → text-violet-700 dark:text-violet-300 (combo filter)
- combo row badge: violet-700 → violet-800 dark:violet-300, stronger border

Fixes #378
2026-03-17 16:46:27 -03:00
diegosouzapw 928b7120f4 feat(release): v2.7.1 — unified web search routing + Next.js 16.1.7 security
Build Electron Desktop App / Validate version (push) Failing after 35s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
- POST /v1/search: 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/mo
- Search analytics dashboard tab + GET /api/v1/search/analytics
- db: request_type column on call_logs (migration 007)
- Next.js 16.1.7: 6 CVEs fixed (critical: CVE-2026-29057 HTTP request smuggling)
- docs/openapi.yaml: bumped to 2.7.1
2026-03-17 16:27:31 -03:00
diegosouzapw a3deacd718 feat: Implement historical model latency and success rate tracking for auto-combo routing and update Claude and Deepseek pricing and model registrations. 2026-03-17 16:18:36 -03:00
diegosouzapw 78959fffbd Merge branch 'main' of https://github.com/diegosouzapw/OmniRoute 2026-03-17 16:18:12 -03:00
Diego Rodrigues de Sa e Souza 1788616e52 Merge pull request #431 from diegosouzapw/dependabot/npm_and_yarn/next-16.1.7
Security update merged: Next.js 16.1.7 fixes 6 CVEs including critical CVE-2026-29057 (HTTP request smuggling). No breaking changes.
2026-03-17 16:18:01 -03:00
Diego Rodrigues de Sa e Souza c61e6d0777 Merge pull request #432 from Regis-RCR/feat/search-provider-routing
Merged with dashboard improvements: SearchAnalyticsTab + /api/v1/search/analytics endpoint — PR review complete by Antigravity.
2026-03-17 16:17:39 -03:00
diegosouzapw a3bc7620b1 feat(integration): integrate ClawRouter services into active pipeline
- intentClassifier → engine.ts selectProvider()
  When taskType is 'default', classifies prompt via multilingual keyword
  detection (9 langs) and uses detected intent (code/reasoning/simple/medium)
  for 6-factor task fitness scoring.

- emergencyFallback → chatCore.ts error path (after T5 intra-family fallback)
  On HTTP 402 or budget-exhaustion keywords, attempts one redirect to
  nvidia/gpt-oss-120b ($0.00/M) before returning error to combo router.
  Skipped for streaming requests and tool-calling requests.

- AutoComboConfig.routerStrategy field added
  Allows per-combo strategy override ('rules' | 'cost' | 'latency')

Note: requestDedup was already integrated in chatCore.ts (line 387-430)
Branch: feat/clawrouter-improvements
2026-03-17 15:22:12 -03:00
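The emergency fallback rule in the commit above can be sketched as below. This is a minimal illustration, not the contents of `emergencyFallback.ts`. The `FallbackContext` shape, the `pickEmergencyFallback` name, and the keyword list are assumptions made for the example. Only the HTTP 402 trigger, the target model, the single-redirect limit, and the streaming and tool-calling exclusions come from the commit message.

```typescript
// Hypothetical context shape for illustration; not OmniRoute's real types.
interface FallbackContext {
  status: number;             // HTTP status returned by the failed provider call
  errorMessage: string;       // provider error text
  stream: boolean;            // streaming requests are skipped
  hasTools: boolean;          // tool-calling requests are skipped
  alreadyRedirected: boolean; // only one redirect is attempted
}

// Target model named in the commit message.
const EMERGENCY_MODEL = "nvidia/gpt-oss-120b";

// Assumed keyword list; the real list is not shown in the source.
const BUDGET_KEYWORDS = ["insufficient", "quota exceeded", "budget", "credit"];

// Returns the fallback model to try, or null when the error should go
// back to the combo router unchanged.
function pickEmergencyFallback(ctx: FallbackContext): string | null {
  if (ctx.stream || ctx.hasTools || ctx.alreadyRedirected) return null;
  const message = ctx.errorMessage.toLowerCase();
  const budgetExhausted =
    ctx.status === 402 || BUDGET_KEYWORDS.some((k) => message.includes(k));
  return budgetExhausted ? EMERGENCY_MODEL : null;
}
```

Keeping the decision in a pure function like this makes the skip conditions easy to test without calling any provider.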
diegosouzapw 8064c588dc docs(i18n): sync v2.7.0 release notes to 29 language READMEs
New in v2.7.0: pluggable RouterStrategy, multilingual intent detection,
request deduplication, new providers (Grok-4 Fast, GLM-5/Z.AI,
MiniMax M2.5, Kimi K2.5). Native translations for de/es/fr/it/ru/zh-CN/ja/ko/ar/pt-BR/pt.
2026-03-17 15:11:09 -03:00
Regis 564e983c68 feat(search): add unified web search routing with 5 providers
Add POST /v1/search — a unified search endpoint routing queries across
5 providers (Serper, Brave, Perplexity Search, Exa, Tavily) with
automatic failover, in-memory caching, and request coalescing.

No open-source AI gateway offers unified search routing. This chains
free tiers for 5,500+ searches/month with zero downtime.

Providers: Serper ($0.001/q, 2500/mo free), Brave ($0.005/q, 1000/mo),
Perplexity Search ($0.005/q), Exa ($0.007/q, 1000/mo), Tavily
($0.008/q, 1000/mo). Auto-select picks cheapest with credentials.

Architecture follows existing patterns:
- searchRegistry.ts (same as embeddingRegistry.ts)
- search.ts handler (same as embeddings.ts)
- route.ts (same as /v1/embeddings/route.ts)
- searchCache.ts (bounded TTL cache + request coalescing)

Schema finalized — all future fields defined as optional with safe
defaults. No breaking changes when implementing content extraction,
answer synthesis, or ranking.

Key features:
- Per-provider request builders and response normalizers
- Enriched response: display_url, score, favicon_url, content block,
  metadata, answer block, errors array, upstream_latency_ms metrics
- Cost-sorted auto-select with failover on 429/5xx/timeout
- Credential fallback (perplexity-search reuses perplexity chat key)
- Cache key includes all result-affecting parameters
- max_results clamped to provider limits, sanitized error responses
- Factored validators (validateSearchProvider factory)
- CORS headers on all responses
- Dashboard: Search & Discovery section, search provider template
- DB migration 007: request_type column in call_logs
- 28 unit tests (registry, cache, coalescing, validation)
2026-03-17 18:28:35 +01:00
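The cost-sorted auto-select and failover rules from the commit above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in `searchRegistry.ts`. The provider ids (other than `perplexity-search`), the `SearchProvider` shape, and the function names are hypothetical. The prices and free tiers are copied from the commit message, which lists no free tier for Perplexity Search.

```typescript
// Hypothetical registry entry shape for illustration.
interface SearchProvider {
  id: string;
  costPerQuery: number; // USD per query
  freePerMonth: number; // free queries per month (0 when none is listed)
}

// Prices and free tiers as listed in the commit message.
const SEARCH_PROVIDERS: SearchProvider[] = [
  { id: "serper", costPerQuery: 0.001, freePerMonth: 2500 },
  { id: "brave", costPerQuery: 0.005, freePerMonth: 1000 },
  { id: "perplexity-search", costPerQuery: 0.005, freePerMonth: 0 },
  { id: "exa", costPerQuery: 0.007, freePerMonth: 1000 },
  { id: "tavily", costPerQuery: 0.008, freePerMonth: 1000 },
];

// Providers that have credentials, cheapest first. Array.prototype.sort is
// stable, so equally priced providers keep their registry order.
function failoverOrder(
  providers: SearchProvider[],
  hasCredentials: (id: string) => boolean,
): SearchProvider[] {
  return providers
    .filter((p) => hasCredentials(p.id))
    .sort((a, b) => a.costPerQuery - b.costPerQuery);
}

// Failover happens on 429, any 5xx, or a timeout.
function isRetryable(outcome: number | "timeout"): boolean {
  return outcome === "timeout" || outcome === 429 || (outcome >= 500 && outcome <= 599);
}

// Sum of the free tiers listed above.
function totalFreeQuota(providers: SearchProvider[]): number {
  return providers.reduce((sum, p) => sum + p.freePerMonth, 0);
}
```

With the listed tiers, `totalFreeQuota(SEARCH_PROVIDERS)` adds 2500 + 1000 + 1000 + 1000, which matches the "5,500+" figure quoted in this commit.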
diegosouzapw e1da181740 fix(publish): also remove app/electron/ (contains app.asar binary) to prevent Z_DATA_ERROR 2026-03-17 14:25:48 -03:00
diegosouzapw c63209200e fix(publish): remove app/vscode-extension/ after build to prevent Z_DATA_ERROR in npm pack 2026-03-17 14:13:15 -03:00
diegosouzapw 737808cf53 fix(npm): exclude app/vscode-extension/ from package to prevent Z_DATA_ERROR during publish 2026-03-17 13:50:06 -03:00
diegosouzapw a197bb7736 fix(routerStrategy): use .ts extension in imports for Next.js App Router bundle compatibility 2026-03-17 13:15:47 -03:00
dependabot[bot] f9dd967bc5 deps: bump next from 16.1.6 to 16.1.7
Bumps [next](https://github.com/vercel/next.js) from 16.1.6 to 16.1.7.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v16.1.6...v16.1.7)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.1.7
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-17 16:14:44 +00:00
diegosouzapw 44e4d55a66 feat(release): merge feat/clawrouter-improvements — v2.7.0
Build Electron Desktop App / Validate version (push) Failing after 40s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
2026-03-17 13:12:41 -03:00
diegosouzapw 095c84ac16 fix(providerRegistry): remove duplicate claude-haiku-4-5-20251001 from anthropic provider to prevent ambiguous model resolution 2026-03-17 13:10:23 -03:00
diegosouzapw e063eae727 feat(clawrouter): implement 14 ClawRouter-inspired features
PRICING UPDATES (01-09):
- xAI Grok-4 family: grok-4-fast-non-reasoning ($0.20/$0.50/M, 1143ms),
  grok-4-fast-reasoning, grok-4-1-fast-*, grok-4-0709, grok-3, grok-3-mini
- Z.AI GLM-5 family: glm-5 + glm-5-turbo (128k maxOutput, $1.00/$3.20/M)
- Gemini Flash Lite: price corrected $0.15→$0.10 / $1.25→$0.40 (per ClawRouter)
- Gemini 3.1 Pro: new flagship (1.05M context, aliased as gemini-3.1-pro)
- Anthropic Claude 4.5/4.6: haiku-4.5 ($1/$5), sonnet-4.6 ($3/$15), opus-4.6 ($5/$25)
- DeepSeek native section: deepseek-chat/v3/v3.2 ($0.28/$0.42), deepseek-reasoner ($0.55/$2.19)
- Kimi K2.5 Moonshot: kimi-k2.5 ($0.60/$3.00, 262k ctx), moonshot-kimi-k2.5 alias
- MiniMax M2.5: minimax-m2.5 ($0.30/$1.20, 204k ctx, reasoning+tools)
- NVIDIA free tier: gpt-oss-120b at $0.00/M via emergencyFallback.ts

INFRASTRUCTURE FEATURES (10-14):
- feat(router): add intentClassifier.ts for multilingual intent detection (9 langs)
  Detects code/reasoning/simple in EN, PT-BR, ES, ZH, JA, RU, DE, KO, AR
- feat(dedup): add requestDedup.ts for concurrent request deduplication
  SHA-256 hash, skip streaming, skip high-temperature, 60s failsafe TTL
- feat(autoCombo): add routerStrategy.ts pluggable strategy system
  RouterStrategy interface, RulesStrategy (6-factor) + CostStrategy, registry
- feat(fallback): add emergencyFallback.ts budget-exhaustion detector
  Triggers on HTTP 402 or budget keywords, redirects to nvidia/gpt-oss-120b
- feat(taskFitness): add fitness scores for Grok-4, Kimi K2.5, GLM-5,
  MiniMax M2.5, DeepSeek V3.2, Gemini 3.1 Pro across all task categories

PROVIDERS:
- providers.ts: add Z.AI (zai) provider entry for GLM-5 API key connections

All features on branch: feat/clawrouter-improvements
Source: github.com/BlockRunAI/ClawRouter analysis (2026-03-17)
2026-03-17 10:43:12 -03:00
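The concurrent request deduplication described in the commit above can be sketched as below. This is a simplified illustration, not the code in `requestDedup.ts`. The commit mentions a SHA-256 hash. To stay dependency-free, this sketch uses the canonical JSON string itself as the key. The `ChatRequest` shape, the `InFlightDedup` class, the temperature threshold, and the treatment of a missing temperature as low are all assumptions. The streaming skip, the high-temperature skip, and the 60s failsafe TTL come from the commit message.

```typescript
// Hypothetical request shape for illustration.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  stream?: boolean;
  temperature?: number;
}

const HIGH_TEMPERATURE = 0.7;  // assumed threshold; not stated in the source
const FAILSAFE_TTL_MS = 60000; // "60s failsafe TTL" from the commit message

// Returns a dedup key, or null when the request must not be deduplicated.
function dedupKey(req: ChatRequest): string | null {
  if (req.stream) return null;
  if ((req.temperature ?? 0) > HIGH_TEMPERATURE) return null;
  return JSON.stringify([
    req.model,
    req.temperature ?? null,
    req.messages.map((m) => [m.role, m.content]),
  ]);
}

class InFlightDedup<T> {
  private inflight = new Map<string, { promise: Promise<T>; startedAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Identical concurrent requests share one upstream call.
  run(req: ChatRequest, execute: () => Promise<T>): Promise<T> {
    const key = dedupKey(req);
    if (key === null) return execute();

    const existing = this.inflight.get(key);
    if (existing && this.now() - existing.startedAt < FAILSAFE_TTL_MS) {
      return existing.promise;
    }

    const promise = execute();
    this.inflight.set(key, { promise, startedAt: this.now() });
    const clear = () => {
      if (this.inflight.get(key)?.promise === promise) this.inflight.delete(key);
    };
    promise.then(clear, clear); // forget the entry once the call settles
    return promise;
  }
}
```

The failsafe TTL only matters when an upstream call never settles: after 60 seconds a new identical request starts a fresh call instead of waiting on the stuck one.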
diegosouzapw f02c5b5c69 fix(install/v2.6.10): Windows better-sqlite3 prebuilt download (#426)
Build Electron Desktop App / Validate version (push) Failing after 35s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
npm version patch run BEFORE staging files — this is an ATOMIC commit.

Adds Strategy 1.5 to scripts/postinstall.mjs:
- Uses @mapbox/node-pre-gyp install --fallback-to-build=false
  (bundled within better-sqlite3) to download the correct prebuilt
  binary for the current OS/arch (win32-x64/arm64, darwin-x64/arm64)
  WITHOUT requiring node-gyp, Python, or MSVC build tools.
- Tries node-pre-gyp.cmd (Windows) or node-pre-gyp (Unix) from .bin/
  with fallback to direct path in @mapbox/node-pre-gyp/bin/
- Falls back to npm rebuild only if prebuilt download fails.
- Windows-specific error: shows Option A (npx node-pre-gyp) and
  Option B (rebuild) with Visual Studio Build Tools links.

Fixes: #426 (better_sqlite3.node is not a valid Win32 application)
2026-03-17 10:09:45 -03:00
diegosouzapw 838f1d645c fix(v2.6.9): CI budget checks, #409 file attachments, atomic release workflow
Build Electron Desktop App / Validate version (push) Failing after 38s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
Includes version bump — v2.6.9 — committed ATOMICALLY with all changes:

fixes:
- fix(ci/t11): Remove 'any' from comments in openai-responses.ts + chatCore.ts
  (\bany\b regex counted comment text as explicit any violations)
- fix(chatCore/#409): Normalize unsupported content part types before forwarding
  Cursor sends {type:'file'} for .md attachments; Copilot/OpenAI providers reject
  with 'type has to be either image_url or text'. Now: file/document→text block,
  unknown types dropped with debug log. Fixes claude-* models via github-copilot.

workflow:
- chore(generate-release): ATOMIC COMMIT RULE — npm version patch MUST run before
  feature commits so the release tag always points to a commit with full changes
2026-03-17 09:09:01 -03:00
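The content-part normalization for #409 described in the commit above can be sketched as follows. This is a hedged illustration rather than the code in `chatCore.ts`. The `ContentPart` shape, the `normalizeContentParts` name, and the fields probed for attachment text are assumptions. The behaviour comes from the commit message: `file` and `document` parts become `text` blocks, and unknown types are dropped with a debug log.

```typescript
// Hypothetical content part shape for illustration.
type ContentPart = { type: string; [key: string]: unknown };

// Assumed field names for the attachment text; the real part layout is not
// shown in the source.
function textOf(part: ContentPart): string {
  for (const field of ["text", "content", "data"]) {
    const value = part[field];
    if (typeof value === "string") return value;
  }
  return "";
}

// Keeps `text` and `image_url` parts, converts file/document parts to text,
// and drops everything else.
function normalizeContentParts(
  parts: ContentPart[],
  debug: (message: string) => void = () => {},
): ContentPart[] {
  const normalized: ContentPart[] = [];
  for (const part of parts) {
    if (part.type === "text" || part.type === "image_url") {
      normalized.push(part);
    } else if (part.type === "file" || part.type === "document") {
      normalized.push({ type: "text", text: textOf(part) });
    } else {
      debug(`dropping unsupported content part type: ${part.type}`);
    }
  }
  return normalized;
}
```

Running this before forwarding means a provider that accepts only `image_url` and `text` never sees a `file` part.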
diegosouzapw ce2c30c437 chore(release): v2.6.8 — combo agents, auto-update, detailed logs, MITM Kiro
Build Electron Desktop App / Validate version (push) Failing after 31s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
2026-03-17 08:58:03 -03:00
diegosouzapw d56fae0a7b feat: combo agents, auto-update UI, detailed logs, MITM Kiro (#399 #401 #320 #378 #336)
DB Migrations (zero-breaking, ADD COLUMN DEFAULT NULL + new table):
- 005_combo_agent_fields.sql: system_message, tool_filter_regex, context_cache_protection on combos
- 006_detailed_request_logs.sql: ring-buffer table (500 entries) for full pipeline body capture

Features:
- #399 System Message Override + Tool Filter Regex per Combo
  - applyComboAgentMiddleware() injected into handleComboChat/handleRoundRobinCombo
  - Supports both OpenAI and Anthropic tool name formats
- #401 Context Caching Protection (Stateless)
  - injectModelTag() appends <omniModel>provider/model</omniModel> to responses
  - extractPinnedModel() reads tag from history and pins model for session
- #320 Auto-Update via Settings
  - GET /api/system/version — current vs latest npm
  - POST /api/system/update — fire-and-forget npm install + pm2 restart
- #378 Detailed Request Logs
  - saveRequestDetailLog() captures bodies at 4 pipeline stages (opt-in toggle)
  - GET/POST /api/logs/detail — list logs + enable/disable toggle
- #336 MITM Kiro IDE
  - src/mitm/targets/kiro.ts: MitmTarget profile for api.anthropic.com interception
2026-03-17 08:53:41 -03:00
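The stateless model pinning for #401 in the commit above can be sketched as below. The function names `injectModelTag` and `extractPinnedModel` and the `<omniModel>` tag format come from the commit message. The signatures, the `HistoryMessage` shape, and the choice to read the most recent assistant message first are assumptions made for this sketch.

```typescript
// Hypothetical history message shape for illustration.
interface HistoryMessage {
  role: string;
  content: string;
}

const OMNI_MODEL_TAG = /<omniModel>([^<]+)<\/omniModel>/;

// Appends the tag so a later request can recover which model answered.
function injectModelTag(responseText: string, provider: string, model: string): string {
  return `${responseText}\n<omniModel>${provider}/${model}</omniModel>`;
}

// Scans assistant messages from newest to oldest and returns the first
// "provider/model" found, or null when no tag is present.
function extractPinnedModel(history: HistoryMessage[]): string | null {
  for (const message of [...history].reverse()) {
    if (message.role !== "assistant") continue;
    const pinned = OMNI_MODEL_TAG.exec(message.content)?.[1];
    if (pinned !== undefined) return pinned;
  }
  return null;
}
```

Because the tag travels inside the conversation history, no server-side session state is needed to keep a session on the same model.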
98 changed files with 5407 additions and 195 deletions
+21
@@ -32,6 +32,27 @@ Version format: `2.x.y` — examples:
```
npm version patch --no-git-tag-version
```
> **⚠️ ATOMIC COMMIT RULE — Version bump MUST happen before committing feature files.**
>
> **CORRECT order:**
>
> 1. `npm version patch --no-git-tag-version` ← bump first
> 2. implement features / fix bugs
> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
>
> **OR if features are already staged:**
>
> 1. implement features (do NOT commit yet)
> 2. `npm version patch --no-git-tag-version` ← bump before committing
> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
>
> **NEVER do this (creates version mismatch in git history):**
>
> - ~~commit features → then bump version → commit package.json separately~~
>
> This ensures that `git show v2.x.y` always contains both code changes and the version bump together.
> The GitHub release tag will point to a commit that includes ALL changes for that version.
### 2. Regenerate lock file (REQUIRED after version bump)
**Mandatory** — skipping causes `@swc/helpers` lock mismatch and CI failures:
+5
@@ -3,6 +3,11 @@ data/
**/data/
**/db.json
# VS Code extension test runtime (large binary, not needed in npm package)
app/vscode-extension/
**/data/
**/db.json
# Source code (pre-built app/ is published instead)
src/
open-sse/
+137
@@ -4,6 +4,143 @@
---
## [2.7.2] — 2026-03-18
> Sprint: Light mode UI contrast fixes.
### 🐛 Bug Fixes
- **fix(logs)**: Fix light mode contrast in request logs filter buttons and combo badge (#378)
  - Error/Success/Combo filter buttons now readable in light mode
  - Combo row badge uses stronger violet in light mode
---
## [2.7.1] — 2026-03-17
> Sprint: Unified web search routing (POST /v1/search) with 5 providers + Next.js 16.1.7 security fixes (6 CVEs).
### ✨ New Features
- **feat(search)**: Unified web search routing — `POST /v1/search` with 5 providers (Serper, Brave, Perplexity, Exa, Tavily)
  - Auto-failover across providers, 6,500+ free searches/month
  - In-memory cache with request coalescing (configurable TTL)
  - Dashboard: Search Analytics tab in `/dashboard/analytics` with provider breakdown, cache hit rate, cost tracking
  - New API: `GET /api/v1/search/analytics` for search request statistics
  - DB migration: `request_type` column on `call_logs` for non-chat request tracking
  - Zod validation (`v1SearchSchema`), auth-gated, cost recorded via `recordCost()`
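
The in-memory cache with request coalescing listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in `searchCache.ts`. The `CoalescingTtlCache` class, its constructor parameters, and the oldest-first eviction are hypothetical. It caches the promise itself, so concurrent identical queries share one upstream call and later queries reuse the result until the TTL expires.

```typescript
type CacheEntry<V> = { promise: Promise<V>; expiresAt: number };

class CoalescingTtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get size(): number {
    return this.entries.size;
  }

  // Returns the cached or in-flight promise for `key`, or starts `loader`.
  get(key: string, loader: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.promise;

    const promise = loader();
    this.entries.delete(key); // re-insert so the key becomes the newest entry
    this.entries.set(key, { promise, expiresAt: this.now() + this.ttlMs });

    // Failed lookups are not cached.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
    });

    // Keep the cache bounded by evicting the oldest entries first.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
    return promise;
  }
}
```

A cache key for search would need to include every parameter that changes the results (query, provider, result count, and so on), otherwise different requests would share an answer.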
### 🔒 Security
- **deps**: Next.js 16.1.6 → 16.1.7 — fixes 6 CVEs:
  - **Critical**: CVE-2026-29057 (HTTP request smuggling via http-proxy)
  - **High**: CVE-2026-27977, CVE-2026-27978 (WebSocket + Server Actions)
  - **Medium**: CVE-2026-27979, CVE-2026-27980, CVE-2026-jcc7
### 📁 New Files
| File | Purpose |
| ---------------------------------------------------------------- | ------------------------------------------ |
| `open-sse/handlers/search.ts` | Search handler with 5-provider routing |
| `open-sse/config/searchRegistry.ts` | Provider registry (auth, cost, quota, TTL) |
| `open-sse/services/searchCache.ts` | In-memory cache with request coalescing |
| `src/app/api/v1/search/route.ts` | Next.js route (POST + GET) |
| `src/app/api/v1/search/analytics/route.ts` | Search stats API |
| `src/app/(dashboard)/dashboard/analytics/SearchAnalyticsTab.tsx` | Analytics dashboard tab |
| `src/lib/db/migrations/007_search_request_type.sql` | DB migration |
| `tests/unit/search-registry.test.mjs` | 277 lines of unit tests |
---
## [2.7.0] — 2026-03-17
> Sprint: ClawRouter-inspired features — toolCalling flag, multilingual intent detection, benchmark-driven fallback, request deduplication, pluggable RouterStrategy, Grok-4 Fast + GLM-5 + MiniMax M2.5 + Kimi K2.5 pricing.
### ✨ New Models & Pricing
- **feat(pricing)**: xAI Grok-4 Fast — `$0.20/$0.50 per 1M tokens`, 1143ms p50 latency, tool calling supported
- **feat(pricing)**: xAI Grok-4 (standard) — `$0.20/$1.50 per 1M tokens`, reasoning flagship
- **feat(pricing)**: GLM-5 via Z.AI — `$0.5/1M`, 128K output context
- **feat(pricing)**: MiniMax M2.5 — `$0.30/1M input`, reasoning + agentic tasks
- **feat(pricing)**: DeepSeek V3.2 — updated pricing `$0.27/$1.10 per 1M`
- **feat(pricing)**: Kimi K2.5 via Moonshot API — direct Moonshot API access
- **feat(providers)**: Z.AI provider added (`zai` alias) — GLM-5 family with 128K output
### 🧠 Routing Intelligence
- **feat(registry)**: `toolCalling` flag per model in provider registry — combos can now prefer/require tool-calling capable models
- **feat(scoring)**: Multilingual intent detection for AutoCombo scoring — PT/ZH/ES/AR script/language patterns influence model selection per request context
- **feat(fallback)**: Benchmark-driven fallback chains — real latency data (p50 from `comboMetrics`) used to re-order fallback priority dynamically
- **feat(dedup)**: Request deduplication via content-hash — 5-second idempotency window prevents duplicate provider calls from retrying clients
- **feat(router)**: Pluggable `RouterStrategy` interface in `autoCombo/routerStrategy.ts` — custom routing logic can be injected without modifying core
### 🔧 MCP Server Improvements
- **feat(mcp)**: 2 new advanced tool schemas: `omniroute_get_provider_metrics` (p50/p95/p99 per provider) and `omniroute_explain_route` (routing decision explanation)
- **feat(mcp)**: MCP tool auth scopes updated — `metrics:read` scope added for provider metrics tools
- **feat(mcp)**: `omniroute_best_combo_for_task` now accepts `languageHint` parameter for multilingual routing
### 📊 Observability
- **feat(metrics)**: `comboMetrics.ts` extended with real-time latency percentile tracking per provider/account
- **feat(health)**: Health API (`/api/monitoring/health`) now returns per-provider `p50Latency` and `errorRate` fields
- **feat(usage)**: Usage history migration for per-model latency tracking
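
The latency percentiles and the latency-based fallback ordering mentioned in this release can be sketched as below. This is an illustrative sketch, not the code in `comboMetrics.ts`. It uses the nearest-rank percentile definition, and the function names and the shape of the samples map are assumptions.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const clamped = Math.min(100, Math.max(0, p));
  const rank = Math.max(1, Math.ceil((clamped * sorted.length) / 100));
  return sorted[rank - 1]!;
}

// Summary of the three percentiles named in the changelog.
function latencySummary(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Orders provider ids by p50 latency, fastest first. Providers without
// samples go last.
function orderByP50(latenciesMs: Record<string, number[]>): string[] {
  const p50 = (id: string): number => {
    const samples = latenciesMs[id] ?? [];
    return samples.length > 0 ? percentile(samples, 50) : Number.POSITIVE_INFINITY;
  };
  return Object.keys(latenciesMs).sort((a, b) => p50(a) - p50(b));
}
```

For example, with samples `[300, 320, 310]` the nearest-rank p50 is the second of three sorted values, 310 ms.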
### 🗄️ DB Migrations
- **feat(migrations)**: New column `latency_p50` in `combo_metrics` table — zero-breaking, safe for existing users
### 🐛 Bug Fixes / Closures
- **close(#411)**: better-sqlite3 hashed module resolution on Windows — fixed in v2.6.10 (f02c5b5)
- **close(#409)**: GitHub Copilot chat completions fail with Claude models when files attached — fixed in v2.6.9 (838f1d6)
- **close(#405)**: Duplicate of #411 — resolved
## [2.6.10] — 2026-03-17
> Windows fix: better-sqlite3 prebuilt download without node-gyp/Python/MSVC (#426).
### 🐛 Bug Fixes
- **fix(install/#426)**: On Windows, `npm install -g omniroute` used to fail with `better_sqlite3.node is not a valid Win32 application` because the bundled native binary was compiled for Linux. Adds **Strategy 1.5** to `scripts/postinstall.mjs`: uses `@mapbox/node-pre-gyp install --fallback-to-build=false` (bundled within `better-sqlite3`) to download the correct prebuilt binary for the current OS/arch without requiring any build tools (no node-gyp, no Python, no MSVC). Falls back to `npm rebuild` only if the download fails. Adds platform-specific error messages with clear manual fix instructions.
---
## [2.6.9] — 2026-03-17
> CI fixes (t11 any-budget), bug fix #409 (file attachments via Copilot+Claude), release workflow correction.
### 🐛 Bug Fixes
- **fix(ci)**: Remove word "any" from comments in `openai-responses.ts` and `chatCore.ts` that were failing the t11 `\bany\b` budget check (false positive from regex counting comments)
- **fix(chatCore)**: Normalize unsupported content part types before forwarding to providers (#409 — Cursor sends `{type:"file"}` when `.md` files are attached; Copilot and other OpenAI-compat providers reject with "type has to be either 'image_url' or 'text'"; fix converts `file`/`document` blocks to `text` and drops unknown types)
### 🔧 Workflow
- **chore(generate-release)**: Add ATOMIC COMMIT RULE — version bump (`npm version patch`) MUST happen before committing feature files to ensure tag always points to a commit containing all version changes together
---
## [2.6.8] — 2026-03-17
> Sprint: Combo as Agent (system prompt + tool filter), Context Caching Protection, Auto-Update, Detailed Logs, MITM Kiro IDE.
### 🗄️ DB Migrations (zero-breaking — safe for existing users)
- **005_combo_agent_fields.sql**: `ALTER TABLE combos ADD COLUMN system_message TEXT DEFAULT NULL`, `tool_filter_regex TEXT DEFAULT NULL`, `context_cache_protection INTEGER DEFAULT 0`
- **006_detailed_request_logs.sql**: New `request_detail_logs` table with 500-entry ring-buffer trigger, opt-in via settings toggle
### ✨ Features
- **feat(combo)**: System Message Override per Combo (#399 — `system_message` field replaces or injects system prompt before forwarding to provider)
- **feat(combo)**: Tool Filter Regex per Combo (#399 — `tool_filter_regex` keeps only tools matching pattern; supports OpenAI + Anthropic formats)
- **feat(combo)**: Context Caching Protection (#401 — `context_cache_protection` tags responses with `<omniModel>provider/model</omniModel>` and pins model for session continuity)
- **feat(settings)**: Auto-Update via Settings (#320 — `GET /api/system/version` + `POST /api/system/update` — checks npm registry and updates in background with pm2 restart)
- **feat(logs)**: Detailed Request Logs (#378 — captures full pipeline bodies at 4 stages: client request, translated request, provider response, client response — opt-in toggle, 64KB trim, 500-entry ring-buffer)
- **feat(mitm)**: MITM Kiro IDE profile (#336 — `src/mitm/targets/kiro.ts` targets api.anthropic.com, reuses existing MITM infrastructure)
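
The per-combo tool filter listed above can be sketched as follows. This is a hedged sketch, not the project's `applyComboAgentMiddleware()`. The `filterTools` and `toolName` names are hypothetical, and treating a `NULL` pattern as "no filtering" is an assumption based on the column default. The two tool shapes follow the OpenAI format (name nested under `function`) and the Anthropic format (top-level `name`).

```typescript
// Minimal tool shapes: OpenAI nests the name under `function`,
// Anthropic keeps it at the top level.
type OpenAITool = { type: "function"; function: { name: string } };
type AnthropicTool = { name: string };
type Tool = OpenAITool | AnthropicTool;

function toolName(tool: Tool): string {
  return "function" in tool ? tool.function.name : tool.name;
}

// Keeps only tools whose name matches `pattern`.
// A null or empty pattern leaves the list untouched.
function filterTools<T extends Tool>(tools: T[], pattern: string | null): T[] {
  if (!pattern) return tools;
  const regex = new RegExp(pattern);
  return tools.filter((tool) => regex.test(toolName(tool)));
}
```

A pattern such as `^(read|search)_` would keep read-only tools and hide everything else from the model for that combo.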
---
## [2.6.7] — 2026-03-17
> Sprint: SSE improvements, local provider_nodes extensions, proxy registry, Claude passthrough fixes.
+63 -32
@@ -4,7 +4,7 @@
_Your universal API proxy — one endpoint, 44+ providers, zero downtime. Now with **MCP & A2A** agent orchestration._
**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • MCP Server • A2A Protocol • 100% TypeScript**
**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • Web Search • MCP Server • A2A Protocol • 100% TypeScript**
---
@@ -898,27 +898,44 @@ When minimized, OmniRoute lives in your system tray with quick actions:
## 💰 Pricing at a Glance
| Tier | Provider | Cost | Quota Reset | Best For |
| ------------------- | ----------------- | ---------------------- | ---------------- | ----------------------- |
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek | Pay-per-use | None | Best price/quality |
| | xAI (Grok) | Pay-per-use | None | Grok models |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| **💰 CHEAP** | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| **🆓 FREE** | iFlow | **$0** | Unlimited | 5 models unlimited |
| | Qwen | **$0** | Unlimited | 4 models unlimited |
| | Kiro | **$0** | Unlimited | Claude (AWS Builder ID) |
| Tier | Provider | Cost | Quota Reset | Best For |
| ------------------- | --------------------------- | ------------------------- | ---------------- | --------------------------------- |
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | **$0.20/$0.50 per 1M** 🆕 | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| **💰 CHEAP** | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| **🆓 FREE** | iFlow | **$0** | Unlimited | 5 models unlimited |
| | Qwen | **$0** | Unlimited | 4 models unlimited |
| | Kiro | **$0** | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
**💡 $0 Combo Stack:** Gemini CLI (180K/mo) → iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1) → Kiro (Claude for free) → Qwen (4 models, unlimited) — **Zero cost, never stops coding.** When Gemini quota runs out, OmniRoute auto-falls back to iFlow or Kiro with zero config.
> 🆕 **New models added (Mar 2026):** Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
**💡 $0 Combo Stack — The Complete Free Setup:**
```
Gemini CLI (180K/mo free)
→ iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1)
→ Kiro (Claude Sonnet 4.5 + Haiku — unlimited, via AWS Builder ID)
→ Qwen (4 models — unlimited)
→ Groq (14.4K req/day — ultra-fast)
→ NVIDIA NIM (70+ models — 40 RPM forever)
```
**Zero cost. Never stops coding.** Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.
---
@@ -1027,7 +1044,20 @@ Then in `/dashboard/media` → **Transcription** tab: upload any audio or video
OmniRoute v2.0 is built as an operational platform, not just a relay proxy.
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
### 🆕 New — ClawRouter-Inspired Improvements (Mar 2026)
| Feature | What It Does |
| ------------------------------------ | ------------------------------------------------------------------------------------------- |
| ⚡ **Grok-4 Fast Family** | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash) |
| 🧠 **GLM-5 via Z.AI** | 128K output context, $0.5/1M — newest flagship from the GLM family |
| 🔮 **MiniMax M2.5** | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1 |
| 🎯 **toolCalling Flag per Model** | Per-model `toolCalling: true/false` in registry — AutoCombo skips non-tool-capable models |
| 🌍 **Multilingual Intent Detection** | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for non-English content |
| 📊 **Benchmark-Driven Fallbacks** | Real p95 latency from live requests feeds combo scoring — AutoCombo learns from actual data |
| 🔁 **Request Deduplication** | Content-hash based dedup window — multi-agent safe, prevents duplicate charges |
| 🔌 **Pluggable RouterStrategy** | Extensible `RouterStrategy` interface — add custom routing logic as plugins |
### 🚀 Previous v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -1075,16 +1105,17 @@ OmniRoute v2.0 is built as an operational platform, not just a relay proxy.
### 🎵 Multi-Modal APIs
| Feature | What It Does |
| -------------------------- | ------------------------------------------------------------- |
| 🖼️ **Image Generation** | `/v1/images/generations` with cloud and local backends |
| 📐 **Embeddings** | `/v1/embeddings` for search and RAG pipelines |
| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers) |
| 🔊 **Text-to-Speech** | `/v1/audio/speech` (multiple engines/providers) |
| 🎬 **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) |
| 🎵 **Music Generation** | `/v1/music/generations` (ComfyUI workflows) |
| 🛡️ **Moderations** | `/v1/moderations` safety checks |
| 🔀 **Reranking** | `/v1/rerank` for relevance scoring |
| Feature | What It Does |
| -------------------------- | ------------------------------------------------------------------------------------------------------------ |
| 🖼️ **Image Generation** | `/v1/images/generations` with cloud and local backends |
| 📐 **Embeddings** | `/v1/embeddings` for search and RAG pipelines |
| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers) |
| 🔊 **Text-to-Speech** | `/v1/audio/speech` (multiple engines/providers) |
| 🎬 **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) |
| 🎵 **Music Generation** | `/v1/music/generations` (ComfyUI workflows) |
| 🛡️ **Moderations** | `/v1/moderations` safety checks |
| 🔀 **Reranking** | `/v1/rerank` for relevance scoring |
| 🔍 **Web Search** 🆕 | `/v1/search` — 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache |
### 🛡️ Resilience, Security & Governance
+10
@@ -8,6 +8,16 @@ _وكيل API العالمي الخاص بك - نقطة نهاية واحدة،
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: rules, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in more than 30 languages
- **Request deduplication**: avoid duplicate API calls via content hashing
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![إصدار npm](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
+10
@@ -8,6 +8,16 @@ _Вашият универсален API прокси — една крайна
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm версия](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
+10
@@ -8,6 +8,16 @@ _Din universelle API-proxy — ét slutpunkt, 36+ udbydere, ingen nedetid. Nu me
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
+10
@@ -8,6 +8,16 @@ _Ihr universeller API-Proxy ein Endpunkt, mehr als 36 Anbieter, keine Ausfal
---
### 🆕 New in v2.7.0
- **Extensible RouterStrategy**: rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoid duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm-Version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
+10
@@ -11,6 +11,16 @@ _Tu proxy de API universal — un endpoint, 36+ proveedores, cero tiempo de inac
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Universaali API-välityspalvelin yksi päätepiste, yli 36 palveluntarjoaja
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Votre proxy API universel — un endpoint, 36+ fournisseurs, zéro temps d'arr
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy**: rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _שרת ה-API האוניברסלי שלך - נקודת קצה אחת, 36+ ספ
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Az univerzális API-proxy egy végpont, 36+ szolgáltató, nulla állásid
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Proksi API universal Anda — satu titik akhir, 36+ penyedia, tanpa waktu henti
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -13,6 +13,16 @@ _आपका सार्वभौमिक एपीआई प्रॉक्
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Il tuo proxy API universale — un endpoint, 36+ provider, zero downtime._
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy**: strategies for rules, cost, and latency
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _ユニバーサル API プロキシ — 1 つのエンドポイント、36 以
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: supports rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: prevents duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Pricing updates:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _범용 API 프록시 — 하나의 엔드포인트, 36개 이상의 공급자,
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: supports rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: prevents duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Pricing updates:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Proksi API universal anda — satu titik akhir, 36+ pembekal, masa henti sifar.
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Uw universele API-proxy: één eindpunt, meer dan 36 providers, geen downtime._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Din universelle API-proxy ett endepunkt, 36+ leverandører, null nedetid._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Iyong unibersal na API proxy — isang endpoint, 36+ provider, zero downtime._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Twój uniwersalny serwer proxy API — jeden punkt końcowy, ponad 36 dostawcó
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Seu proxy de API universal — um endpoint, 36+ provedores, zero tempo de inati
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Seu proxy de API universal — um endpoint, mais de 36 provedores, tempo de ina
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy**: rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring in 30+ languages
- **Request deduplication**: avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Proxy-ul dvs. universal API - un punct final, peste 36 de furnizori, zero timpi
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Ваш универсальный API-прокси — одна точка до
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: strategies based on rules, cost, and latency
- **Multilingual intent detection**: routing in 30+ languages
- **Request deduplication**: eliminates duplicates by content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Váš univerzálny proxy server API jeden koncový bod, 36+ poskytovateľov
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Din universella API-proxy — en slutpunkt, 36+ leverantörer, noll driftstopp.
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _พร็อกซี API สากลของคุณ — จุดสิ้
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Ваш універсальний API-проксі — одна кінцева
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _Proxy API phổ quát của bạn — một điểm cuối, hơn 36 nhà cung c
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+10
@@ -11,6 +11,16 @@ _您的通用 API 代理 — 一个端点,36+ 提供商,零停机时间。_
---
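### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy**: supports rule, cost, and latency strategies
- **Multilingual intent detection**: routing scoring across 30+ languages
- **Request deduplication**: avoids duplicate API calls based on content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Pricing updates:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M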
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
+1 -1
@@ -1,7 +1,7 @@
openapi: 3.1.0
info:
title: OmniRoute API
version: 2.6.7
version: 2.7.2
description: |
OmniRoute is a local-first AI API proxy router. It provides an OpenAI-compatible
endpoint that routes requests to multiple AI providers with load balancing,
+50 -3
@@ -11,6 +11,7 @@
export interface RegistryModel {
id: string;
name: string;
toolCalling?: boolean;
targetFormat?: string;
unsupportedParams?: readonly string[];
}
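The optional `toolCalling` flag added to `RegistryModel` could be consumed along the lines of the sketch below. This is an illustration only: it assumes that a missing flag means tool calling is supported, and both `supportsToolCalling` and `stripToolsIfUnsupported` are hypothetical helpers that do not appear in this change set.

```typescript
// Mirrors the RegistryModel shape from the hunk above.
interface RegistryModel {
  id: string;
  name: string;
  toolCalling?: boolean;
  targetFormat?: string;
  unsupportedParams?: readonly string[];
}

// Assumption: an absent flag means the model supports tool calling.
function supportsToolCalling(model: RegistryModel): boolean {
  return model.toolCalling !== false;
}

// Hypothetical helper: remove tool-related fields for models flagged toolCalling: false.
function stripToolsIfUnsupported(
  model: RegistryModel,
  body: Record<string, unknown>
): Record<string, unknown> {
  if (supportsToolCalling(model)) return body;
  const rest = { ...body };
  delete rest.tools;
  delete rest.tool_choice;
  return rest;
}

const gptOss: RegistryModel = { id: "gpt-oss-120b", name: "GPT OSS 120B", toolCalling: false };
const llama: RegistryModel = { id: "meta/llama-3.3-70b-instruct", name: "Llama 3.3 70B" };

console.log(stripToolsIfUnsupported(gptOss, { model: gptOss.id, tools: [], tool_choice: "auto" }));
console.log(stripToolsIfUnsupported(llama, { model: llama.id, tools: [] }));
```

Treating `undefined` as "supported" keeps every existing registry entry unchanged, so only models that explicitly opt out are affected.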
@@ -114,6 +115,7 @@ export const REGISTRY: Record<string, RegistryEntry> = {
},
models: [
{ id: "claude-opus-4-6", name: "Claude Opus 4.6" },
{ id: "claude-sonnet-4-6", name: "Claude 4.6 Sonnet" },
{ id: "claude-opus-4-5-20251101", name: "Claude 4.5 Opus" },
{ id: "claude-sonnet-4-5-20250929", name: "Claude 4.5 Sonnet" },
{ id: "claude-haiku-4-5-20251001", name: "Claude 4.5 Haiku" },
@@ -139,6 +141,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
clientSecretDefault: "",
},
models: [
{ id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
{ id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
{ id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
{ id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
{ id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
{ id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -168,6 +173,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
clientSecretDefault: "",
},
models: [
{ id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
{ id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
{ id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
{ id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
{ id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
{ id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -460,8 +468,13 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Version": "2023-06-01",
},
models: [
{ id: "claude-haiku-4.5", name: "Claude Haiku 4.5" },
{ id: "claude-sonnet-4-20250514", name: "Claude Sonnet 4" },
{ id: "claude-sonnet-4-6-20251031", name: "Claude Sonnet 4.6 (Dated)" },
{ id: "claude-sonnet-4.6", name: "Claude Sonnet 4.6" },
{ id: "claude-opus-4-20250514", name: "Claude Opus 4" },
{ id: "claude-opus-4-6-20251031", name: "Claude Opus 4.6 (Dated)" },
{ id: "claude-opus-4.6", name: "Claude Opus 4.6" },
{ id: "claude-3-5-sonnet-20241022", name: "Claude 3.5 Sonnet" },
],
},
@@ -495,6 +508,8 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "glm-5", name: "GLM 5" },
{ id: "glm-5-turbo", name: "GLM 5 Turbo" },
{ id: "glm-4.7-flash", name: "GLM 4.7 Flash" },
{ id: "glm-4.7", name: "GLM 4.7" },
{ id: "glm-4.6v", name: "GLM 4.6V (Vision)" },
@@ -506,6 +521,25 @@ export const REGISTRY: Record<string, RegistryEntry> = {
],
},
zai: {
id: "zai",
alias: "zai",
format: "claude",
executor: "default",
baseUrl: "https://api.z.ai/api/anthropic/v1/messages",
urlSuffix: "?beta=true",
authType: "apikey",
authHeader: "x-api-key",
headers: {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "glm-5", name: "GLM 5" },
{ id: "glm-5-turbo", name: "GLM 5 Turbo" },
],
},
kimi: {
id: "kimi",
alias: "kimi",
@@ -637,7 +671,11 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [{ id: "MiniMax-M2.1", name: "MiniMax M2.1" }],
models: [
{ id: "minimax-m2.5", name: "MiniMax M2.5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
{ id: "MiniMax-M2.1", name: "MiniMax M2.1" },
],
},
"minimax-cn": {
@@ -655,6 +693,8 @@ export const REGISTRY: Record<string, RegistryEntry> = {
},
models: [
// Keep parity with minimax to ensure model discovery works for minimax-cn connections.
{ id: "minimax-m2.5", name: "MiniMax M2.5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
{ id: "MiniMax-M2.1", name: "MiniMax M2.1" },
],
},
@@ -717,10 +757,14 @@ export const REGISTRY: Record<string, RegistryEntry> = {
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "grok-4", name: "Grok 4" },
{ id: "grok-4-fast-non-reasoning", name: "Grok 4 Fast" },
{ id: "grok-4-fast-reasoning", name: "Grok 4 Fast Reasoning" },
{ id: "grok-code-fast-1", name: "Grok Code Fast" },
{ id: "grok-4-1-fast-non-reasoning", name: "Grok 4.1 Fast" },
{ id: "grok-4-1-fast-reasoning", name: "Grok 4.1 Fast Reasoning" },
{ id: "grok-4-0709", name: "Grok 4 (0709)" },
{ id: "grok-4", name: "Grok 4" },
{ id: "grok-3", name: "Grok 3" },
{ id: "grok-3-mini", name: "Grok 3 Mini" },
],
},
@@ -849,7 +893,10 @@ export const REGISTRY: Record<string, RegistryEntry> = {
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "gpt-oss-120b", name: "GPT OSS 120B", toolCalling: false },
{ id: "openai/gpt-oss-120b", name: "GPT OSS 120B (OpenAI Prefix)", toolCalling: false },
{ id: "meta/llama-3.3-70b-instruct", name: "Llama 3.3 70B" },
{ id: "nvidia/llama-3.3-70b-instruct", name: "Llama 3.3 70B (NVIDIA Prefix)" },
{ id: "meta/llama-4-maverick-17b-128e-instruct", name: "Llama 4 Maverick" },
{ id: "moonshotai/kimi-k2.5", name: "Kimi K2.5" },
{ id: "z-ai/glm4.7", name: "GLM 4.7" },
+155
@@ -0,0 +1,155 @@
/**
* Search Provider Registry
*
* Defines providers that support the /v1/search endpoint.
* Unlike LLM/embedding providers, search providers don't have "models":
* a provider IS the model (Serper = Google SERP, Brave = Brave index).
*
* API keys are stored in the same provider credentials system,
* keyed by provider ID (e.g. "serper-search", "brave-search").
* perplexity-search reuses credentials from the "perplexity" chat provider.
*/
export interface SearchProviderConfig {
id: string;
name: string;
baseUrl: string;
method: "GET" | "POST";
authType: "apikey";
authHeader: string;
costPerQuery: number;
freeMonthlyQuota: number;
searchTypes: string[];
defaultMaxResults: number;
maxMaxResults: number;
timeoutMs: number;
cacheTTLMs: number;
}
export const SEARCH_PROVIDERS: Record<string, SearchProviderConfig> = {
"serper-search": {
id: "serper-search",
name: "Serper Search",
baseUrl: "https://google.serper.dev",
method: "POST",
authType: "apikey",
authHeader: "x-api-key",
costPerQuery: 0.001,
freeMonthlyQuota: 2500,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 100,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"brave-search": {
id: "brave-search",
name: "Brave Search",
baseUrl: "https://api.search.brave.com/res/v1",
method: "GET",
authType: "apikey",
authHeader: "x-subscription-token",
costPerQuery: 0.005,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"perplexity-search": {
id: "perplexity-search",
name: "Perplexity Search",
baseUrl: "https://api.perplexity.ai/search",
method: "POST",
authType: "apikey",
authHeader: "bearer",
costPerQuery: 0.005,
freeMonthlyQuota: 0,
searchTypes: ["web"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"exa-search": {
id: "exa-search",
name: "Exa Search",
baseUrl: "https://api.exa.ai/search",
method: "POST",
authType: "apikey",
authHeader: "x-api-key",
costPerQuery: 0.007,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 100,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"tavily-search": {
id: "tavily-search",
name: "Tavily Search",
baseUrl: "https://api.tavily.com/search",
method: "POST",
authType: "apikey",
authHeader: "bearer",
costPerQuery: 0.008,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
};
/**
* Credential fallback mapping: search providers that can reuse credentials
* from a related provider (e.g., perplexity-search uses the same API key as perplexity chat).
*/
export const SEARCH_CREDENTIAL_FALLBACKS: Record<string, string> = {
"perplexity-search": "perplexity",
};
/**
* Get search provider config by ID
*/
export function getSearchProvider(providerId: string): SearchProviderConfig | null {
return SEARCH_PROVIDERS[providerId] || null;
}
/**
* Get all search providers as a flat list
*/
export function getAllSearchProviders(): Array<{
id: string;
name: string;
searchTypes: string[];
}> {
return Object.values(SEARCH_PROVIDERS).map((p) => ({
id: p.id,
name: p.name,
searchTypes: p.searchTypes,
}));
}
/**
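* Select a search provider by explicit ID or by lowest cost.
* If an explicit provider is given, look it up and return it (null if unknown).
* Otherwise, return the provider with the lowest costPerQuery.
* Credentials and remaining free quota are not checked here.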
*/
export function selectProvider(explicitProvider?: string): SearchProviderConfig | null {
if (explicitProvider) {
return SEARCH_PROVIDERS[explicitProvider] || null;
}
const providers = Object.values(SEARCH_PROVIDERS);
if (providers.length === 0) return null;
return providers.reduce((cheapest, p) => (p.costPerQuery < cheapest.costPerQuery ? p : cheapest));
}
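`selectProvider` picks by cost alone, while `SEARCH_CREDENTIAL_FALLBACKS` describes which providers can borrow another provider's key. A caller that wants a failover order restricted to providers with usable keys might combine the two roughly as follows. This is a sketch under stated assumptions: `hasCredentials`, `rankUsableProviders`, and the `configuredKeys` set are hypothetical and not part of this diff, and the cost table is a trimmed copy of the registry above.

```typescript
interface ProviderCost {
  id: string;
  costPerQuery: number;
}

// Trimmed copy of the cost data from SEARCH_PROVIDERS.
const PROVIDERS: ProviderCost[] = [
  { id: "serper-search", costPerQuery: 0.001 },
  { id: "brave-search", costPerQuery: 0.005 },
  { id: "perplexity-search", costPerQuery: 0.005 },
  { id: "exa-search", costPerQuery: 0.007 },
  { id: "tavily-search", costPerQuery: 0.008 },
];

// Same mapping as SEARCH_CREDENTIAL_FALLBACKS.
const CREDENTIAL_FALLBACKS: Record<string, string> = {
  "perplexity-search": "perplexity",
};

// Hypothetical: true when a key exists for the provider itself or for its fallback.
function hasCredentials(providerId: string, configuredKeys: Set<string>): boolean {
  if (configuredKeys.has(providerId)) return true;
  const fallback = CREDENTIAL_FALLBACKS[providerId];
  return fallback !== undefined && configuredKeys.has(fallback);
}

// Hypothetical: providers usable with the given keys, cheapest first.
function rankUsableProviders(configuredKeys: Set<string>): ProviderCost[] {
  return PROVIDERS.filter((p) => hasCredentials(p.id, configuredKeys)).sort(
    (a, b) => a.costPerQuery - b.costPerQuery
  );
}

// Only a Perplexity chat key and a Tavily key are configured.
const ranked = rankUsableProviders(new Set(["perplexity", "tavily-search"]));
console.log(ranked.map((p) => p.id));
```

Ranking the whole list instead of returning a single entry gives a failover loop its next candidate without recomputing anything.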
+183 -28
@@ -42,6 +42,12 @@ import {
import { getIdempotencyKey, checkIdempotency, saveIdempotency } from "@/lib/idempotencyLayer";
import { createProgressTransform, wantsProgress } from "../utils/progressTracker.ts";
import { isModelUnavailableError, getNextFamilyFallback } from "../services/modelFamilyFallback.ts";
import { computeRequestHash, deduplicate, shouldDeduplicate } from "../services/requestDedup.ts";
import {
shouldUseFallback,
isFallbackDecision,
EMERGENCY_FALLBACK_CONFIG,
} from "../services/emergencyFallback.ts";
export function shouldUseNativeCodexPassthrough({
provider,
@@ -89,6 +95,22 @@ export async function handleChatCore({
}) {
const { provider, model, extendedContext } = modelInfo;
const startTime = Date.now();
const persistFailureUsage = (statusCode: number, errorCode?: string | null) => {
saveRequestUsage({
provider: provider || "unknown",
model: model || "unknown",
tokens: { input: 0, output: 0, cacheRead: 0, cacheCreation: 0, reasoning: 0 },
status: String(statusCode),
success: false,
latencyMs: Date.now() - startTime,
timeToFirstTokenMs: 0,
errorCode: errorCode || String(statusCode),
timestamp: new Date().toISOString(),
connectionId: connectionId || undefined,
apiKeyId: apiKeyInfo?.id || undefined,
apiKeyName: apiKeyInfo?.name || undefined,
}).catch(() => {});
};
// ── Phase 9.2: Idempotency check ──
const idempotencyKey = getIdempotencyKey(clientRawRequest?.headers);
@@ -193,7 +215,7 @@ export async function handleChatCore({
} else if (isClaudePassthrough) {
// Claude-to-Claude passthrough: forward body completely untouched.
// No translation, no field stripping, no thinking normalization.
// We are just a gateway -- do not interfere with the request in any way.
// We are just a gateway -- do not interfere with the request in the slightest.
translatedBody = { ...body };
log?.debug?.("FORMAT", "claude->claude passthrough -- forwarding untouched");
} else {
@@ -246,8 +268,44 @@ export async function handleChatCore({
if (Array.isArray(translatedBody.messages)) {
for (const msg of translatedBody.messages) {
if (Array.isArray(msg.content)) {
msg.content = msg.content.filter((block: Record<string, unknown>) =>
block.type !== "text" || (typeof block.text === "string" && block.text.length > 0)
msg.content = msg.content.filter(
(block: Record<string, unknown>) =>
block.type !== "text" || (typeof block.text === "string" && block.text.length > 0)
);
}
}
}
// ── #409: Normalize unsupported content part types ──
// Cursor and other clients send {type:"file"} when attaching .md or other files.
// Providers (Copilot, OpenAI) only accept "text" and "image_url" in content arrays.
// Convert: file → text (extract content); drop unrecognized types and log them at debug level.
if (Array.isArray(translatedBody.messages)) {
for (const msg of translatedBody.messages) {
if (msg.role === "user" && Array.isArray(msg.content)) {
msg.content = (msg.content as Record<string, unknown>[]).flatMap(
(block: Record<string, unknown>) => {
if (block.type === "text" || block.type === "image_url" || block.type === "image") {
return [block];
}
// file / document → extract text content
if (block.type === "file" || block.type === "document") {
const fileContent =
(block.file as Record<string, unknown>)?.content ??
(block.file as Record<string, unknown>)?.text ??
block.content ??
block.text;
const fileName =
(block.file as Record<string, unknown>)?.name ?? block.name ?? "attachment";
if (typeof fileContent === "string" && fileContent.length > 0) {
return [{ type: "text", text: `[${fileName}]\n${fileContent}` }];
}
return [];
}
// Unknown types: drop and log at debug level
log?.debug?.("CONTENT", `Dropped unsupported content part type="${block.type}"`);
return [];
}
);
}
}
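The per-block rules in the hunk above can be read more easily as a standalone function. The sketch below restates them outside the request handler; the name `normalizeContentPart` and the sample message parts are illustrative assumptions, and the debug logging is omitted.

```typescript
type ContentPart = Record<string, unknown>;

// Standalone restatement of the flatMap callback; the function name is illustrative.
function normalizeContentPart(block: ContentPart): ContentPart[] {
  // Supported part types pass through unchanged.
  if (block.type === "text" || block.type === "image_url" || block.type === "image") {
    return [block];
  }
  // file / document parts become a text part prefixed with the file name.
  if (block.type === "file" || block.type === "document") {
    const file = (block.file ?? {}) as ContentPart;
    const content = file.content ?? file.text ?? block.content ?? block.text;
    const name = file.name ?? block.name ?? "attachment";
    if (typeof content === "string" && content.length > 0) {
      return [{ type: "text", text: `[${name}]\n${content}` }];
    }
    return [];
  }
  // Any other part type is dropped.
  return [];
}

const parts: ContentPart[] = [
  { type: "text", text: "Summarize this file" },
  { type: "file", file: { name: "notes.md", content: "# Notes" } },
  { type: "audio", data: "..." },
];

const normalized = parts.flatMap((part) => normalizeContentPart(part));
console.log(normalized);
```

Returning an array from each call is what lets `flatMap` express "keep", "convert", and "drop" with a single callback.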
@@ -328,6 +386,57 @@ export async function handleChatCore({
// Get executor for this provider
const executor = getExecutor(provider);
// Create stream controller for disconnect detection
const streamController = createStreamController({ onDisconnect, log, provider, model });
const dedupRequestBody = { ...translatedBody, model: `${provider}/${model}` };
const dedupEnabled = shouldDeduplicate(dedupRequestBody);
const dedupHash = dedupEnabled ? computeRequestHash(dedupRequestBody) : null;
const executeProviderRequest = async (modelToCall = model, allowDedup = false) => {
const execute = async () => {
const bodyToSend =
translatedBody.model === modelToCall
? translatedBody
: { ...translatedBody, model: modelToCall };
const rawResult = await withRateLimit(provider, connectionId, modelToCall, () =>
executor.execute({
model: modelToCall,
body: bodyToSend,
stream,
credentials,
signal: streamController.signal,
log,
extendedContext,
})
);
if (stream) return rawResult;
// Non-stream responses need cloning for shared dedup consumers.
const status = rawResult.response.status;
const statusText = rawResult.response.statusText;
const headers = Array.from(rawResult.response.headers.entries());
const payload = await rawResult.response.text();
return {
...rawResult,
response: new Response(payload, { status, statusText, headers }),
};
};
if (allowDedup && dedupEnabled && dedupHash) {
const dedupResult = await deduplicate(dedupHash, execute);
if (dedupResult.wasDeduplicated) {
log?.debug?.("DEDUP", `Joined in-flight request hash=${dedupHash}`);
}
return dedupResult.result;
}
return execute();
};
// Track pending request
trackPendingRequest(model, provider, connectionId, true);
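The wiring above calls `deduplicate(dedupHash, execute)` and reads `result` and `wasDeduplicated` from the value it resolves to. The implementation in `requestDedup.ts` is not part of this hunk, so the following is only a minimal sketch of how such a helper might behave, assuming that identical in-flight requests share one promise keyed by hash. The `inFlight` map and the `callUpstream` stand-in are hypothetical.

```typescript
interface DedupResult<T> {
  result: T;
  wasDeduplicated: boolean;
}

// In-flight promises keyed by request hash (hypothetical module state).
const inFlight = new Map<string, Promise<unknown>>();

// Assumed behavior: the first caller runs fn; later callers with the same hash
// join the pending promise. The entry is removed once the promise settles.
async function deduplicate<T>(hash: string, fn: () => Promise<T>): Promise<DedupResult<T>> {
  const pending = inFlight.get(hash) as Promise<T> | undefined;
  if (pending) {
    return { result: await pending, wasDeduplicated: true };
  }
  const promise = fn().finally(() => inFlight.delete(hash));
  inFlight.set(hash, promise);
  return { result: await promise, wasDeduplicated: false };
}

// Stand-in for the provider call; counts how often the upstream is actually hit.
let upstreamCalls = 0;
const callUpstream = async (): Promise<string> => {
  upstreamCalls += 1;
  await Promise.resolve(); // simulate asynchronous work
  return "response-body";
};

async function demo(): Promise<DedupResult<string>[]> {
  // Two identical requests issued concurrently share one upstream call.
  return Promise.all([deduplicate("hash-1", callUpstream), deduplicate("hash-1", callUpstream)]);
}

demo().then((results) => console.log(results, upstreamCalls));
```

Because joined callers all receive the same resolved value, the shared result has to be safe to read more than once, which is consistent with the diff buffering non-stream responses into text and rebuilding a `Response` before returning.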
@@ -345,9 +454,6 @@ export async function handleChatCore({
0;
log?.debug?.("REQUEST", `${provider.toUpperCase()} | ${model} | ${msgCount} msgs`);
// Create stream controller for disconnect detection
const streamController = createStreamController({ onDisconnect, log, provider, model });
// Execute request using executor (handles URL building, headers, fallback, transform)
let providerResponse;
let providerUrl;
@@ -355,17 +461,7 @@ export async function handleChatCore({
let finalBody;
try {
const result = await withRateLimit(provider, connectionId, model, () =>
executor.execute({
model,
body: translatedBody,
stream,
credentials,
signal: streamController.signal,
log,
extendedContext,
})
);
const result = await executeProviderRequest(model, true);
providerResponse = result.response;
providerUrl = result.url;
@@ -412,6 +508,7 @@ export async function handleChatCore({
streamController.handleError(error);
return createErrorResult(499, "Request aborted");
}
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, error?.name || "upstream_error");
const errMsg = formatProviderError(error, provider, model, HTTP_STATUS.BAD_GATEWAY);
console.log(`${COLORS.red}[ERROR] ${errMsg}${COLORS.reset}`);
return createErrorResult(HTTP_STATUS.BAD_GATEWAY, errMsg);
@@ -521,17 +618,7 @@ export async function handleChatCore({
log?.info?.("MODEL_FALLBACK", `${model} unavailable (${statusCode}) → trying ${nextModel}`);
// Re-execute with the fallback model
try {
const fallbackResult = await withRateLimit(provider, connectionId, nextModel, () =>
executor.execute({
model: nextModel,
body: translatedBody,
stream,
credentials,
signal: streamController.signal,
log,
extendedContext,
})
);
const fallbackResult = await executeProviderRequest(nextModel, false);
if (fallbackResult.response.ok) {
providerResponse = fallbackResult.response;
providerUrl = fallbackResult.url;
@@ -543,18 +630,79 @@ export async function handleChatCore({
// We fall through by NOT returning here
} else {
// Fallback also failed — return original error
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} catch {
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} else {
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} else {
persistFailureUsage(statusCode, `upstream_${statusCode}`);
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
// ── End T5 ───────────────────────────────────────────────────────────────
// ── Emergency Fallback (ClawRouter Feature #09/017) ────────────────────
// When a non-streaming request fails with a budget-related error (402 or
// budget keywords), redirect to nvidia/gpt-oss-120b ($0.00/M) before
// returning the error to the combo router. This gives one last free-tier
// attempt so the user's session stays alive.
const requestHasTools = Array.isArray(translatedBody.tools) && translatedBody.tools.length > 0;
if (!stream) {
const fbDecision = shouldUseFallback(
statusCode,
message,
requestHasTools,
EMERGENCY_FALLBACK_CONFIG
);
if (isFallbackDecision(fbDecision)) {
log?.info?.("EMERGENCY_FALLBACK", fbDecision.reason);
try {
// Build a minimal fallback request using the original body but with
// the NVIDIA free-tier model and max_tokens capped to avoid overuse.
const fbExecutor = getExecutor(fbDecision.provider);
const fbResult = await fbExecutor.execute({
model: fbDecision.model,
body: {
...translatedBody,
model: fbDecision.model,
max_tokens: Math.min(
typeof translatedBody.max_tokens === "number"
? translatedBody.max_tokens
: fbDecision.maxOutputTokens,
fbDecision.maxOutputTokens
),
},
stream: false,
// NOTE: reuses the credentials resolved for the original request.
credentials,
signal: streamController.signal,
log,
extendedContext,
});
if (fbResult.response.ok) {
providerResponse = fbResult.response;
log?.info?.(
"EMERGENCY_FALLBACK",
`Serving ${fbDecision.provider}/${fbDecision.model} as budget fallback for ${provider}/${model}`
);
// Fall through to non-streaming handler — providerResponse is now OK
} else {
log?.warn?.(
"EMERGENCY_FALLBACK",
`Emergency fallback also failed (${fbResult.response.status})`
);
}
} catch (fbErr) {
log?.warn?.("EMERGENCY_FALLBACK", `Emergency fallback error: ${fbErr?.message}`);
}
}
}
// ── End Emergency Fallback ────────────────────────────────────────────
}
// Non-streaming response
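Reviewer note: the emergency fallback caps `max_tokens` so the free-tier model never receives a larger output budget than the fallback decision allows. The nested `Math.min(...)` expression in the hunk can be read as the small helper below. `capMaxTokens` is a hypothetical name, and the cap of 4096 used in the usage lines is an illustrative value, not one taken from `EMERGENCY_FALLBACK_CONFIG`.

```typescript
// Sketch of the max_tokens capping used by the emergency fallback.
// `requested` mirrors translatedBody.max_tokens, which may be absent or non-numeric.
function capMaxTokens(requested: unknown, fallbackCap: number): number {
  // A missing or non-numeric request defaults to the cap itself.
  const wanted = typeof requested === "number" ? requested : fallbackCap;
  // Never exceed the cap, even if the caller asked for more.
  return Math.min(wanted, fallbackCap);
}

const capped = capMaxTokens(8192, 4096); // caller asked for more than the cap
const kept = capMaxTokens(512, 4096); // caller asked for less than the cap
const defaulted = capMaxTokens(undefined, 4096); // caller did not set max_tokens
```

With these inputs `capped` is 4096, `kept` is 512, and `defaulted` is 4096.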
@@ -580,6 +728,7 @@ export async function handleChatCore({
connectionId,
status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
}).catch(() => {});
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_sse_payload");
return createErrorResult(
HTTP_STATUS.BAD_GATEWAY,
"Invalid SSE response for non-streaming request"
@@ -597,6 +746,7 @@ export async function handleChatCore({
connectionId,
status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
}).catch(() => {});
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_json_payload");
return createErrorResult(HTTP_STATUS.BAD_GATEWAY, "Invalid JSON response from provider");
}
}
@@ -639,6 +789,11 @@ export async function handleChatCore({
provider: provider || "unknown",
model: model || "unknown",
tokens: usage,
status: "200",
success: true,
latencyMs: Date.now() - startTime,
timeToFirstTokenMs: Date.now() - startTime,
errorCode: null,
timestamp: new Date().toISOString(),
connectionId: connectionId || undefined,
apiKeyId: apiKeyInfo?.id || undefined,
+664
@@ -0,0 +1,664 @@
/**
* Search Handler
*
* Handles POST /v1/search requests.
* Routes to 5 search providers with automatic failover:
* serper-search, brave-search, perplexity-search, exa-search, tavily-search
*
* Request format:
* {
* "query": "search query",
* "provider": "serper-search" | "brave-search" | ..., // optional; auto-selects the cheapest provider when omitted
* "max_results": 5,
* "search_type": "web" | "news"
* }
*/
import { getSearchProvider, type SearchProviderConfig } from "../config/searchRegistry.ts";
import { saveCallLog } from "@/lib/usageDb";
// ── Types ────────────────────────────────────────────────────────────────
export interface SearchResult {
title: string;
url: string;
display_url?: string;
snippet: string;
position: number;
score: number | null;
published_at: string | null;
favicon_url: string | null;
content: { format: string; text: string; length: number } | null;
metadata: {
author: string | null;
language: string | null;
source_type: string | null;
image_url: string | null;
} | null;
citation: {
provider: string;
retrieved_at: string;
rank: number;
};
provider_raw: Record<string, unknown> | null;
}
export interface SearchResponse {
provider: string;
query: string;
results: SearchResult[];
answer: { source: string; text: string | null; model: string | null } | null;
usage: { queries_used: number; search_cost_usd: number; llm_tokens?: number };
metrics: {
response_time_ms: number;
upstream_latency_ms: number;
gateway_latency_ms?: number;
total_results_available: number | null;
};
errors: Array<{ provider: string; code: string; message: string }>;
}
interface SearchHandlerResult {
success: boolean;
status?: number;
error?: string;
data?: SearchResponse;
}
interface SearchHandlerOptions {
query: string;
provider: string;
maxResults: number;
searchType: string;
country?: string;
language?: string;
timeRange?: string;
offset?: number;
domainFilter?: string[];
contentOptions?: { snippet?: boolean; full_page?: boolean; format?: string; max_characters?: number };
strictFilters?: boolean;
providerOptions?: Record<string, unknown>;
credentials: Record<string, any>;
alternateProvider?: string;
alternateCredentials?: Record<string, any> | null;
log?: any;
}
// ── Constants ────────────────────────────────────────────────────────────
const GLOBAL_TIMEOUT_MS = 15_000;
// Non-retriable HTTP status codes — fail immediately, don't try alternate
const NON_RETRIABLE = new Set([400, 401, 403, 404]);
// ── Input Sanitization ──────────────────────────────────────────────────
// Control characters that should never appear in search queries
const CONTROL_CHAR_RE = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;
function sanitizeQuery(query: string): { clean: string; error?: string } {
if (CONTROL_CHAR_RE.test(query)) {
return { clean: "", error: "Query contains invalid control characters" };
}
const clean = query.normalize("NFKC").trim().replace(/\s+/g, " ");
if (clean.length === 0) {
return { clean: "", error: "Query is empty after normalization" };
}
return { clean };
}
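Reviewer note: a short usage sketch for `sanitizeQuery`. The function body is copied from the hunk above so the block runs on its own; the sample inputs are illustrative.

```typescript
// Copied from the diff above so the example is self-contained.
const CONTROL_CHAR_RE = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;

function sanitizeQuery(query: string): { clean: string; error?: string } {
  if (CONTROL_CHAR_RE.test(query)) {
    return { clean: "", error: "Query contains invalid control characters" };
  }
  const clean = query.normalize("NFKC").trim().replace(/\s+/g, " ");
  if (clean.length === 0) {
    return { clean: "", error: "Query is empty after normalization" };
  }
  return { clean };
}

// Tab, LF and CR are not in the rejected range, so they collapse to one space.
const collapsed = sanitizeQuery("  hello \t\n world  "); // clean: "hello world"
// NFKC folds full-width letters to their ASCII forms.
const folded = sanitizeQuery("\uFF46\uFF4F\uFF4F"); // clean: "foo"
// A NUL byte is rejected outright rather than stripped.
const rejected = sanitizeQuery("bad\x00query"); // error set, clean: ""
// Whitespace-only input fails the emptiness check after trimming.
const empty = sanitizeQuery("   "); // error set, clean: ""
```

Rejecting control characters instead of stripping them means a malformed query surfaces as a 400 to the caller rather than being silently altered.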
// ── Response Normalizers ────────────────────────────────────────────────
function makeResult(
providerId: string,
item: {
title?: string;
url?: string;
snippet?: string;
score?: number;
published_at?: string;
favicon_url?: string;
author?: string;
source_type?: string;
image_url?: string;
full_text?: string;
text_format?: string;
},
idx: number,
now: string
): SearchResult {
const url = item.url || "";
return {
title: item.title || "",
url,
display_url: url ? url.replace(/^https?:\/\/(www\.)?/, "").split("?")[0] : undefined,
snippet: item.snippet || "",
position: idx + 1,
score: typeof item.score === "number" ? Math.min(1, Math.max(0, item.score)) : null,
published_at: item.published_at || null,
favicon_url: item.favicon_url || null,
content: item.full_text
? { format: item.text_format || "text", text: item.full_text, length: item.full_text.length }
: null,
metadata: {
author: item.author || null,
language: null,
source_type: item.source_type || null,
image_url: item.image_url || null,
},
citation: { provider: providerId, retrieved_at: now, rank: idx + 1 },
provider_raw: null,
};
}
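Reviewer note: two of the derived fields in `makeResult` are easy to misread, so here they are in isolation. `displayUrl` and `clampScore` are hypothetical helper names; the expressions themselves are the ones used in the hunk above.

```typescript
// Strip the scheme, an optional "www." prefix, and the query string.
// A "#fragment" is left in place, as in the original expression.
function displayUrl(url: string): string | undefined {
  return url ? url.replace(/^https?:\/\/(www\.)?/, "").split("?")[0] : undefined;
}

// Clamp provider relevance scores into [0, 1]; anything non-numeric becomes null.
function clampScore(score: unknown): number | null {
  return typeof score === "number" ? Math.min(1, Math.max(0, score)) : null;
}

const shown = displayUrl("https://www.example.com/path?q=1"); // "example.com/path"
const missing = displayUrl(""); // undefined
const high = clampScore(1.7); // 1
const low = clampScore(-0.2); // 0
const notNumeric = clampScore("0.5"); // null
```

One edge case worth knowing: `NaN` has type `number`, so it passes the `typeof` check and comes back out as `NaN` rather than `null`.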
function normalizeSerperResponse(
data: any,
_query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = searchType === "news" ? data.news : data.organic;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"serper-search",
{
title: item.title,
url: item.link,
snippet: item.snippet || item.description,
published_at: item.date,
},
idx,
now
)
);
return {
results,
totalResults:
typeof data.searchParameters?.totalResults === "number"
? data.searchParameters.totalResults
: null,
};
}
function normalizeBraveResponse(
data: any,
_query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const container = searchType === "news" ? data.news : data.web;
const items = container?.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"brave-search",
{
title: item.title,
url: item.url,
snippet: item.description,
published_at: item.page_age || item.age,
favicon_url: item.meta_url?.favicon || item.favicon,
},
idx,
now
)
);
return { results, totalResults: container?.totalCount ?? null };
}
// ── Helpers ─────────────────────────────────────────────────────────────
function parseDomainFilter(domainFilter?: string[]): {
includes: string[];
excludes: string[];
} {
if (!domainFilter?.length) return { includes: [], excludes: [] };
const includes = domainFilter.filter((d) => !d.startsWith("-"));
const excludes = domainFilter.filter((d) => d.startsWith("-")).map((d) => d.slice(1));
return { includes, excludes };
}
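Reviewer note: `parseDomainFilter` treats a leading `-` as an exclusion marker. A usage sketch, with the function copied from the hunk above so it runs standalone; the domain names are placeholders.

```typescript
// Copied from the diff above so the example is self-contained.
function parseDomainFilter(domainFilter?: string[]): {
  includes: string[];
  excludes: string[];
} {
  if (!domainFilter?.length) return { includes: [], excludes: [] };
  const includes = domainFilter.filter((d) => !d.startsWith("-"));
  const excludes = domainFilter.filter((d) => d.startsWith("-")).map((d) => d.slice(1));
  return { includes, excludes };
}

// Mixed list: plain entries are allow-listed, "-" entries are deny-listed.
const mixed = parseDomainFilter(["example.com", "-spam.example", "docs.example.org"]);
// mixed.includes: ["example.com", "docs.example.org"]
// mixed.excludes: ["spam.example"]

// Missing or empty input yields two empty lists, so callers can test .length directly.
const none = parseDomainFilter(undefined);
```

The Exa and Tavily builders consume the split lists, while the Perplexity builder forwards `domainFilter` unchanged, prefix included.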
// ── Provider Request Builders ───────────────────────────────────────────
interface SearchRequestParams {
query: string;
searchType: string;
maxResults: number;
token: string;
country?: string;
language?: string;
domainFilter?: string[];
}
function buildSerperRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const endpoint = params.searchType === "news" ? "/news" : "/search";
const body: Record<string, unknown> = { q: params.query, num: params.maxResults };
if (params.country) body.gl = params.country.toLowerCase();
if (params.language) body.hl = params.language;
return {
url: `${config.baseUrl}${endpoint}`,
init: {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": params.token },
body: JSON.stringify(body),
},
};
}
function buildBraveRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const endpoint = params.searchType === "news" ? "/news/search" : "/web/search";
const qp = new URLSearchParams({ q: params.query, count: String(params.maxResults) });
if (params.country) qp.set("country", params.country);
if (params.language) qp.set("search_lang", params.language);
return {
url: `${config.baseUrl}${endpoint}?${qp}`,
init: {
method: "GET",
headers: { Accept: "application/json", "X-Subscription-Token": params.token },
},
};
}
function buildPerplexityRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const body: Record<string, unknown> = { query: params.query, max_results: params.maxResults };
if (params.country) body.country = params.country;
if (params.language) body.search_language_filter = [params.language];
if (params.domainFilter?.length) body.search_domain_filter = params.domainFilter;
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify(body),
},
};
}
function buildExaRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const { includes, excludes } = parseDomainFilter(params.domainFilter);
const body: Record<string, unknown> = {
query: params.query,
numResults: params.maxResults,
type: "auto",
text: true,
highlights: true,
};
if (includes.length) body.includeDomains = includes;
if (excludes.length) body.excludeDomains = excludes;
if (params.searchType === "news") body.category = "news";
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", "x-api-key": params.token },
body: JSON.stringify(body),
},
};
}
function buildTavilyRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const { includes, excludes } = parseDomainFilter(params.domainFilter);
const body: Record<string, unknown> = {
query: params.query,
max_results: params.maxResults,
topic: params.searchType === "news" ? "news" : "general",
};
if (includes.length) body.include_domains = includes;
if (excludes.length) body.exclude_domains = excludes;
if (params.country) body.country = params.country;
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify(body),
},
};
}
function buildRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
if (config.id === "serper-search") return buildSerperRequest(config, params);
if (config.id === "brave-search") return buildBraveRequest(config, params);
if (config.id === "perplexity-search") return buildPerplexityRequest(config, params);
if (config.id === "exa-search") return buildExaRequest(config, params);
if (config.id === "tavily-search") return buildTavilyRequest(config, params);
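// Fallback for future providers: JSON body with bearer auth, sent with the
// provider's configured method. This assumes POST; fetch rejects a body on GET/HEAD.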
return {
url: config.baseUrl,
init: {
method: config.method,
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify({
query: params.query,
max_results: params.maxResults,
search_type: params.searchType,
}),
},
};
}
function normalizePerplexityResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"perplexity-search",
{
title: item.title,
url: item.url,
snippet: item.snippet,
published_at: item.date || item.last_updated,
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeExaResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"exa-search",
{
title: item.title,
url: item.url,
snippet: item.highlights?.[0] || item.text?.slice(0, 300) || "",
score: item.score,
published_at: item.publishedDate,
favicon_url: item.favicon,
author: item.author,
image_url: item.image,
full_text: item.text,
text_format: "text",
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeTavilyResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"tavily-search",
{
title: item.title,
url: item.url,
snippet: item.content || "",
score: item.score,
published_at: item.published_date,
full_text: item.raw_content,
text_format: "text",
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeResponse(
providerId: string,
data: any,
query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
if (providerId === "serper-search") return normalizeSerperResponse(data, query, searchType);
if (providerId === "brave-search") return normalizeBraveResponse(data, query, searchType);
if (providerId === "perplexity-search")
return normalizePerplexityResponse(data, query, searchType);
if (providerId === "exa-search") return normalizeExaResponse(data, query, searchType);
if (providerId === "tavily-search") return normalizeTavilyResponse(data, query, searchType);
return { results: [], totalResults: null };
}
// ── Main Handler ────────────────────────────────────────────────────────
export async function handleSearch(options: SearchHandlerOptions): Promise<SearchHandlerResult> {
const {
query,
provider: providerId,
maxResults,
searchType,
country,
language,
domainFilter,
credentials,
alternateProvider,
alternateCredentials,
log,
} = options;
const startTime = Date.now();
// 1. Sanitize input
const { clean: cleanQuery, error: sanitizeError } = sanitizeQuery(query);
if (sanitizeError) {
return { success: false, status: 400, error: sanitizeError };
}
// 2. Use resolved provider from route (no re-resolution)
const primaryConfig = getSearchProvider(providerId);
if (!primaryConfig) {
return {
success: false,
status: 400,
error: `Unknown search provider: ${providerId}`,
};
}
// 3. Get alternate config for failover (pre-resolved by route)
const alternateConfig = alternateProvider ? getSearchProvider(alternateProvider) : null;
const requestParams = {
query: cleanQuery,
searchType,
maxResults,
country,
language,
domainFilter,
};
// 4. Try primary provider
const result = await tryProvider(primaryConfig, requestParams, credentials, startTime, log);
if (result.success) return result;
// 5. Failover to alternate (only for retriable errors and auto-select mode)
if (
alternateConfig &&
alternateCredentials &&
!NON_RETRIABLE.has(result.status || 0) &&
Date.now() - startTime < GLOBAL_TIMEOUT_MS
) {
if (log) {
log.warn(
"SEARCH",
`${primaryConfig.id} failed (${result.status}), trying ${alternateConfig.id}`
);
}
const fallbackResult = await tryProvider(
alternateConfig,
requestParams,
alternateCredentials,
startTime,
log
);
if (fallbackResult.success) return fallbackResult;
}
return result;
}
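Reviewer note: the failover condition in `handleSearch` combines three checks. Restated as a pure predicate for readability; `shouldFailover` is a hypothetical name, and `hasAlternate` stands in for the `alternateConfig && alternateCredentials` pair.

```typescript
const GLOBAL_TIMEOUT_MS = 15_000;
const NON_RETRIABLE = new Set([400, 401, 403, 404]);

// True when a second provider should be tried after the primary failed.
function shouldFailover(
  primaryStatus: number | undefined,
  hasAlternate: boolean,
  elapsedMs: number
): boolean {
  return (
    hasAlternate &&
    // A missing status is treated as 0, which is retriable.
    !NON_RETRIABLE.has(primaryStatus || 0) &&
    // No failover once the global budget is already spent.
    elapsedMs < GLOBAL_TIMEOUT_MS
  );
}

const onRateLimit = shouldFailover(429, true, 2_000); // true: retriable, budget left
const onAuthError = shouldFailover(401, true, 2_000); // false: non-retriable status
const outOfBudget = shouldFailover(502, true, 16_000); // false: budget exhausted
const noAlternate = shouldFailover(502, false, 2_000); // false: nothing to try
```

When the alternate also fails, `handleSearch` returns the primary provider's result, so the caller sees the first error and not the second.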
async function tryProvider(
config: SearchProviderConfig,
params: Omit<SearchRequestParams, "token">,
credentials: Record<string, any>,
globalStartTime: number,
log?: any
): Promise<SearchHandlerResult> {
const startTime = Date.now();
const token = credentials.apiKey || credentials.accessToken;
if (!token) {
return {
success: false,
status: 401,
error: `No credentials for search provider: ${config.id}`,
};
}
const { query, searchType, maxResults } = params;
const { url, init } = buildRequest(config, { ...params, token });
// Timeout: min of provider timeout and remaining global timeout
const remainingGlobal = GLOBAL_TIMEOUT_MS - (Date.now() - globalStartTime);
const timeout = Math.min(config.timeoutMs, Math.max(remainingGlobal, 1000));
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), timeout);
if (log) {
log.info("SEARCH", `${config.id} | query: "${query.slice(0, 80)}" | type: ${searchType}`);
}
try {
const response = await fetch(url, { ...init, signal: controller.signal });
clearTimeout(timer);
if (!response.ok) {
const errorText = await response.text();
if (log) {
log.error("SEARCH", `${config.id} error ${response.status}: ${errorText.slice(0, 200)}`);
}
saveCallLog({
method: config.method,
path: "/v1/search",
status: response.status,
model: config.id,
provider: config.id,
duration: Date.now() - startTime,
requestType: "search",
error: errorText.slice(0, 500),
requestBody: {
query: query.slice(0, 200),
search_type: searchType,
max_results: maxResults,
},
}).catch(() => { /* non-critical — logging must not block search response */ });
return {
success: false,
status: response.status,
error: `Search provider ${config.id} returned ${response.status}`,
};
}
const data = await response.json();
const { results, totalResults } = normalizeResponse(config.id, data, query, searchType);
const duration = Date.now() - startTime;
saveCallLog({
method: config.method,
path: "/v1/search",
status: 200,
model: config.id,
provider: config.id,
duration,
requestType: "search",
tokens: { prompt_tokens: 0, completion_tokens: 0 },
requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
responseBody: { results_count: results.length, cached: false },
}).catch(() => { /* non-critical — logging must not block search response */ });
return {
success: true,
data: {
provider: config.id,
query,
results,
answer: null,
usage: { queries_used: 1, search_cost_usd: config.costPerQuery },
metrics: {
response_time_ms: duration,
upstream_latency_ms: duration,
total_results_available: totalResults,
},
errors: [],
},
};
} catch (err: any) {
clearTimeout(timer);
const isTimeout = err.name === "AbortError";
if (log) {
log.error("SEARCH", `${config.id} ${isTimeout ? "timeout" : "fetch error"}: ${err.message}`);
}
saveCallLog({
method: config.method,
path: "/v1/search",
status: isTimeout ? 504 : 502,
model: config.id,
provider: config.id,
duration: Date.now() - startTime,
requestType: "search",
error: err.message,
requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
}).catch(() => { /* non-critical — logging must not block search response */ });
return {
success: false,
status: isTimeout ? 504 : 502,
error: `Search provider ${isTimeout ? "timeout" : "error"}: ${err.message}`,
};
}
}
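Reviewer note: the per-attempt timeout in `tryProvider` is the smaller of the provider's own timeout and what remains of the global budget, with a one-second floor. A worked sketch; `effectiveTimeout` is a hypothetical name and the 10-second provider timeout is an illustrative value.

```typescript
const GLOBAL_TIMEOUT_MS = 15_000;

// Mirrors: Math.min(config.timeoutMs, Math.max(remainingGlobal, 1000))
function effectiveTimeout(providerTimeoutMs: number, elapsedMs: number): number {
  const remainingGlobal = GLOBAL_TIMEOUT_MS - elapsedMs;
  return Math.min(providerTimeoutMs, Math.max(remainingGlobal, 1000));
}

const fresh = effectiveTimeout(10_000, 0); // 10000: provider timeout is the limit
const midway = effectiveTimeout(10_000, 9_000); // 6000: remaining budget is the limit
const nearEnd = effectiveTimeout(10_000, 14_800); // 1000: the floor applies
```

Because of the floor, an attempt that starts just before the 15-second mark can still run for a full second, so total wall time may exceed the global budget by a little under one second.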
@@ -0,0 +1,48 @@
import { describe, it, expect } from "vitest";
import {
MCP_TOOLS,
MCP_TOOL_MAP,
setRoutingStrategyInput,
setRoutingStrategyTool,
} from "../schemas/tools.ts";
describe("omniroute_set_routing_strategy MCP tool schema", () => {
it("should be registered in MCP_TOOLS", () => {
const tool = MCP_TOOLS.find((t) => t.name === "omniroute_set_routing_strategy");
expect(tool).toBeDefined();
expect(tool?.phase).toBe(2);
});
it("should be available in MCP_TOOL_MAP", () => {
expect(MCP_TOOL_MAP["omniroute_set_routing_strategy"]).toBeDefined();
});
it("should require write:combos scope", () => {
expect(setRoutingStrategyTool.scopes).toContain("write:combos");
});
it("should validate a standard strategy payload", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "cost-optimized",
});
expect(result.success).toBe(true);
});
it("should validate auto strategy with autoRoutingStrategy", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "auto",
autoRoutingStrategy: "latency",
});
expect(result.success).toBe(true);
});
it("should reject unknown strategy", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "unknown-strategy",
});
expect(result.success).toBe(false);
});
});
+55 -7
@@ -107,6 +107,7 @@ export const listCombosOutput = z.object({
"priority",
"weighted",
"round-robin",
"strict-random",
"random",
"least-used",
"cost-optimized",
@@ -470,7 +471,53 @@ export const setBudgetGuardTool: McpToolDefinition<
sourceEndpoints: ["/api/usage/budget"],
};
// --- Tool 11: omniroute_set_resilience_profile ---
// --- Tool 11: omniroute_set_routing_strategy ---
export const setRoutingStrategyInput = z.object({
comboId: z.string().describe("Combo ID or name to update"),
strategy: z
.enum([
"priority",
"weighted",
"round-robin",
"strict-random",
"random",
"least-used",
"cost-optimized",
"auto",
])
.describe("Routing strategy to apply"),
autoRoutingStrategy: z
.enum(["rules", "cost", "eco", "latency", "fast"])
.optional()
.describe("Optional strategy used by auto mode (only used when strategy='auto')"),
});
export const setRoutingStrategyOutput = z.object({
success: z.boolean(),
combo: z.object({
id: z.string(),
name: z.string(),
strategy: z.string(),
autoRoutingStrategy: z.string().nullable(),
}),
});
export const setRoutingStrategyTool: McpToolDefinition<
typeof setRoutingStrategyInput,
typeof setRoutingStrategyOutput
> = {
name: "omniroute_set_routing_strategy",
description:
"Updates a combo routing strategy (priority/weighted/auto/etc.) at runtime. Supports selecting the sub-strategy used by auto mode (rules/cost/eco/latency/fast).",
inputSchema: setRoutingStrategyInput,
outputSchema: setRoutingStrategyOutput,
scopes: ["write:combos"],
auditLevel: "full",
phase: 2,
sourceEndpoints: ["/api/combos", "/api/combos/{id}"],
};
// --- Tool 12: omniroute_set_resilience_profile ---
export const setResilienceProfileInput = z.object({
profile: z
.enum(["aggressive", "balanced", "conservative"])
@@ -502,7 +549,7 @@ export const setResilienceProfileTool: McpToolDefinition<
sourceEndpoints: ["/api/resilience"],
};
// --- Tool 12: omniroute_test_combo ---
// --- Tool 13: omniroute_test_combo ---
export const testComboInput = z.object({
comboId: z.string().describe("ID of the combo to test"),
testPrompt: z.string().max(500).describe("Short test prompt (max 500 chars)"),
@@ -540,7 +587,7 @@ export const testComboTool: McpToolDefinition<typeof testComboInput, typeof test
sourceEndpoints: ["/api/combos/test", "/v1/chat/completions"],
};
// --- Tool 13: omniroute_get_provider_metrics ---
// --- Tool 14: omniroute_get_provider_metrics ---
export const getProviderMetricsInput = z.object({
provider: z.string().describe("Provider name (e.g., 'claude', 'gemini-cli', 'codex')"),
});
@@ -583,7 +630,7 @@ export const getProviderMetricsTool: McpToolDefinition<
sourceEndpoints: ["/api/provider-metrics", "/api/resilience"],
};
// --- Tool 14: omniroute_best_combo_for_task ---
// --- Tool 15: omniroute_best_combo_for_task ---
export const bestComboForTaskInput = z.object({
taskType: z
.enum(["coding", "review", "planning", "analysis", "debugging", "documentation"])
@@ -628,7 +675,7 @@ export const bestComboForTaskTool: McpToolDefinition<
sourceEndpoints: ["/api/combos", "/api/combos/metrics", "/api/monitoring/health"],
};
// --- Tool 15: omniroute_explain_route ---
// --- Tool 16: omniroute_explain_route ---
export const explainRouteInput = z.object({
requestId: z.string().describe("Request ID from the X-Request-Id header"),
});
@@ -674,7 +721,7 @@ export const explainRouteTool: McpToolDefinition<
sourceEndpoints: [],
};
// --- Tool 16: omniroute_get_session_snapshot ---
// --- Tool 17: omniroute_get_session_snapshot ---
export const getSessionSnapshotInput = z.object({}).describe("No parameters required");
export const getSessionSnapshotOutput = z.object({
@@ -723,7 +770,7 @@ export const getSessionSnapshotTool: McpToolDefinition<
sourceEndpoints: ["/api/usage/analytics", "/api/telemetry/summary"],
};
// --- Tool 17: omniroute_sync_pricing ---
// --- Tool 18: omniroute_sync_pricing ---
export const syncPricingInput = z.object({
sources: z
.array(z.string())
@@ -775,6 +822,7 @@ export const MCP_TOOLS = [
// Phase 2: Advanced
simulateRouteTool,
setBudgetGuardTool,
setRoutingStrategyTool,
setResilienceProfileTool,
testComboTool,
getProviderMetricsTool,
+14
@@ -25,6 +25,7 @@ import {
listModelsCatalogInput,
simulateRouteInput,
setBudgetGuardInput,
setRoutingStrategyInput,
setResilienceProfileInput,
testComboInput,
getProviderMetricsInput,
@@ -45,6 +46,7 @@ import {
import {
handleSimulateRoute,
handleSetBudgetGuard,
handleSetRoutingStrategy,
handleSetResilienceProfile,
handleTestCombo,
handleGetProviderMetrics,
@@ -593,6 +595,18 @@ export function createMcpServer(): McpServer {
)
);
server.registerTool(
"omniroute_set_routing_strategy",
{
description:
"Updates combo routing strategy at runtime (priority/weighted/round-robin/auto/etc.)",
inputSchema: setRoutingStrategyInput,
},
withScopeEnforcement("omniroute_set_routing_strategy", (args) =>
handleSetRoutingStrategy(setRoutingStrategyInput.parse(args))
)
);
server.registerTool(
"omniroute_set_resilience_profile",
{
+111 -7
@@ -1,16 +1,18 @@
/**
* OmniRoute MCP Advanced Tools 8 intelligence tools that differentiate
* OmniRoute MCP Advanced Tools 10 intelligence tools that differentiate
* OmniRoute from all other AI gateways.
*
* Tools:
* 1. omniroute_simulate_route Dry-run routing simulation
* 2. omniroute_set_budget_guard Session budget with degrade/block/alert
* 3. omniroute_set_resilience_profile Circuit breaker/retry profiles
* 4. omniroute_test_combo Live test each provider in a combo
* 5. omniroute_get_provider_metrics Detailed per-provider metrics
* 6. omniroute_best_combo_for_task AI-powered combo recommendation
* 7. omniroute_explain_route Post-hoc routing decision explainer
* 8. omniroute_get_session_snapshot Full session state snapshot
* 3. omniroute_set_routing_strategy Runtime strategy switch for combos
* 4. omniroute_set_resilience_profile Circuit breaker/retry profiles
* 5. omniroute_test_combo Live test each provider in a combo
* 6. omniroute_get_provider_metrics Detailed per-provider metrics
* 7. omniroute_best_combo_for_task AI-powered combo recommendation
* 8. omniroute_explain_route Post-hoc routing decision explainer
* 9. omniroute_get_session_snapshot Full session state snapshot
* 10. omniroute_sync_pricing Sync provider pricing from external source
*/
import { logToolCall } from "../audit.ts";
@@ -335,6 +337,108 @@ export async function handleSetBudgetGuard(args: {
}
}
export async function handleSetRoutingStrategy(args: {
comboId: string;
strategy:
| "priority"
| "weighted"
| "round-robin"
| "strict-random"
| "random"
| "least-used"
| "cost-optimized"
| "auto";
autoRoutingStrategy?: "rules" | "cost" | "eco" | "latency" | "fast";
}) {
const start = Date.now();
try {
const combos = normalizeCombosResponse(await apiFetch("/api/combos"));
const combo = combos.find(
(comboEntry) =>
toString(comboEntry.id) === args.comboId || toString(comboEntry.name) === args.comboId
);
if (!combo) {
const msg = `Combo '${args.comboId}' not found`;
await logToolCall(
"omniroute_set_routing_strategy",
args,
null,
Date.now() - start,
false,
msg
);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
const comboId = toString(combo.id);
if (!comboId) {
const msg = "Matched combo has no id";
await logToolCall(
"omniroute_set_routing_strategy",
args,
null,
Date.now() - start,
false,
msg
);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
const comboData = toRecord(combo.data);
const currentConfig = toRecord(
Object.keys(toRecord(combo.config)).length > 0 ? combo.config : comboData.config
);
let nextConfig: JsonRecord | undefined = undefined;
if (args.strategy === "auto" && args.autoRoutingStrategy) {
const currentAutoConfig = toRecord(currentConfig.auto);
nextConfig = {
...currentConfig,
auto: {
...currentAutoConfig,
routingStrategy: args.autoRoutingStrategy,
},
};
}
const payload: JsonRecord = { strategy: args.strategy };
if (nextConfig && Object.keys(nextConfig).length > 0) {
payload.config = nextConfig;
}
const updatedCombo = toRecord(
await apiFetch(`/api/combos/${encodeURIComponent(comboId)}`, {
method: "PUT",
body: JSON.stringify(payload),
})
);
const updatedConfig = toRecord(updatedCombo.config);
const resolvedAutoStrategy =
toString(toRecord(updatedConfig.auto).routingStrategy) ||
(args.strategy === "auto" ? (args.autoRoutingStrategy ?? "rules") : "");
const result = {
success: true,
combo: {
id: toString(updatedCombo.id, comboId),
name: toString(updatedCombo.name, toString(combo.name, comboId)),
strategy: toString(updatedCombo.strategy, args.strategy),
autoRoutingStrategy:
toString(updatedCombo.strategy, args.strategy) === "auto" ? resolvedAutoStrategy : null,
},
};
await logToolCall("omniroute_set_routing_strategy", args, result, Date.now() - start, true);
return { content: [{ type: "text" as const, text: JSON.stringify(result, null, 2) }] };
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
await logToolCall("omniroute_set_routing_strategy", args, null, Date.now() - start, false, msg);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
}
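Review note: the handler above only sends a `config` when the strategy is `auto` and an `autoRoutingStrategy` was supplied, and it merges into the existing `auto` block rather than replacing it. A minimal sketch of that merge follows, assuming a hypothetical `asRecord` helper in place of the `toRecord` used in the diff (its real implementation is not shown here) and a hypothetical `buildStrategyPayload` name.

```typescript
type JsonRecord = Record<string, unknown>;

// Hypothetical stand-in for toRecord: anything that is not a plain object becomes {}.
function asRecord(value: unknown): JsonRecord {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    ? (value as JsonRecord)
    : {};
}

// Build the PUT body: always the strategy, plus a merged config only for "auto".
function buildStrategyPayload(
  currentConfig: unknown,
  strategy: string,
  autoRoutingStrategy?: string
): JsonRecord {
  const payload: JsonRecord = { strategy };
  if (strategy === "auto" && autoRoutingStrategy) {
    const config = asRecord(currentConfig);
    payload.config = {
      ...config, // keep unrelated keys such as intentConfig
      auto: { ...asRecord(config.auto), routingStrategy: autoRoutingStrategy },
    };
  }
  return payload;
}

const autoPayload = buildStrategyPayload(
  { auto: { explorationRate: 0.1 }, intentConfig: { enabled: true } },
  "auto",
  "latency"
);
const priorityPayload = buildStrategyPayload({}, "priority", "cost");
console.log(JSON.stringify(autoPayload), JSON.stringify(priorityPayload));
```

The existing `explorationRate` survives the update, and a non-auto strategy ignores `autoRoutingStrategy` entirely.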
export async function handleSetResilienceProfile(args: {
profile: "aggressive" | "balanced" | "conservative";
}) {
+36 -3
@@ -20,6 +20,7 @@ import {
import { getTaskFitness } from "./taskFitness";
import { getModePack } from "./modePacks";
import { getSelfHealingManager } from "./selfHealing";
import { classifyPromptIntent } from "../intentClassifier";
export interface AutoComboConfig {
id: string;
@@ -30,6 +31,8 @@ export interface AutoComboConfig {
modePack?: string;
budgetCap?: number; // max cost per request in USD
explorationRate: number; // 0.05 = 5% exploratory
/** If set, RouterStrategy name to use for selection ('rules' | 'cost' | 'latency') */
routerStrategy?: string;
}
export interface SelectionResult {
@@ -43,14 +46,44 @@ export interface SelectionResult {
/**
* Select the best provider from an auto-combo pool.
*
* @param config - AutoCombo configuration
* @param candidates - Provider candidates to score
* @param taskType - Task type hint. When "default" or omitted, the engine will attempt
* to infer the intent from `promptMessages` using multilingual classification.
* @param promptMessages - Optional raw messages for intent classification
*/
export function selectProvider(
config: AutoComboConfig,
candidates: ProviderCandidate[],
taskType: string = "default"
taskType: string = "default",
promptMessages?: Array<{ role: string; content: unknown }>
): SelectionResult {
const healer = getSelfHealingManager();
// ── Intent classification (ClawRouter Feature #10/11) ────────────────────
// When taskType is generic ('default'), attempt to classify the prompt intent
// using the multilingual intentClassifier for better task fitness scoring.
let effectiveTaskType = taskType;
if ((taskType === "default" || taskType === "") && promptMessages?.length) {
// Extract text from last user message for classification
const lastUserMsg = [...promptMessages].reverse().find((m) => m.role === "user");
if (lastUserMsg) {
const text =
typeof lastUserMsg.content === "string"
? lastUserMsg.content
: Array.isArray(lastUserMsg.content)
? (lastUserMsg.content as Array<{ type: string; text?: string }>)
.filter((b) => b.type === "text")
.map((b) => b.text || "")
.join(" ")
: "";
if (text.length > 10) {
const intent = classifyPromptIntent(text);
effectiveTaskType = intent; // 'code' | 'reasoning' | 'simple' | 'medium'
}
}
}
// Resolve weights from mode pack or config
let weights = config.weights;
if (config.modePack) {
@@ -80,8 +113,8 @@ export function selectProvider(
excluded.length = 0;
}
// Score all providers
const scored = scorePool(pool, taskType, weights, getTaskFitness);
// Score all providers (using classified intent if available)
const scored = scorePool(pool, effectiveTaskType, weights, getTaskFitness);
// Apply self-healing re-evaluation with actual scores
const finalCandidates = scored.filter((s) => {
@@ -0,0 +1,159 @@
/**
* RouterStrategy: Pluggable Routing Strategy System
*
* Inspired by ClawRouter commit 14c83c258 "refactor: extract routing into pluggable RouterStrategy system".
* Provides a RouterStrategy interface and three built-in implementations:
* - RulesStrategy (default): wraps the existing 6-factor scoring engine
* - CostStrategy: always picks the cheapest available model
* - LatencyStrategy: picks the lowest p95 latency, penalized by error rate
*/
import type { ProviderCandidate, ScoredProvider } from "./scoring.ts";
import { scorePool } from "./scoring.ts";
import { getTaskFitness } from "./taskFitness.ts";
export interface RoutingContext {
taskType: string;
requestHasTools?: boolean;
requestHasVision?: boolean;
estimatedInputTokens?: number;
}
export interface RoutingDecision {
provider: string;
model: string;
strategy: string;
reason: string;
candidatesConsidered: number;
finalScore: number;
}
export interface RouterStrategy {
readonly name: string;
readonly description: string;
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision;
}
// ── RulesStrategy: wraps 6-factor scoring engine ────────────────────────────
class RulesStrategyImpl implements RouterStrategy {
readonly name = "rules";
readonly description =
"6-factor weighted scoring: quota, health, cost, latency, taskFit, stability";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const eligible = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const ranked: ScoredProvider[] = scorePool(
eligible.length > 0 ? eligible : pool,
context.taskType,
undefined,
getTaskFitness
);
const best = ranked[0];
if (!best) throw new Error("[RulesStrategy] No candidates to score");
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `RulesStrategy: score=${best.score.toFixed(3)} (quota=${best.factors.quota.toFixed(2)}, health=${best.factors.health.toFixed(2)}, cost=${best.factors.costInv.toFixed(2)}, taskFit=${best.factors.taskFit.toFixed(2)})`,
candidatesConsidered: ranked.length,
finalScore: best.score,
};
}
}
// ── CostStrategy: always picks cheapest healthy provider ─────────────────────
class CostStrategyImpl implements RouterStrategy {
readonly name = "cost";
readonly description = "Always selects cheapest available provider (by costPer1MTokens)";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const candidates = healthy.length > 0 ? healthy : pool;
const sorted = [...candidates].sort((a, b) => a.costPer1MTokens - b.costPer1MTokens);
const best = sorted[0];
if (!best) throw new Error("[CostStrategy] No candidates available");
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `CostStrategy: cheapest at $${best.costPer1MTokens.toFixed(3)}/1M tokens`,
candidatesConsidered: candidates.length,
finalScore: best.costPer1MTokens === 0 ? 1.0 : 1 / best.costPer1MTokens,
};
}
}
// ── LatencyStrategy: prioritize low latency + reliability ───────────────────
class LatencyStrategyImpl implements RouterStrategy {
readonly name = "latency";
readonly description = "Prioritizes lowest p95 latency with reliability weighting";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const candidates = healthy.length > 0 ? healthy : pool;
const sorted = [...candidates].sort((a, b) => {
const aPenalty = a.errorRate * 1000;
const bPenalty = b.errorRate * 1000;
return a.p95LatencyMs + aPenalty - (b.p95LatencyMs + bPenalty);
});
const best = sorted[0];
if (!best) throw new Error("[LatencyStrategy] No candidates available");
const latencyScore = best.p95LatencyMs > 0 ? Math.max(0.001, 10_000 / best.p95LatencyMs) : 1;
const reliability = Math.max(0, 1 - best.errorRate);
const finalScore = latencyScore * 0.7 + reliability * 0.3;
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `LatencyStrategy: p95=${best.p95LatencyMs}ms, errorRate=${(best.errorRate * 100).toFixed(2)}%`,
candidatesConsidered: candidates.length,
finalScore,
};
}
}
// ── Registry ──────────────────────────────────────────────────────────────────
const strategyRegistry = new Map<string, RouterStrategy>();
const rulesStrategy = new RulesStrategyImpl();
const costStrategy = new CostStrategyImpl();
const latencyStrategy = new LatencyStrategyImpl();
strategyRegistry.set("rules", rulesStrategy);
strategyRegistry.set("cost", costStrategy);
strategyRegistry.set("eco", costStrategy); // alias
strategyRegistry.set("latency", latencyStrategy);
strategyRegistry.set("fast", latencyStrategy); // alias
export function getStrategy(name: string): RouterStrategy {
const strategy = strategyRegistry.get(name);
if (!strategy) {
console.warn(`[RouterStrategy] Strategy '${name}' not found, falling back to 'rules'`);
return rulesStrategy;
}
return strategy;
}
export function registerStrategy(name: string, strategy: RouterStrategy): void {
if (strategyRegistry.has(name)) {
console.warn(`[RouterStrategy] Overwriting strategy '${name}'`);
}
strategyRegistry.set(name, strategy);
}
export function listStrategies(): Array<{ name: string; description: string }> {
return [...strategyRegistry.entries()].map(([name, s]) => ({ name, description: s.description }));
}
export function selectWithStrategy(
pool: ProviderCandidate[],
context: RoutingContext,
strategyName = "rules"
): RoutingDecision {
return getStrategy(strategyName).select(pool, context);
}
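Review note: the registry above maps several names (`eco`, `fast`) onto shared strategy instances and skips candidates whose circuit breaker is `OPEN` unless that would empty the pool. The sketch below illustrates that pattern in isolation. The `Candidate` and `Strategy` types are simplified stand-ins, not the real `ProviderCandidate` and `RouterStrategy`, and the unknown-name fallback goes to the cost strategy only because the rules engine is out of scope here (the diff falls back to `rules`).

```typescript
// Simplified stand-ins for the types in the diff (assumptions for illustration).
interface Candidate {
  provider: string;
  model: string;
  costPer1MTokens: number;
  p95LatencyMs: number;
  errorRate: number;
  circuitBreakerState: "CLOSED" | "OPEN" | "HALF_OPEN";
}

interface Strategy {
  readonly name: string;
  select(pool: Candidate[]): Candidate;
}

// Drop candidates with an open breaker, unless that would leave nothing to choose from.
function healthyOrAll(pool: Candidate[]): Candidate[] {
  const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
  return healthy.length > 0 ? healthy : pool;
}

function firstBy(pool: Candidate[], key: (c: Candidate) => number): Candidate {
  const best = [...healthyOrAll(pool)].sort((a, b) => key(a) - key(b))[0];
  if (!best) throw new Error("no candidates available");
  return best;
}

const cheapest: Strategy = {
  name: "cost",
  select: (pool) => firstBy(pool, (c) => c.costPer1MTokens),
};

const fastest: Strategy = {
  name: "latency",
  // Same penalty as the diff: each 1% of error rate costs 10 ms of effective latency.
  select: (pool) => firstBy(pool, (c) => c.p95LatencyMs + c.errorRate * 1000),
};

const registry = new Map<string, Strategy>([
  ["cost", cheapest],
  ["eco", cheapest], // alias
  ["latency", fastest],
  ["fast", fastest], // alias
]);

function pick(name: string, pool: Candidate[]): Candidate {
  // Sketch-only fallback; the diff falls back to the rules strategy instead.
  return (registry.get(name) ?? cheapest).select(pool);
}

const samplePool: Candidate[] = [
  { provider: "p1", model: "cheap-slow", costPer1MTokens: 0.1, p95LatencyMs: 4000, errorRate: 0.01, circuitBreakerState: "CLOSED" },
  { provider: "p2", model: "pricey-fast", costPer1MTokens: 3, p95LatencyMs: 1200, errorRate: 0.02, circuitBreakerState: "CLOSED" },
  { provider: "p3", model: "free-broken", costPer1MTokens: 0, p95LatencyMs: 800, errorRate: 0.5, circuitBreakerState: "OPEN" },
];

console.log(pick("eco", samplePool).model, pick("fast", samplePool).model);
```

Note that `finalScore` in the diff is computed on a different scale by each strategy, so it is probably best read as a per-strategy diagnostic rather than compared across strategies.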
+2 -1
@@ -74,7 +74,8 @@ export function calculateScore(factors: ScoringFactors, weights: ScoringWeights)
weights.costInv * factors.costInv +
weights.latencyInv * factors.latencyInv +
weights.taskFit * factors.taskFit +
weights.stability * factors.stability
weights.stability * factors.stability +
weights.tierPriority * factors.tierPriority
);
}
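Review note: adding `tierPriority` to the weighted sum means every weights object passed to `calculateScore` needs that key. The sketch below is a self-contained illustration, not the project's code: the weight values are hypothetical (the real `DEFAULT_WEIGHTS` are not shown in this diff), and the `quota` and `health` weight names are assumed from the factor names used elsewhere. It shows why a weights object saved before this change would yield `NaN` unless defaults are merged in first. Whether the project already does such a merge is not visible here.

```typescript
const FACTORS = [
  "quota", "health", "costInv", "latencyInv", "taskFit", "stability", "tierPriority",
] as const;
type Factor = (typeof FACTORS)[number];
type Vector = Record<Factor, number>;

// Hypothetical defaults that sum to 1.
const SKETCH_DEFAULTS: Vector = {
  quota: 0.2, health: 0.2, costInv: 0.2, latencyInv: 0.15,
  taskFit: 0.1, stability: 0.05, tierPriority: 0.1,
};

// Plain weighted sum; a missing weight gives undefined * number = NaN.
function weightedScore(factors: Vector, weights: Partial<Vector>): number {
  return FACTORS.reduce((sum, k) => sum + (weights[k] as number) * factors[k], 0);
}

// Fill any missing weight from the defaults before scoring.
function withDefaults(weights: Partial<Vector>): Vector {
  return { ...SKETCH_DEFAULTS, ...weights };
}

const sampleFactors: Vector = {
  quota: 1, health: 1, costInv: 0.5, latencyInv: 0.8,
  taskFit: 0.9, stability: 1, tierPriority: 0.5,
};

// A weights object written before tierPriority existed.
const legacyWeights: Partial<Vector> = {
  quota: 0.25, health: 0.25, costInv: 0.2, latencyInv: 0.15, taskFit: 0.1, stability: 0.05,
};

console.log(weightedScore(sampleFactors, SKETCH_DEFAULTS));
console.log(weightedScore(sampleFactors, legacyWeights));
console.log(weightedScore(sampleFactors, withDefaults(legacyWeights)));
```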
@@ -24,10 +24,23 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"deepseek-coder": 0.9,
"deepseek-v3": 0.85,
"deepseek-r1": 0.88,
"deepseek-chat": 0.84, // DeepSeek V3.2 Chat — strong code performance
"deepseek-v3.2": 0.86, // Explicit V3.2 alias
qwen: 0.78,
llama: 0.72,
mistral: 0.75,
mixtral: 0.77,
// Grok-4 fast — good code, ultra-low latency (1143ms P50)
"grok-4-fast": 0.8,
"grok-4": 0.82,
"grok-3": 0.8,
// Kimi K2.5 — agentic with tool calling, good at code tasks
"kimi-k2": 0.82,
// GLM-5 — Z.AI model with 128k output
"glm-5": 0.78,
// MiniMax M2.5 — reasoning support helps complex code
"minimax-m2.5": 0.75,
"minimax-m2": 0.72,
},
review: {
"claude-sonnet": 0.92,
@@ -58,10 +71,15 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"claude-sonnet": 0.92,
"gemini-2.5-pro": 0.95,
"gemini-pro": 0.88,
"gemini-3.1-pro": 0.95, // Gemini 3.1 Pro — 1M context, ideal for long analysis
"gpt-4o": 0.85,
o1: 0.9,
o3: 0.93,
"deepseek-r1": 0.88,
"deepseek-chat": 0.8,
"kimi-k2": 0.82, // Kimi K2.5 agentic — good for analysis
"glm-5": 0.78, // GLM-5 with 128k output for long analysis
"minimax-m2.5": 0.76,
},
debugging: {
"claude-sonnet": 0.93,
@@ -87,8 +105,17 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"claude-opus": 0.85,
"gpt-4o": 0.85,
"gemini-pro": 0.8,
"gemini-3.1-pro": 0.85,
"deepseek-v3": 0.75,
"deepseek-chat": 0.74,
"gemini-flash": 0.72,
// New models from ClawRouter analysis (2026-03-17):
"grok-4-fast": 0.72, // ultra-fast, suitable for all tasks
"grok-4": 0.74,
"grok-3": 0.73,
"kimi-k2": 0.76, // agentic multi-step tasks
"glm-5": 0.7,
"minimax-m2.5": 0.7,
},
};
+371 -4
@@ -5,18 +5,37 @@
import { checkFallbackError, formatRetryAfter, getProviderProfile } from "./accountFallback.ts";
import { unavailableResponse } from "../utils/error.ts";
import { recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
import { recordComboIntent, recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
import { resolveComboConfig, getDefaultComboConfig } from "./comboConfig.ts";
import * as semaphore from "./rateLimitSemaphore.ts";
import { getCircuitBreaker } from "../../src/shared/utils/circuitBreaker";
import { fisherYatesShuffle, getNextFromDeck } from "../../src/shared/utils/shuffleDeck";
import { parseModel } from "./model.ts";
import { applyComboAgentMiddleware, injectModelTag } from "./comboAgentMiddleware.ts";
import { classifyWithConfig, DEFAULT_INTENT_CONFIG } from "./intentClassifier.ts";
import { selectProvider as selectAutoProvider } from "./autoCombo/engine.ts";
import { selectWithStrategy } from "./autoCombo/routerStrategy.ts";
import { DEFAULT_WEIGHTS, scorePool } from "./autoCombo/scoring.ts";
import { supportsToolCalling } from "./modelCapabilities.ts";
// Status codes that should mark semaphore + record circuit breaker failures
const TRANSIENT_FOR_BREAKER = [429, 502, 503, 504];
const MAX_COMBO_DEPTH = 3;
// Bootstrap defaults from ClawRouter benchmark (used when no local latency history exists yet)
const DEFAULT_MODEL_P95_MS = {
"grok-4-fast-non-reasoning": 1143,
"grok-4-1-fast-non-reasoning": 1244,
"gemini-2.5-flash": 1238,
"kimi-k2.5": 1646,
"gpt-4o-mini": 2764,
"claude-sonnet-4.6": 4000,
"claude-opus-4.6": 6000,
"deepseek-chat": 2000,
};
const MIN_HISTORY_SAMPLES = 10;
// In-memory atomic counter per combo for round-robin distribution
// Resets on server restart (by design — no stale state)
const rrCounters = new Map();
@@ -201,6 +220,193 @@ function sortModelsByUsage(models, comboName) {
return withUsage.map((e) => e.modelStr);
}
function toTextContent(content) {
if (typeof content === "string") return content;
if (!Array.isArray(content)) return "";
return content
.map((part) => {
if (!part || typeof part !== "object") return "";
if (typeof part.text === "string") return part.text;
return "";
})
.join("\n");
}
function extractPromptForIntent(body) {
if (!body || typeof body !== "object") return "";
const fromMessages = Array.isArray(body.messages)
? [...body.messages].reverse().find((m) => m && typeof m === "object" && m.role === "user")
: null;
if (fromMessages) return toTextContent(fromMessages.content);
if (typeof body.input === "string") return body.input;
if (Array.isArray(body.input)) {
const text = body.input
.map((item) => {
if (!item || typeof item !== "object") return "";
if (typeof item.content === "string") return item.content;
if (typeof item.text === "string") return item.text;
return "";
})
.filter(Boolean)
.join("\n");
if (text) return text;
}
if (typeof body.prompt === "string") return body.prompt;
return "";
}
function mapIntentToTaskType(intent) {
switch (intent) {
case "code":
return "coding";
case "reasoning":
return "analysis";
case "simple":
return "default";
case "medium":
default:
return "default";
}
}
function toStringArray(input) {
if (Array.isArray(input)) {
return input.map((v) => (typeof v === "string" ? v.trim() : "")).filter(Boolean);
}
if (typeof input === "string") {
return input
.split(",")
.map((v) => v.trim())
.filter(Boolean);
}
return [];
}
function getIntentConfig(settings, combo) {
const comboIntentConfig =
combo?.autoConfig?.intentConfig ||
combo?.config?.auto?.intentConfig ||
combo?.config?.intentConfig ||
{};
return {
...DEFAULT_INTENT_CONFIG,
...comboIntentConfig,
...(typeof settings?.intentDetectionEnabled === "boolean"
? { enabled: settings.intentDetectionEnabled }
: {}),
...(Number.isFinite(Number(settings?.intentSimpleMaxWords))
? { simpleMaxWords: Number(settings.intentSimpleMaxWords) }
: {}),
...(toStringArray(settings?.intentExtraCodeKeywords).length > 0
? { extraCodeKeywords: toStringArray(settings.intentExtraCodeKeywords) }
: {}),
...(toStringArray(settings?.intentExtraReasoningKeywords).length > 0
? { extraReasoningKeywords: toStringArray(settings.intentExtraReasoningKeywords) }
: {}),
...(toStringArray(settings?.intentExtraSimpleKeywords).length > 0
? { extraSimpleKeywords: toStringArray(settings.intentExtraSimpleKeywords) }
: {}),
};
}
function getBootstrapLatencyMs(modelId) {
const normalized = String(modelId || "").toLowerCase();
return DEFAULT_MODEL_P95_MS[normalized] ?? 1500;
}
async function buildAutoCandidates(modelStrings, comboName) {
const metrics = getComboMetrics(comboName);
const { getPricingForModel } = await import("../../src/lib/localDb");
let historicalLatencyStats = {};
try {
const { getModelLatencyStats } = await import("../../src/lib/usageDb");
historicalLatencyStats = await getModelLatencyStats({
windowHours: 24,
minSamples: 3,
maxRows: 10000,
});
} catch {
// keep empty stats — auto-combo will use runtime + bootstrap signals
}
const candidates = await Promise.all(
modelStrings.map(async (modelStr) => {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "unknown";
const model = parsed.model || modelStr;
const historicalKey = `${provider}/${model}`;
const historicalModelMetric = historicalLatencyStats[historicalKey] || null;
const historicalTotal = Number(historicalModelMetric?.totalRequests);
const hasHistoricalSignal =
Number.isFinite(historicalTotal) && historicalTotal >= MIN_HISTORY_SAMPLES;
let costPer1MTokens = 1;
try {
const pricing = await getPricingForModel(provider, model);
const inputPrice = Number(pricing?.input);
if (Number.isFinite(inputPrice) && inputPrice >= 0) {
costPer1MTokens = inputPrice;
}
} catch {
// keep default cost
}
const modelMetric = metrics?.byModel?.[modelStr] || null;
const avgLatency = Number(modelMetric?.avgLatencyMs);
const successRate = Number(modelMetric?.successRate);
const historicalP95Latency = Number(historicalModelMetric?.p95LatencyMs);
const historicalStdDev = Number(historicalModelMetric?.latencyStdDev);
const historicalSuccessRate = Number(historicalModelMetric?.successRate); // 0..1
const p95LatencyMs = hasHistoricalSignal
? Number.isFinite(historicalP95Latency) && historicalP95Latency > 0
? historicalP95Latency
: getBootstrapLatencyMs(model)
: Number.isFinite(avgLatency) && avgLatency > 0
? avgLatency
: getBootstrapLatencyMs(model);
const errorRate = hasHistoricalSignal
? Number.isFinite(historicalSuccessRate) &&
historicalSuccessRate >= 0 &&
historicalSuccessRate <= 1
? 1 - historicalSuccessRate
: 0.05
: Number.isFinite(successRate) && successRate >= 0 && successRate <= 100
? 1 - successRate / 100
: 0.05;
const latencyStdDev =
hasHistoricalSignal && Number.isFinite(historicalStdDev) && historicalStdDev > 0
? Math.max(10, historicalStdDev)
: Math.max(10, p95LatencyMs * 0.1);
const breakerStateRaw = getCircuitBreaker(`combo:${modelStr}`)?.getStatus?.()?.state;
const circuitBreakerState =
breakerStateRaw === "OPEN" || breakerStateRaw === "HALF_OPEN" ? breakerStateRaw : "CLOSED";
return {
provider,
model,
quotaRemaining: 100,
quotaTotal: 100,
circuitBreakerState,
costPer1MTokens,
p95LatencyMs,
latencyStdDev,
errorRate,
accountTier: "standard",
quotaResetIntervalSecs: 86400,
};
})
);
return candidates;
}
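Review note: `buildAutoCandidates` resolves `p95LatencyMs` through a three-level fallback: persisted history (if it has at least `MIN_HISTORY_SAMPLES` requests), then the in-memory combo average, then the bootstrap table, with 1500 ms as the last resort. The sketch below isolates that precedence. `LatencyInputs` and `resolveP95` are hypothetical names, and the bootstrap table is trimmed to two entries taken from the diff.

```typescript
const SKETCH_BOOTSTRAP_P95_MS: Partial<Record<string, number>> = {
  "gemini-2.5-flash": 1238,
  "deepseek-chat": 2000,
};
const SKETCH_MIN_HISTORY_SAMPLES = 10;

// Hypothetical input shape gathering the three latency sources for one model.
interface LatencyInputs {
  model: string;
  historicalSamples?: number;
  historicalP95Ms?: number;
  runtimeAvgMs?: number;
}

function isPositive(n: number | undefined): n is number {
  return typeof n === "number" && Number.isFinite(n) && n > 0;
}

function resolveP95(input: LatencyInputs): number {
  const bootstrap = SKETCH_BOOTSTRAP_P95_MS[input.model.toLowerCase()] ?? 1500;
  const hasHistory = (input.historicalSamples ?? 0) >= SKETCH_MIN_HISTORY_SAMPLES;
  if (hasHistory) {
    // Enough persisted samples: trust the stored p95, else the bootstrap value.
    return isPositive(input.historicalP95Ms) ? input.historicalP95Ms : bootstrap;
  }
  // Too little history: the runtime average stands in for p95, else the bootstrap value.
  return isPositive(input.runtimeAvgMs) ? input.runtimeAvgMs : bootstrap;
}

console.log(
  resolveP95({ model: "deepseek-chat", historicalSamples: 50, historicalP95Ms: 1800, runtimeAvgMs: 900 }),
  resolveP95({ model: "deepseek-chat", historicalSamples: 3, historicalP95Ms: 1800, runtimeAvgMs: 900 }),
  resolveP95({ model: "deepseek-chat" }),
  resolveP95({ model: "some-unlisted-model" })
);
```

One thing worth keeping in mind when reading the scores: in the second branch an average latency is used where a p95 is expected, so models with little history will tend to look faster than models with a real p95.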
/**
* Handle combo chat with fallback
* Supports all 6 strategies: priority, weighted, round-robin, random, least-used, cost-optimized
@@ -225,12 +431,49 @@ export async function handleComboChat({
const strategy = combo.strategy || "priority";
const models = combo.models || [];
// ── Combo Agent Middleware (#399 + #401) ────────────────────────────────
// Apply system_message override, tool_filter_regex, and extract pinned model
// from context caching tag. These are all opt-in per combo config.
const { body: agentBody, pinnedModel } = applyComboAgentMiddleware(
body,
combo,
"" // provider/model not yet known — resolved per-model in loop
);
body = agentBody;
if (pinnedModel) {
log.info("COMBO", `[#401] Context caching: pinned model=${pinnedModel}`);
}
// Wrap handleSingleModel to inject context caching tag on response (#401)
const handleSingleModelWrapped = combo.context_cache_protection
? async (b, modelStr) => {
const res = await handleSingleModel(b, modelStr);
// Inject tag only on success and only for non-streaming non-binary responses
if (res.ok && !b.stream) {
try {
const json = await res.clone().json();
const msgs = Array.isArray(json?.messages) ? json.messages : [];
if (msgs.length > 0) {
const tagged = injectModelTag(msgs, modelStr);
return new Response(JSON.stringify({ ...json, messages: tagged }), {
status: res.status,
headers: res.headers,
});
}
} catch {
/* non-JSON or stream — skip tagging */
}
}
return res;
}
: handleSingleModel;
// ─────────────────────────────────────────────────────────────────────────
// Route to round-robin handler if strategy matches
if (strategy === "round-robin") {
return handleRoundRobinCombo({
body,
combo,
handleSingleModel,
handleSingleModel: handleSingleModelWrapped,
isModelAvailable,
log,
settings,
@@ -278,7 +521,131 @@ export async function handleComboChat({
}
// Apply strategy-specific ordering
if (strategy === "strict-random") {
if (strategy === "auto") {
const requestHasTools = Array.isArray(body?.tools) && body.tools.length > 0;
let eligibleModels = [...orderedModels];
if (requestHasTools) {
const filtered = eligibleModels.filter((m) => supportsToolCalling(m));
if (filtered.length > 0) {
eligibleModels = filtered;
} else {
log.warn(
"COMBO",
"Auto strategy: all candidates filtered by tool-calling policy, falling back to full pool"
);
}
}
const prompt = extractPromptForIntent(body);
const systemPrompt =
typeof combo?.system_message === "string" ? combo.system_message : undefined;
const intentConfig = getIntentConfig(settings, combo);
const intent = classifyWithConfig(prompt, intentConfig, systemPrompt);
recordComboIntent(combo.name, intent);
const taskType = mapIntentToTaskType(intent);
const autoConfigSource = combo?.autoConfig || combo?.config?.auto || combo?.config || {};
const routingStrategy =
typeof autoConfigSource.routingStrategy === "string"
? autoConfigSource.routingStrategy
: typeof autoConfigSource.strategyName === "string"
? autoConfigSource.strategyName
: "rules";
const candidatePool = Array.isArray(autoConfigSource.candidatePool)
? autoConfigSource.candidatePool
: [
...new Set(
eligibleModels.map((m) => {
const parsed = parseModel(m);
return parsed.provider || parsed.providerAlias || "unknown";
})
),
];
const weights =
autoConfigSource.weights && typeof autoConfigSource.weights === "object"
? autoConfigSource.weights
: DEFAULT_WEIGHTS;
const explorationRate = Number.isFinite(Number(autoConfigSource.explorationRate))
? Number(autoConfigSource.explorationRate)
: 0.05;
const budgetCap = Number.isFinite(Number(autoConfigSource.budgetCap))
? Number(autoConfigSource.budgetCap)
: undefined;
const modePack =
typeof autoConfigSource.modePack === "string" ? autoConfigSource.modePack : undefined;
const candidates = await buildAutoCandidates(eligibleModels, combo.name);
if (candidates.length > 0) {
let selectedProvider = null;
let selectedModel = null;
let selectionReason = "";
if (routingStrategy !== "rules") {
try {
const decision = selectWithStrategy(
candidates,
{ taskType, requestHasTools },
routingStrategy
);
selectedProvider = decision.provider;
selectedModel = decision.model;
selectionReason = decision.reason;
} catch (err) {
log.warn(
"COMBO",
`Auto strategy '${routingStrategy}' failed (${err?.message || "unknown"}), falling back to rules`
);
}
}
if (!selectedProvider || !selectedModel) {
const selection = selectAutoProvider(
{
id: combo.id || combo.name,
name: combo.name,
type: "auto",
candidatePool,
weights,
modePack,
budgetCap,
explorationRate,
},
candidates,
taskType
);
selectedProvider = selection.provider;
selectedModel = selection.model;
selectionReason = `score=${selection.score.toFixed(3)}${selection.isExploration ? " (exploration)" : ""}`;
}
const modelLookup = new Map();
for (const modelStr of eligibleModels) {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "unknown";
const modelId = parsed.model || modelStr;
modelLookup.set(`${provider}/${modelId}`, modelStr);
}
const ranked = scorePool(candidates, taskType, weights)
.map((r) => modelLookup.get(`${r.provider}/${r.model}`) || `${r.provider}/${r.model}`)
.filter(Boolean);
const selectedModelStr =
modelLookup.get(`${selectedProvider}/${selectedModel}`) ||
`${selectedProvider}/${selectedModel}`;
orderedModels = [...new Set([selectedModelStr, ...ranked, ...eligibleModels])];
log.info(
"COMBO",
`Auto selection: ${selectedModelStr} | intent=${intent} task=${taskType} | strategy=${routingStrategy} | ${selectionReason}`
);
} else {
log.warn("COMBO", "Auto strategy has no candidates, keeping default ordering");
}
} else if (strategy === "strict-random") {
const selectedId = await getNextFromDeck(`combo:${combo.name}`, orderedModels);
// Put selected model first so the fallback loop tries it first
const rest = orderedModels.filter((m) => m !== selectedId);
@@ -348,7 +715,7 @@ export async function handleComboChat({
`Trying model ${i + 1}/${orderedModels.length}: ${modelStr}${retry > 0 ? ` (retry ${retry})` : ""}`
);
const result = await handleSingleModel(body, modelStr);
const result = await handleSingleModelWrapped(body, modelStr);
// Success — return response
if (result.ok) {
+169
@@ -0,0 +1,169 @@
/**
* comboAgentMiddleware.ts: Combo Agent Features
*
* Implements the "combo as agent" features from issues #399 and #401:
*
* 1. **System Message Override** (#399): If the combo defines a `system_message`,
* it is injected as the first system message, replacing any existing system message.
*
* 2. **Tool Filter Regex** (#399): If the combo defines a `tool_filter_regex`,
* only tools whose name matches the pattern are forwarded to the provider.
*
* 3. **Context Caching Protection** (#401): If the combo enables
* `context_cache_protection`, the proxy:
* a. On response: injects `<omniModel>provider/model</omniModel>` tag into
* the last assistant message, provided its content is a string.
* b. On request: scans the message history for the tag, and if found,
* overrides the requested model with the pinned one.
*
* All features are opt-in per combo and backward compatible with existing setups.
*/
interface ComboConfig {
system_message?: string | null;
tool_filter_regex?: string | null;
context_cache_protection?: number | boolean;
[key: string]: unknown;
}
interface Message {
role?: string;
content?: unknown;
[key: string]: unknown;
}
// ── Context Caching Tag ─────────────────────────────────────────────────────
const CACHE_TAG_PATTERN = /<omniModel>([^<]+)<\/omniModel>/;
/**
* Inject the model tag into the last assistant message. If there is no
* assistant message, or its content is not a string, no tag is added.
* Only modifies string content; does not touch array content, to avoid breaking
* Claude/Gemini multi-part message formats.
*/
export function injectModelTag(messages: Message[], providerModel: string): Message[] {
// Remove any existing tag first to avoid duplication on context compaction
const cleaned = messages.map((msg) => {
if (msg.role === "assistant" && typeof msg.content === "string") {
return { ...msg, content: msg.content.replace(CACHE_TAG_PATTERN, "").trimEnd() };
}
return msg;
});
// Find last assistant message with string content
const lastAssistantIdx = cleaned.map((m) => m.role).lastIndexOf("assistant");
if (lastAssistantIdx === -1) return cleaned;
const msg = cleaned[lastAssistantIdx];
if (typeof msg.content !== "string") return cleaned;
const tagged = [...cleaned];
tagged[lastAssistantIdx] = {
...msg,
content: `${msg.content}\n<omniModel>${providerModel}</omniModel>`,
};
return tagged;
}
/**
* Scan message history for the model tag injected by a previous response.
* Returns the pinned "provider/model" string, or null if not found.
*/
export function extractPinnedModel(messages: Message[]): string | null {
// Scan from newest to oldest for efficiency
for (let i = messages.length - 1; i >= 0; i--) {
const msg = messages[i];
if (msg.role === "assistant" && typeof msg.content === "string") {
const match = CACHE_TAG_PATTERN.exec(msg.content);
if (match) return match[1];
}
}
return null;
}
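Review note: the two functions above form a round trip. A response gets tagged, the client echoes the history back, and the next request recovers the pinned model. The sketch below re-implements both in compact form so the round trip can be run in isolation. `tagLastAssistant` and `findPinnedModel` are hypothetical names mirroring `injectModelTag` and `extractPinnedModel`, and the model strings are placeholders.

```typescript
interface Msg {
  role?: string;
  content?: unknown;
}

const SKETCH_TAG = /<omniModel>([^<]+)<\/omniModel>/;

// Strip any previous tag from assistant strings, then tag the last assistant message.
function tagLastAssistant(messages: Msg[], providerModel: string): Msg[] {
  const cleaned: Msg[] = messages.map((m) =>
    m.role === "assistant" && typeof m.content === "string"
      ? { ...m, content: m.content.replace(SKETCH_TAG, "").trimEnd() }
      : m
  );
  const idx = cleaned.map((m) => m.role).lastIndexOf("assistant");
  const last = idx === -1 ? undefined : cleaned[idx];
  // No assistant message, or non-string (multi-part) content: leave untouched.
  if (!last || typeof last.content !== "string") return cleaned;
  const tagged = [...cleaned];
  tagged[idx] = { ...last, content: `${last.content}\n<omniModel>${providerModel}</omniModel>` };
  return tagged;
}

// Walk the history from newest to oldest and return the first tag found.
function findPinnedModel(messages: Msg[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "assistant" && typeof m.content === "string") {
      const match = SKETCH_TAG.exec(m.content);
      if (match) return match[1] ?? null;
    }
  }
  return null;
}

const turn1 = tagLastAssistant(
  [{ role: "user", content: "hi" }, { role: "assistant", content: "hello" }],
  "provider-a/model-x"
);
const turn2 = tagLastAssistant(turn1, "provider-b/model-y"); // re-tagging replaces the old tag
const multiPart = tagLastAssistant(
  [{ role: "assistant", content: [{ type: "text", text: "hello" }] }],
  "provider-a/model-x"
);

console.log(findPinnedModel(turn1), findPinnedModel(turn2), findPinnedModel(multiPart));
```

Because array content is never tagged, a conversation whose assistant turns are all multi-part will not be pinned under this scheme.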
// ── System Message Override ──────────────────────────────────────────────────
/**
* Replace or inject a system message at the beginning of the messages array.
* Existing system messages are removed if a combo override is set.
*/
export function applySystemMessageOverride(messages: Message[], systemMessage: string): Message[] {
// Remove all existing system messages
const filtered = messages.filter((m) => m.role !== "system");
// Inject combo system message at start
return [{ role: "system", content: systemMessage }, ...filtered];
}
// ── Tool Filter Regex ────────────────────────────────────────────────────────
/**
* Filter the tools array, keeping only tools whose name matches the regex.
* Returns the original array unchanged if pattern is null/empty.
*/
export function applyToolFilter(
tools: unknown[] | undefined,
pattern: string | null | undefined
): unknown[] | undefined {
if (!tools || !pattern) return tools;
let regex: RegExp;
try {
regex = new RegExp(pattern);
} catch {
// Invalid regex — return tools unchanged rather than crashing
console.warn(`[ComboAgent] Invalid tool_filter_regex: "${pattern}"`);
return tools;
}
return tools.filter((tool) => {
const t = tool as Record<string, unknown>;
// Support both OpenAI format ({ function: { name } }) and Anthropic ({ name })
const name = (t.function as Record<string, unknown> | undefined)?.name ?? t.name ?? "";
return regex.test(String(name));
});
}
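Review note: `tool_filter_regex` is compiled with `new RegExp(pattern)` and tested unanchored, so a pattern like `read` keeps every tool whose name contains "read". The sketch below shows the behavior on both tool shapes the diff supports. The tool names are made up, and `compilePattern` and `filterTools` are hypothetical names rather than the exported API.

```typescript
type Tool = Record<string, unknown>;

// OpenAI-style tools nest the name under `function`; Anthropic-style tools keep it at the top level.
function toolName(tool: Tool): string {
  const fn = tool.function as Record<string, unknown> | undefined;
  return String(fn?.name ?? tool.name ?? "");
}

function compilePattern(pattern: string): RegExp | null {
  try {
    return new RegExp(pattern);
  } catch {
    return null; // invalid pattern: caller keeps all tools
  }
}

function filterTools(tools: Tool[], pattern: string): Tool[] {
  const regex = compilePattern(pattern);
  if (!regex) return tools;
  return tools.filter((t) => regex.test(toolName(t)));
}

const sampleTools: Tool[] = [
  { type: "function", function: { name: "fs_read" } },
  { type: "function", function: { name: "fs_write" } },
  { name: "web_search" },
];

console.log(filterTools(sampleTools, "^fs_").map(toolName));
console.log(filterTools(sampleTools, "read|search").map(toolName));
console.log(filterTools(sampleTools, "(").map(toolName)); // invalid regex: list unchanged
```

Anchoring the pattern with `^` and `$` in the combo config is the simplest way to avoid accidental substring matches.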
// ── Main Middleware ──────────────────────────────────────────────────────────
/**
* Apply all combo agent features to the request body.
* Safe to call with null/undefined comboConfig; returns body unchanged.
*/
export function applyComboAgentMiddleware(
body: Record<string, unknown>,
comboConfig: ComboConfig | null | undefined,
providerModel: string // "provider/model" string for context caching
): { body: Record<string, unknown>; pinnedModel: string | null } {
if (!comboConfig) return { body, pinnedModel: null };
let messages: Message[] = Array.isArray(body.messages) ? [...body.messages] : [];
let pinnedModel: string | null = null;
// 1. Context caching: check for pinned model in history
if (comboConfig.context_cache_protection) {
pinnedModel = extractPinnedModel(messages);
if (pinnedModel) {
// Model is pinned — caller should override model selection
}
}
// 2. System message override
if (comboConfig.system_message && comboConfig.system_message.trim()) {
messages = applySystemMessageOverride(messages, comboConfig.system_message);
}
// 3. Tool filter
const filteredTools = applyToolFilter(
body.tools as unknown[] | undefined,
comboConfig.tool_filter_regex
);
return {
body: {
...body,
messages,
...(filteredTools !== body.tools && { tools: filteredTools }),
},
pinnedModel,
};
}
+27
@@ -21,6 +21,7 @@ interface ComboMetricsEntry {
totalLatencyMs: number;
strategy: string;
lastUsedAt: string | null;
intentCounts: Record<string, number>;
byModel: Record<string, ModelMetrics>;
}
@@ -69,6 +70,7 @@ export function recordComboRequest(
totalLatencyMs: 0,
strategy,
lastUsedAt: null,
intentCounts: {},
byModel: {},
});
}
@@ -131,6 +133,7 @@ export function getComboMetrics(comboName: string): ComboMetricsView | null {
combo.totalRequests > 0 ? Math.round((combo.totalSuccesses / combo.totalRequests) * 100) : 0,
fallbackRate:
combo.totalRequests > 0 ? Math.round((combo.totalFallbacks / combo.totalRequests) * 100) : 0,
intentCounts: { ...combo.intentCounts },
byModel: Object.fromEntries(
Object.entries(combo.byModel).map(([model, m]) => [
model,
@@ -156,6 +159,30 @@ export function getAllComboMetrics(): Record<string, ComboMetricsView | null> {
return result;
}
/**
* Record detected prompt intent for a combo (used by multilingual routing analytics).
*/
export function recordComboIntent(comboName: string, intent: string): void {
if (!metrics.has(comboName)) {
metrics.set(comboName, {
totalRequests: 0,
totalSuccesses: 0,
totalFailures: 0,
totalFallbacks: 0,
totalLatencyMs: 0,
strategy: "priority",
lastUsedAt: null,
intentCounts: {},
byModel: {},
});
}
const combo = metrics.get(comboName);
if (!combo) return;
const key = String(intent || "unknown");
combo.intentCounts[key] = (combo.intentCounts[key] || 0) + 1;
}
/**
* Reset metrics for a specific combo
*/
+103
@@ -0,0 +1,103 @@
/**
* Emergency Fallback: Budget Exhaustion Redirect
*
* When a request fails due to budget exhaustion (HTTP 402 or budget keywords
* in the error body), optionally redirect to a free-tier model
* (default provider/model: nvidia + openai/gpt-oss-120b at $0.00/M tokens).
*
* Inspired by ClawRouter: "gpt-oss-120b costs nothing and serves as
* automatic fallback when wallet is empty."
*/
export interface EmergencyFallbackConfig {
enabled: boolean;
provider: string;
model: string;
triggerOn402: boolean;
triggerOnBudgetKeywords: boolean;
budgetKeywords: string[];
/** Skip fallback for tool requests (gpt-oss-120b may not support structured tool calling) */
skipForToolRequests: boolean;
maxOutputTokens: number;
}
export const EMERGENCY_FALLBACK_CONFIG: EmergencyFallbackConfig = {
enabled: true,
provider: "nvidia",
model: "openai/gpt-oss-120b",
triggerOn402: true,
triggerOnBudgetKeywords: true,
budgetKeywords: [
"insufficient funds",
"insufficient_funds",
"budget exceeded",
"budget_exceeded",
"quota exceeded",
"quota_exceeded",
"billing",
"payment required",
"out of credits",
"no credits",
"credit limit",
"spending limit",
"saldo insuficiente",
"limite de gastos",
"cota excedida",
],
skipForToolRequests: true,
maxOutputTokens: 4096,
};
export interface FallbackDecision {
shouldFallback: true;
reason: string;
provider: string;
model: string;
maxOutputTokens: number;
}
export interface NoFallbackDecision {
shouldFallback: false;
reason: string;
}
export type FallbackResult = FallbackDecision | NoFallbackDecision;
export function shouldUseFallback(
status: number,
errorBody: string,
requestHasTools: boolean,
config: EmergencyFallbackConfig = EMERGENCY_FALLBACK_CONFIG
): FallbackResult {
if (!config.enabled) return { shouldFallback: false, reason: "emergency fallback disabled" };
if (config.skipForToolRequests && requestHasTools) {
return { shouldFallback: false, reason: "skipped: request has tools" };
}
if (config.triggerOn402 && status === 402) {
return {
shouldFallback: true,
reason: `HTTP 402 → emergency fallback to ${config.provider}/${config.model}`,
provider: config.provider,
model: config.model,
maxOutputTokens: config.maxOutputTokens,
};
}
if (config.triggerOnBudgetKeywords && errorBody) {
const lowerBody = errorBody.toLowerCase();
const matched = config.budgetKeywords.find((kw) => lowerBody.includes(kw.toLowerCase()));
if (matched) {
return {
shouldFallback: true,
reason: `Budget error detected ('${matched}') → emergency fallback to ${config.provider}/${config.model}`,
provider: config.provider,
model: config.model,
maxOutputTokens: config.maxOutputTokens,
};
}
}
return { shouldFallback: false, reason: "no budget error detected" };
}
export function isFallbackDecision(result: FallbackResult): result is FallbackDecision {
return result.shouldFallback === true;
}
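Review note: the order of checks in `shouldUseFallback` matters, and the keyword test is a plain substring match. The sketch below is a reduced, hypothetical mirror of that decision order (`SketchConfig`, `SketchDecision` and `decideFallback` are illustrative names, not part of the diff) that makes both points visible.

```typescript
// Simplified, hypothetical mirror of the decision order in shouldUseFallback.
interface SketchConfig {
  enabled: boolean;
  skipForToolRequests: boolean;
  budgetKeywords: string[];
}

type SketchDecision = { fallback: boolean; reason: string };

function decideFallback(
  httpStatus: number,
  errorBody: string,
  hasTools: boolean,
  cfg: SketchConfig
): SketchDecision {
  if (!cfg.enabled) return { fallback: false, reason: "disabled" };
  // The tools check runs before any status or keyword check.
  if (cfg.skipForToolRequests && hasTools) return { fallback: false, reason: "has tools" };
  if (httpStatus === 402) return { fallback: true, reason: "HTTP 402" };
  const lower = errorBody.toLowerCase();
  const hit = cfg.budgetKeywords.find((kw) => lower.includes(kw.toLowerCase()));
  return hit
    ? { fallback: true, reason: `keyword: ${hit}` }
    : { fallback: false, reason: "no budget error" };
}

const sketchCfg: SketchConfig = {
  enabled: true,
  skipForToolRequests: true,
  budgetKeywords: ["quota exceeded", "billing"],
};

const on402 = decideFallback(402, "", false, sketchCfg);
const onKeyword = decideFallback(429, "Quota exceeded for this key", false, sketchCfg);
const withTools = decideFallback(402, "", true, sketchCfg);
// A 400 validation error that merely mentions "billing" also triggers the fallback.
const unrelated = decideFallback(400, "Invalid billing_address field", false, sketchCfg);

console.log(on402, onKeyword, withTools, unrelated);
```

The last case suggests that broad keywords such as "billing" may redirect requests that failed for reasons other than budget exhaustion; whether that trade-off is acceptable is a judgment call for the maintainers.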
+375
View File
@@ -0,0 +1,375 @@
/**
* Multilingual Intent Detection for AutoCombo
*
* Classifies prompts as: code | reasoning | simple | medium
* using keywords in 9 languages (EN, PT-BR, ES, ZH, JA, RU, DE, KO, AR).
*
* Inspired by ClawRouter (BlockRunAI) multilingual routing system.
* Execution: purely synchronous, <1ms, no I/O.
*/
export type IntentType = "code" | "reasoning" | "simple" | "medium";
export const CODE_KEYWORDS: readonly string[] = [
// English
"function",
"class",
"import",
"def",
"SELECT",
"async",
"await",
"const",
"let",
"var",
"return",
"```",
"algorithm",
"compile",
"debug",
"refactor",
"typescript",
"python",
"javascript",
"code",
"implement",
"write a",
"create a component",
"endpoint",
"repository",
"deploy",
"install",
"script",
"api",
"database",
"query",
"schema",
"interface",
"generic",
"enum",
"module",
"package",
"dependency",
// Português (PT-BR)
"função",
"classe",
"importar",
"definir",
"consulta",
"assíncrono",
"aguardar",
"constante",
"variável",
"retornar",
"algoritmo",
"compilar",
"depurar",
"refatorar",
"código",
"implementar",
"criar um",
"componente",
"como fazer",
"repositório",
"configurar",
"instalar",
"banco de dados",
"escrever uma função",
"criar uma classe",
// Español
"función",
"clase",
"importar",
"definir",
"consulta",
"asíncrono",
"esperar",
"constante",
"variable",
"retornar",
"algoritmo",
"compilar",
"depurar",
"refactorizar",
"código",
"implementar",
// 中文
"函数",
"类",
"导入",
"定义",
"查询",
"异步",
"等待",
"常量",
"变量",
"返回",
"算法",
"编译",
"调试",
"代码",
// 日本語
"関数",
"クラス",
"インポート",
"非同期",
"定数",
"変数",
"コード",
"アルゴリズム",
// Русский
"функция",
"класс",
"импорт",
"запрос",
"асинхронный",
"константа",
"переменная",
"алгоритм",
"код",
// Deutsch
"funktion",
"klasse",
"importieren",
"abfrage",
"asynchron",
"konstante",
"variable",
"algorithmus",
"code",
// 한국어
"함수",
"클래스",
"가져오기",
"정의",
"쿼리",
"비동기",
"대기",
"상수",
"변수",
"반환",
"코드",
// العربية
"دالة",
"فئة",
"استيراد",
"استعلام",
"غير متزامن",
"ثابت",
"متغير",
"كود",
"خوارزمية",
];
export const REASONING_KEYWORDS: readonly string[] = [
// English
"prove",
"theorem",
"derive",
"step by step",
"chain of thought",
"formally",
"mathematical",
"proof",
"logically",
"analyze",
"reasoning",
"deduce",
"infer",
"hypothesis",
"convergence",
// Português (PT-BR)
"provar",
"teorema",
"derivar",
"passo a passo",
"cadeia de pensamento",
"formalmente",
"matemático",
"prova",
"logicamente",
"analisar",
"raciocínio",
"deduzir",
"inferir",
"hipótese",
"demonstrar",
"cálculo",
"equação diferencial",
"integral",
"otimização",
// Español
"demostrar",
"teorema",
"derivar",
"paso a paso",
"formalmente",
"matemático",
"lógicamente",
// 中文
"证明",
"定理",
"推导",
"逐步",
"思维链",
"数学",
"逻辑",
"分析",
// 日本語
"証明",
"定理",
"導出",
"論理的",
"分析",
// Русский
"доказать",
"теорема",
"шаг за шагом",
"математически",
"логически",
// Deutsch
"beweisen",
"theorem",
"schritt für schritt",
"mathematisch",
"logisch",
// 한국어
"증명",
"정리",
"단계별",
"수학적",
"논리적",
// العربية
"إثبات",
"نظرية",
"خطوة بخطوة",
"رياضي",
"منطقياً",
];
export const SIMPLE_KEYWORDS: readonly string[] = [
// English
"what is",
"define",
"translate",
"hello",
"yes or no",
"summarize",
"list",
"tell me",
"who is",
// Português (PT-BR)
"o que é",
"definir",
"traduzir",
"olá",
"oi",
"sim ou não",
"resumir",
"listar",
"me diga",
"quem é",
"quando foi",
"onde fica",
"explique brevemente",
"de forma simples",
// Español
"qué es",
"definir",
"traducir",
"hola",
"resumir",
"listar",
// 中文
"什么是",
"定义",
"翻译",
"你好",
"总结",
"列出",
// Русский
"что такое",
"определить",
"перевести",
"привет",
"резюмировать",
// Deutsch
"was ist",
"definieren",
"übersetzen",
"hallo",
"zusammenfassen",
// 한국어
"이란",
"정의",
"번역",
"안녕",
"요약",
// العربية
"ما هو",
"تعريف",
"ترجمة",
"مرحبا",
"ملخص",
];
/**
* Classify a prompt's intent using multilingual keyword matching.
* Priority: code > reasoning > simple > medium (default)
*/
export function classifyPromptIntent(prompt: string, systemPrompt?: string): IntentType {
const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
const wordCount = prompt.trim().split(/\s+/).length;
for (const kw of CODE_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "code";
}
for (const kw of REASONING_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "reasoning";
}
if (wordCount < 60) {
for (const kw of SIMPLE_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "simple";
}
}
return "medium";
}
export interface IntentClassifierConfig {
enabled: boolean;
extraCodeKeywords?: string[];
extraReasoningKeywords?: string[];
extraSimpleKeywords?: string[];
simpleMaxWords?: number;
}
export const DEFAULT_INTENT_CONFIG: IntentClassifierConfig = {
enabled: true,
simpleMaxWords: 60,
};
export function classifyWithConfig(
prompt: string,
config: IntentClassifierConfig,
systemPrompt?: string
): IntentType {
if (!config.enabled) return "medium";
const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
const wordCount = prompt.trim().split(/\s+/).length;
const maxSimpleWords = config.simpleMaxWords ?? 60;
const codeKws = [...CODE_KEYWORDS, ...(config.extraCodeKeywords ?? [])];
const reasoningKws = [...REASONING_KEYWORDS, ...(config.extraReasoningKeywords ?? [])];
const simpleKws = [...SIMPLE_KEYWORDS, ...(config.extraSimpleKeywords ?? [])];
for (const kw of codeKws) {
if (fullText.includes(kw.toLowerCase())) return "code";
}
for (const kw of reasoningKws) {
if (fullText.includes(kw.toLowerCase())) return "reasoning";
}
if (wordCount < maxSimpleWords) {
for (const kw of simpleKws) {
if (fullText.includes(kw.toLowerCase())) return "simple";
}
}
return "medium";
}
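Review note: because the classifier uses `includes`, a short code keyword can shadow a simple keyword that contains it. The sketch below reproduces that behaviour with reduced keyword lists and contrasts it with a token-based variant. `classifyByToken` is a hypothetical alternative, not something in this diff, and its ASCII tokenizer would not work for languages written without spaces (ZH, JA), so it is only an illustration of the English case.

```typescript
// Reduced keyword lists, for illustration only.
const sketchCodeKeywords = ["def", "class", "let"];
const sketchSimpleKeywords = ["define", "what is"];

type SketchIntent = "code" | "simple" | "medium";

// Substring matching, as in classifyPromptIntent.
function classifyBySubstring(prompt: string): SketchIntent {
  const text = prompt.toLowerCase();
  if (sketchCodeKeywords.some((kw) => text.includes(kw))) return "code";
  if (sketchSimpleKeywords.some((kw) => text.includes(kw))) return "simple";
  return "medium";
}

// Hypothetical alternative: single-word keywords must match a whole token;
// multi-word phrases still use substring matching.
function classifyByToken(prompt: string): SketchIntent {
  const text = prompt.toLowerCase();
  const tokens = new Set(text.split(/[^a-z0-9_]+/).filter(Boolean));
  const matches = (kw: string): boolean =>
    kw.includes(" ") ? text.includes(kw) : tokens.has(kw);
  if (sketchCodeKeywords.some(matches)) return "code";
  if (sketchSimpleKeywords.some(matches)) return "simple";
  return "medium";
}

const samples = ["Please define entropy", "What is a letter of credit?", "let x = 1"];
for (const s of samples) {
  console.log(s, "=>", classifyBySubstring(s), "vs", classifyByToken(s));
}
```

With substring matching the first two prompts are routed as "code" because "def" and "let" occur inside "define" and "letter"; the token variant classifies them as "simple" and still treats `let x = 1` as code.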
+12
View File
@@ -23,6 +23,18 @@ const PROVIDER_MODEL_ALIASES = {
"gemini-3-flash": "gemini-3-flash-preview",
"raptor-mini": "oswe-vscode-prime",
},
gemini: {
"gemini-3.1-pro-preview": "gemini-3.1-pro",
"gemini-3-1-pro": "gemini-3.1-pro",
},
"gemini-cli": {
"gemini-3.1-pro-preview": "gemini-3.1-pro",
"gemini-3-1-pro": "gemini-3.1-pro",
},
nvidia: {
"gpt-oss-120b": "openai/gpt-oss-120b",
"nvidia/gpt-oss-120b": "openai/gpt-oss-120b",
},
antigravity: {},
};
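Review note: the lookup that consumes `PROVIDER_MODEL_ALIASES` is not part of this hunk. As a rough sketch of how such a two-level table is typically read, assuming an unknown provider or model falls through unchanged (`sketchAliases` and `resolveAlias` are hypothetical names):

```typescript
// Subset of the alias table from the hunk above, copied for illustration.
const sketchAliases: Record<string, Record<string, string>> = {
  gemini: {
    "gemini-3.1-pro-preview": "gemini-3.1-pro",
    "gemini-3-1-pro": "gemini-3.1-pro",
  },
  nvidia: {
    "gpt-oss-120b": "openai/gpt-oss-120b",
    "nvidia/gpt-oss-120b": "openai/gpt-oss-120b",
  },
  antigravity: {},
};

// Hypothetical resolver: return the canonical model id, or the input if no alias exists.
function resolveAlias(provider: string, model: string): string {
  const table = sketchAliases[provider];
  return (table && table[model]) || model;
}

console.log(resolveAlias("nvidia", "gpt-oss-120b"));
console.log(resolveAlias("gemini", "gemini-3-1-pro"));
console.log(resolveAlias("antigravity", "some-model"));
```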
+50
View File
@@ -0,0 +1,50 @@
import { PROVIDER_ID_TO_ALIAS, PROVIDER_MODELS } from "../config/providerModels.ts";
import { parseModel } from "./model.ts";
// Conservative denylist fallback used when registry metadata is absent.
// Keep small and explicit to avoid false negatives.
const TOOL_CALLING_UNSUPPORTED_PATTERNS = [
"gpt-oss-120b",
"deepseek-reasoner",
"glm-4.7",
"glm4.7",
];
function getRegistryToolCallingFlag(providerIdOrAlias: string, modelId: string): boolean | null {
const providerAlias = PROVIDER_ID_TO_ALIAS[providerIdOrAlias] || providerIdOrAlias;
const models = PROVIDER_MODELS[providerAlias];
if (!Array.isArray(models)) return null;
const found = models.find((m) => m?.id === modelId);
if (!found) return null;
return typeof found.toolCalling === "boolean" ? found.toolCalling : null;
}
/**
* Returns whether a model should be considered safe for structured function/tool calling.
*
* Decision order:
* 1) Provider registry metadata (toolCalling flag) when available.
* 2) Conservative denylist fallback for known problematic model families.
* 3) Default true.
*/
export function supportsToolCalling(modelStr: string): boolean {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "";
const model = parsed.model || modelStr;
if (provider) {
const fromRegistry = getRegistryToolCallingFlag(provider, model);
if (fromRegistry !== null) return fromRegistry;
}
const normalized = String(modelStr || "").toLowerCase();
if (!normalized) return false;
// Substring match; this also covers exact matches and "provider/<pattern>" suffixes.
const blocked = TOOL_CALLING_UNSUPPORTED_PATTERNS.some((pattern) => normalized.includes(pattern));
return !blocked;
}
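Review note: the three-step decision order in `supportsToolCalling` (registry flag, then denylist, then default true) can be sketched in isolation. The registry below is a hypothetical `Map` keyed by the full model string, standing in for `PROVIDER_MODELS` and `parseModel`, which are not reproduced here; the model ids are made up for the example.

```typescript
// Hypothetical registry: explicit toolCalling flags keyed by the full model string.
const toolCallingRegistry = new Map<string, boolean>([["example/gpt-oss-120b-tools", true]]);
// Subset of the denylist from the diff.
const unsupportedPatterns = ["gpt-oss-120b", "deepseek-reasoner"];

function sketchSupportsToolCalling(modelStr: string): boolean {
  // 1) An explicit registry flag wins, even over a denylist hit.
  const flag = toolCallingRegistry.get(modelStr);
  if (flag !== undefined) return flag;
  // 2) Denylist fallback: case-insensitive substring match.
  const normalized = modelStr.toLowerCase();
  if (!normalized) return false;
  // 3) Default to true when nothing matches.
  return !unsupportedPatterns.some((p) => normalized.includes(p));
}

console.log(sketchSupportsToolCalling("example/gpt-oss-120b-tools"));
console.log(sketchSupportsToolCalling("nvidia/openai/gpt-oss-120b"));
console.log(sketchSupportsToolCalling("DeepSeek-Reasoner-v2"));
console.log(sketchSupportsToolCalling("openai/gpt-4o"));
```

Note that the substring test also blocks any future model whose id merely contains a denylisted pattern, which is why registry metadata is the safer place to record support.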
+120
View File
@@ -0,0 +1,120 @@
/**
* Request Deduplication Service
*
* Deduplicates **concurrent** identical requests to the same upstream.
* Inspired by ClawRouter's dedup.ts (BlockRunAI / github.com/BlockRunAI/ClawRouter).
*
* IMPORTANT: In-memory only. Does NOT persist across restarts and does NOT
* work across multiple process instances (no cross-instance dedup).
*/
import { createHash } from "node:crypto";
export interface DedupConfig {
enabled: boolean;
maxTemperatureForDedup: number;
timeoutMs: number;
}
export const DEFAULT_DEDUP_CONFIG: DedupConfig = {
enabled: true,
maxTemperatureForDedup: 0.1,
timeoutMs: 60_000,
};
export interface DedupResult<T> {
result: T;
wasDeduplicated: boolean;
hash: string;
}
const inflight = new Map<string, Promise<unknown>>();
/**
* Compute a deterministic hash for a request body.
* Includes: model, messages, temperature, tools, tool_choice, max_tokens, response_format,
* top_p, frequency_penalty, presence_penalty
* Excludes: stream, user, metadata (don't affect LLM output)
*/
export function computeRequestHash(requestBody: unknown): string {
const body = requestBody as Record<string, unknown>;
const canonical = {
model: body.model ?? null,
messages: body.messages ?? null,
temperature: typeof body.temperature === "number" ? body.temperature : 1.0,
tools: body.tools ?? null,
tool_choice: body.tool_choice ?? null,
max_tokens: body.max_tokens ?? null,
response_format: body.response_format ?? null,
top_p: body.top_p ?? null,
frequency_penalty: body.frequency_penalty ?? null,
presence_penalty: body.presence_penalty ?? null,
};
return createHash("sha256").update(JSON.stringify(canonical)).digest("hex").slice(0, 16);
}
/** Determine whether a request should be deduplicated */
export function shouldDeduplicate(
requestBody: unknown,
config: DedupConfig = DEFAULT_DEDUP_CONFIG
): boolean {
if (!config.enabled) return false;
const body = requestBody as Record<string, unknown>;
if (body.stream === true) return false;
const temperature = typeof body.temperature === "number" ? body.temperature : 1.0;
if (temperature > config.maxTemperatureForDedup) return false;
return true;
}
/**
* Execute a request with deduplication.
* Concurrent identical requests share one upstream call.
*/
export async function deduplicate<T>(
hash: string,
fn: () => Promise<T>,
config: DedupConfig = DEFAULT_DEDUP_CONFIG
): Promise<DedupResult<T>> {
if (!config.enabled) {
return { result: await fn(), wasDeduplicated: false, hash };
}
const existing = inflight.get(hash);
if (existing) {
const result = (await existing) as T;
return { result, wasDeduplicated: true, hash };
}
let resolve!: (value: T) => void;
let reject!: (reason: unknown) => void;
const sharedPromise = new Promise<T>((res, rej) => {
resolve = res;
reject = rej;
});
inflight.set(hash, sharedPromise as Promise<unknown>);
const timer = setTimeout(() => {
if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
}, config.timeoutMs);
try {
const result = await fn();
resolve(result);
return { result, wasDeduplicated: false, hash };
} catch (err) {
reject(err);
throw err;
} finally {
clearTimeout(timer);
if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
}
}
export function getInflightCount(): number {
return inflight.size;
}
export function getInflightHashes(): string[] {
return [...inflight.keys()];
}
export function clearInflight(): void {
inflight.clear();
}
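Review note: two properties of this module are easy to miss. First, a request without an explicit `temperature` is treated as 1.0 and is therefore never deduplicated under the default threshold of 0.1. Second, followers join the leader's promise, so only one upstream call is made. The sketch below is a minimal stand-in (`isEligible`, `shareInflight` and `slowUpstream` are hypothetical names; the timeout handling from the diff is omitted).

```typescript
// Eligibility check, mirroring shouldDeduplicate with the default threshold.
function isEligible(body: { stream?: boolean; temperature?: number }, maxTemp = 0.1): boolean {
  if (body.stream === true) return false;
  const t = typeof body.temperature === "number" ? body.temperature : 1.0;
  return t <= maxTemp;
}

const sketchInflight = new Map<string, Promise<unknown>>();

// Minimal in-flight sharing: the first caller runs fn, later callers await the same promise.
async function shareInflight<T>(
  key: string,
  fn: () => Promise<T>
): Promise<{ result: T; shared: boolean }> {
  const existing = sketchInflight.get(key);
  if (existing) return { result: (await existing) as T, shared: true };
  const pending = fn();
  sketchInflight.set(key, pending);
  try {
    return { result: await pending, shared: false };
  } finally {
    if (sketchInflight.get(key) === pending) sketchInflight.delete(key);
  }
}

let upstreamCalls = 0;
const slowUpstream = async (): Promise<string> => {
  upstreamCalls++;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return "ok";
};

// Two concurrent calls with the same key trigger a single upstream request.
const dedupDemo = Promise.all([
  shareInflight("hash-1", slowUpstream),
  shareInflight("hash-1", slowUpstream),
]);
dedupDemo.then(([first, second]) => {
  console.log(first.shared, second.shared, upstreamCalls);
});
```

If the leader's call rejects, every follower awaiting the shared promise rejects with the same error; callers that need independent retries would have to handle that themselves.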
+142
View File
@@ -0,0 +1,142 @@
/**
* Search Cache: in-memory TTL cache with request coalescing
*
* Bounded at MAX_CACHE_ENTRIES to prevent OOM.
* Request coalescing deduplicates concurrent identical queries
* to prevent cache stampede (critical for agentic tools).
*/
import { createHash } from "crypto";
const MAX_CACHE_ENTRIES = 5000;
const DEFAULT_TTL_MS = parseInt(process.env.SEARCH_CACHE_TTL_MS || String(5 * 60 * 1000), 10);
interface CacheEntry<T> {
data: T;
expiresAt: number;
}
const cache = new Map<string, CacheEntry<unknown>>();
const inflight = new Map<string, Promise<unknown>>();
let hits = 0;
let misses = 0;
/**
* Normalize a query for cache key computation.
* NFKC normalization, lowercase, trim, collapse whitespace.
*/
function normalizeQuery(query: string): string {
return query.normalize("NFKC").toLowerCase().trim().replace(/\s+/g, " ");
}
/**
* Compute a deterministic cache key from search parameters.
*/
export function computeCacheKey(
query: string,
provider: string,
searchType: string,
maxResults: number,
country?: string,
language?: string,
filters?: unknown
): string {
const normalized = normalizeQuery(query);
const payload = JSON.stringify({
q: normalized,
p: provider,
t: searchType,
n: maxResults,
c: country || null,
l: language || null,
f: filters || null,
});
return createHash("sha256").update(payload).digest("hex");
}
/**
* Evict expired entries and enforce size bound.
* Called lazily on writes. The expiry sweep scans every entry, so each call is O(n).
*/
function evictIfNeeded(): void {
const now = Date.now();
// Remove expired entries first
for (const [key, entry] of cache) {
if (entry.expiresAt <= now) {
cache.delete(key);
}
}
// FIFO eviction if still over limit
while (cache.size >= MAX_CACHE_ENTRIES) {
const firstKey = cache.keys().next().value;
if (firstKey !== undefined) {
cache.delete(firstKey);
} else {
break;
}
}
}
/**
* Get or coalesce: return cached data, join an inflight request,
* or execute the fetch function and cache the result.
*
* @param key - Cache key from computeCacheKey()
* @param ttlMs - TTL in milliseconds (0 to bypass cache)
* @param fetchFn - Function to execute on cache miss
* @returns The cached or freshly fetched data
*/
export async function getOrCoalesce<T>(
key: string,
ttlMs: number,
fetchFn: () => Promise<T>
): Promise<{ data: T; cached: boolean }> {
// 1. Check cache
const cached = cache.get(key) as CacheEntry<T> | undefined;
if (cached && cached.expiresAt > Date.now()) {
hits++;
return { data: cached.data, cached: true };
}
// 2. Join inflight request if one exists (request coalescing)
const existing = inflight.get(key) as Promise<T> | undefined;
if (existing) {
hits++;
const data = await existing;
return { data, cached: true };
}
// 3. Cache miss — execute fetch
misses++;
const promise = fetchFn();
inflight.set(key, promise);
try {
const data = await promise;
// Store in cache
if (ttlMs > 0) {
evictIfNeeded();
cache.set(key, { data, expiresAt: Date.now() + ttlMs });
}
return { data, cached: false };
} finally {
inflight.delete(key);
}
}
/**
* Get cache statistics for monitoring.
*/
export function getCacheStats(): { size: number; hits: number; misses: number } {
return { size: cache.size, hits, misses };
}
/**
* Default TTL for search cache entries.
*/
export const SEARCH_CACHE_DEFAULT_TTL_MS = DEFAULT_TTL_MS;
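Review note: two building blocks of the search cache are worth seeing in isolation. The first is query normalization (NFKC, lowercase, trim, collapse whitespace), which makes visually equivalent queries share one cache key. The second is FIFO eviction, which relies on `Map` iterating keys in insertion order. The sketch below uses hypothetical helper names (`sketchNormalize`, `setBounded`) and a tiny bound so the effect is visible.

```typescript
// Same normalization steps as normalizeQuery in the diff.
function sketchNormalize(query: string): string {
  return query.normalize("NFKC").toLowerCase().trim().replace(/\s+/g, " ");
}

// FIFO-bounded insert: drop the oldest keys until there is room, then insert.
function setBounded<K, V>(map: Map<K, V>, key: K, value: V, max: number): void {
  while (map.size >= max) {
    const oldest = map.keys().next().value;
    if (oldest === undefined) break;
    map.delete(oldest);
  }
  map.set(key, value);
}

// Full-width "AI" and an ideographic space normalize to plain ASCII.
const normalizedQuery = sketchNormalize("  \uFF21\uFF29\u3000News  ");
console.log(JSON.stringify(normalizedQuery));

const boundedMap = new Map<string, number>();
setBounded(boundedMap, "a", 1, 2);
setBounded(boundedMap, "b", 2, 2);
setBounded(boundedMap, "c", 3, 2); // evicts "a", the oldest insertion
console.log(Array.from(boundedMap.keys()).join(","));
```

Because eviction follows insertion order rather than recency, a frequently read entry can still be dropped before a stale one; an LRU policy would need to re-insert keys on read.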
@@ -208,7 +208,7 @@ export function openaiResponsesToOpenAIRequest(
});
}
// Filter orphaned tool results (no matching tool_call in any assistant message)
// Filter orphaned tool results (no matching tool_call in assistant messages)
const allToolCallIds = new Set<string>();
for (const m of messages) {
const rec = toRecord(m);
+42 -43
View File
@@ -1,12 +1,12 @@
{
"name": "omniroute",
"version": "2.6.7",
"version": "2.7.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "omniroute",
"version": "2.6.7",
"version": "2.7.0",
"hasInstallScript": true,
"license": "MIT",
"workspaces": [
@@ -1725,9 +1725,9 @@
}
},
"node_modules/@next/env": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/env/-/env-16.1.6.tgz",
"integrity": "sha512-N1ySLuZjnAtN3kFnwhAwPvZah8RJxKasD7x1f8shFqhncnWZn4JMfg37diLNuoHsLAlrDfM3g4mawVdtAG8XLQ==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/env/-/env-16.1.7.tgz",
"integrity": "sha512-rJJbIdJB/RQr2F1nylZr/PJzamvNNhfr3brdKP6s/GW850jbtR70QlSfFselvIBbcPUOlQwBakexjFzqLzF6pg==",
"license": "MIT"
},
"node_modules/@next/eslint-plugin-next": {
@@ -1741,9 +1741,9 @@
}
},
"node_modules/@next/swc-darwin-arm64": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-16.1.6.tgz",
"integrity": "sha512-wTzYulosJr/6nFnqGW7FrG3jfUUlEf8UjGA0/pyypJl42ExdVgC6xJgcXQ+V8QFn6niSG2Pb8+MIG1mZr2vczw==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-16.1.7.tgz",
"integrity": "sha512-b2wWIE8sABdyafc4IM8r5Y/dS6kD80JRtOGrUiKTsACFQfWWgUQ2NwoUX1yjFMXVsAwcQeNpnucF2ZrujsBBPg==",
"cpu": [
"arm64"
],
@@ -1757,9 +1757,9 @@
}
},
"node_modules/@next/swc-darwin-x64": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-16.1.6.tgz",
"integrity": "sha512-BLFPYPDO+MNJsiDWbeVzqvYd4NyuRrEYVB5k2N3JfWncuHAy2IVwMAOlVQDFjj+krkWzhY2apvmekMkfQR0CUQ==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-16.1.7.tgz",
"integrity": "sha512-zcnVaaZulS1WL0Ss38R5Q6D2gz7MtBu8GZLPfK+73D/hp4GFMrC2sudLky1QibfV7h6RJBJs/gOFvYP0X7UVlQ==",
"cpu": [
"x64"
],
@@ -1773,9 +1773,9 @@
}
},
"node_modules/@next/swc-linux-arm64-gnu": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-16.1.6.tgz",
"integrity": "sha512-OJYkCd5pj/QloBvoEcJ2XiMnlJkRv9idWA/j0ugSuA34gMT6f5b7vOiCQHVRpvStoZUknhl6/UxOXL4OwtdaBw==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-16.1.7.tgz",
"integrity": "sha512-2ant89Lux/Q3VyC8vNVg7uBaFVP9SwoK2jJOOR0L8TQnX8CAYnh4uctAScy2Hwj2dgjVHqHLORQZJ2wH6VxhSQ==",
"cpu": [
"arm64"
],
@@ -1789,9 +1789,9 @@
}
},
"node_modules/@next/swc-linux-arm64-musl": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-16.1.6.tgz",
"integrity": "sha512-S4J2v+8tT3NIO9u2q+S0G5KdvNDjXfAv06OhfOzNDaBn5rw84DGXWndOEB7d5/x852A20sW1M56vhC/tRVbccQ==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-16.1.7.tgz",
"integrity": "sha512-uufcze7LYv0FQg9GnNeZ3/whYfo+1Q3HnQpm16o6Uyi0OVzLlk2ZWoY7j07KADZFY8qwDbsmFnMQP3p3+Ftprw==",
"cpu": [
"arm64"
],
@@ -1805,9 +1805,9 @@
}
},
"node_modules/@next/swc-linux-x64-gnu": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-16.1.6.tgz",
"integrity": "sha512-2eEBDkFlMMNQnkTyPBhQOAyn2qMxyG2eE7GPH2WIDGEpEILcBPI/jdSv4t6xupSP+ot/jkfrCShLAa7+ZUPcJQ==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-16.1.7.tgz",
"integrity": "sha512-KWVf2gxYvHtvuT+c4MBOGxuse5TD7DsMFYSxVxRBnOzok/xryNeQSjXgxSv9QpIVlaGzEn/pIuI6Koosx8CGWA==",
"cpu": [
"x64"
],
@@ -1821,9 +1821,9 @@
}
},
"node_modules/@next/swc-linux-x64-musl": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-16.1.6.tgz",
"integrity": "sha512-oicJwRlyOoZXVlxmIMaTq7f8pN9QNbdes0q2FXfRsPhfCi8n8JmOZJm5oo1pwDaFbnnD421rVU409M3evFbIqg==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-16.1.7.tgz",
"integrity": "sha512-HguhaGwsGr1YAGs68uRKc4aGWxLET+NevJskOcCAwXbwj0fYX0RgZW2gsOCzr9S11CSQPIkxmoSbuVaBp4Z3dA==",
"cpu": [
"x64"
],
@@ -1837,9 +1837,9 @@
}
},
"node_modules/@next/swc-win32-arm64-msvc": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-16.1.6.tgz",
"integrity": "sha512-gQmm8izDTPgs+DCWH22kcDmuUp7NyiJgEl18bcr8irXA5N2m2O+JQIr6f3ct42GOs9c0h8QF3L5SzIxcYAAXXw==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-16.1.7.tgz",
"integrity": "sha512-S0n3KrDJokKTeFyM/vGGGR8+pCmXYrjNTk2ZozOL1C/JFdfUIL9O1ATaJOl5r2POe56iRChbsszrjMAdWSv7kQ==",
"cpu": [
"arm64"
],
@@ -1853,9 +1853,9 @@
}
},
"node_modules/@next/swc-win32-x64-msvc": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-16.1.6.tgz",
"integrity": "sha512-NRfO39AIrzBnixKbjuo2YiYhB6o9d8v/ymU9m/Xk8cyVk+k7XylniXkHwjs4s70wedVffc6bQNbufk5v0xEm0A==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-16.1.7.tgz",
"integrity": "sha512-mwgtg8CNZGYm06LeEd+bNnOUfwOyNem/rOiP14Lsz+AnUY92Zq/LXwtebtUiaeVkhbroRCQ0c8GlR4UT1U+0yg==",
"cpu": [
"x64"
],
@@ -6817,7 +6817,6 @@
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.2.tgz",
"integrity": "sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==",
"dev": true,
"hasInstallScript": true,
"license": "MIT",
"optional": true,
@@ -8812,14 +8811,14 @@
}
},
"node_modules/next": {
"version": "16.1.6",
"resolved": "https://registry.npmjs.org/next/-/next-16.1.6.tgz",
"integrity": "sha512-hkyRkcu5x/41KoqnROkfTm2pZVbKxvbZRuNvKXLRXxs3VfyO0WhY50TQS40EuKO9SW3rBj/sF3WbVwDACeMZyw==",
"version": "16.1.7",
"resolved": "https://registry.npmjs.org/next/-/next-16.1.7.tgz",
"integrity": "sha512-WM0L7WrSvKwoLegLYr6V+mz+RIofqQgVAfHhMp9a88ms0cFX8iX9ew+snpWlSBwpkURJOUdvCEt3uLl3NNzvWg==",
"license": "MIT",
"dependencies": {
"@next/env": "16.1.6",
"@next/env": "16.1.7",
"@swc/helpers": "0.5.15",
"baseline-browser-mapping": "^2.8.3",
"baseline-browser-mapping": "^2.9.19",
"caniuse-lite": "^1.0.30001579",
"postcss": "8.4.31",
"styled-jsx": "5.1.6"
@@ -8831,14 +8830,14 @@
"node": ">=20.9.0"
},
"optionalDependencies": {
"@next/swc-darwin-arm64": "16.1.6",
"@next/swc-darwin-x64": "16.1.6",
"@next/swc-linux-arm64-gnu": "16.1.6",
"@next/swc-linux-arm64-musl": "16.1.6",
"@next/swc-linux-x64-gnu": "16.1.6",
"@next/swc-linux-x64-musl": "16.1.6",
"@next/swc-win32-arm64-msvc": "16.1.6",
"@next/swc-win32-x64-msvc": "16.1.6",
"@next/swc-darwin-arm64": "16.1.7",
"@next/swc-darwin-x64": "16.1.7",
"@next/swc-linux-arm64-gnu": "16.1.7",
"@next/swc-linux-arm64-musl": "16.1.7",
"@next/swc-linux-x64-gnu": "16.1.7",
"@next/swc-linux-x64-musl": "16.1.7",
"@next/swc-win32-arm64-msvc": "16.1.7",
"@next/swc-win32-x64-msvc": "16.1.7",
"sharp": "^0.34.4"
},
"peerDependencies": {
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "omniroute",
"version": "2.6.7",
"version": "2.7.2",
"description": "Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.",
"type": "module",
"bin": {
+1
View File
@@ -0,0 +1 @@
<svg width="56" height="64" viewBox="0 0 56 64" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M53.292 15.321l1.5-3.676s-1.909-2.043-4.227-4.358c-2.317-2.315-7.225-.953-7.225-.953L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953C2.988 9.602 1.08 11.645 1.08 11.645l1.5 3.676-1.91 5.447s5.614 21.236 6.272 23.83c1.295 5.106 2.181 7.08 5.862 9.668 3.68 2.587 10.36 7.08 11.45 7.762 1.091.68 2.455 1.84 3.682 1.84 1.227 0 2.59-1.16 3.68-1.84 1.091-.681 7.77-5.175 11.452-7.762 3.68-2.587 4.567-4.562 5.862-9.668.657-2.594 6.27-23.83 6.27-23.83l-1.908-5.447z" fill="url(#paint0_linear)"/><path fill-rule="evenodd" clip-rule="evenodd" d="M34.888 11.508c.818 0 6.885-1.157 6.885-1.157s7.189 8.68 7.189 10.536c0 1.534-.619 2.134-1.347 2.842-.152.148-.31.3-.467.468l-5.39 5.717a9.42 9.42 0 01-.176.18c-.538.54-1.33 1.336-.772 2.658l.115.269c.613 1.432 1.37 3.2.407 4.99-1.025 1.906-2.78 3.178-3.905 2.967-1.124-.21-3.766-1.589-4.737-2.218-.971-.63-4.05-3.166-4.05-4.137 0-.809 2.214-2.155 3.29-2.81.214-.13.383-.232.48-.298.111-.075.297-.19.526-.332.981-.61 2.754-1.71 2.799-2.197.055-.602.034-.778-.758-2.264-.168-.316-.365-.654-.568-1.004-.754-1.295-1.598-2.745-1.41-3.784.21-1.173 2.05-1.845 3.608-2.415.194-.07.385-.14.567-.209l1.623-.609c1.556-.582 3.284-1.229 3.57-1.36.394-.181.292-.355-.903-.468a54.655 54.655 0 01-.58-.06c-1.48-.157-4.209-.446-5.535-.077-.261.073-.553.152-.86.235-1.49.403-3.317.897-3.493 1.182-.03.05-.06.093-.089.133-.168.238-.277.394-.091 1.406.055.302.169.895.31 1.629.41 2.148 1.053 5.498 1.134 6.25.011.106.024.207.036.305.103.84.171 1.399-.805 1.622l-.255.058c-1.102.252-2.717.623-3.3.623-.584 0-2.2-.37-3.302-.623l-.254-.058c-.976-.223-.907-.782-.804-1.622.012-.098.024-.2.035-.305.081-.753.725-4.112 1.137-6.259.14-.73.253-1.32.308-1.62.185-1.012.076-1.168-.092-1.406a3.743 3.743 0 
01-.09-.133c-.174-.285-2-.779-3.491-1.182-.307-.083-.6-.162-.86-.235-1.327-.37-4.055-.08-5.535.077-.226.024-.422.045-.58.06-1.196.113-1.297.287-.903.468.285.131 2.013.778 3.568 1.36.597.223 1.17.437 1.624.609.183.069.373.138.568.21 1.558.57 3.398 1.241 3.608 2.414.187 1.039-.657 2.489-1.41 3.784-.204.35-.4.688-.569 1.004-.791 1.486-.812 1.662-.757 2.264.044.488 1.816 1.587 2.798 2.197.229.142.415.257.526.332.098.066.266.168.48.298 1.076.654 3.29 2 3.29 2.81 0 .97-3.078 3.507-4.05 4.137-.97.63-3.612 2.008-4.737 2.218-1.124.21-2.88-1.061-3.904-2.966-.963-1.791-.207-3.559.406-4.99l.115-.27c.559-1.322-.233-2.118-.772-2.658a9.377 9.377 0 01-.175-.18l-5.39-5.717c-.158-.167-.316-.32-.468-.468-.728-.707-1.346-1.308-1.346-2.842 0-1.855 7.189-10.536 7.189-10.536s6.066 1.157 6.884 1.157c.653 0 1.913-.433 3.227-.885.333-.114.669-.23 1-.34 1.635-.545 2.726-.549 2.726-.549s1.09.004 2.726.549c.33.11.667.226 1 .34 1.313.452 2.574.885 3.226.885zm-1.041 30.706c1.282.66 2.192 1.128 2.536 1.343.445.278.174.803-.232 1.09-.405.285-5.853 4.499-6.381 4.965l-.215.191c-.509.459-1.159 1.044-1.62 1.044-.46 0-1.11-.586-1.62-1.044l-.213-.191c-.53-.466-5.977-4.68-6.382-4.966-.405-.286-.677-.81-.232-1.09.344-.214 1.255-.683 2.539-1.344l1.22-.629c1.92-.992 4.315-1.837 4.689-1.837.373 0 2.767.844 4.689 1.837.436.226.845.437 1.222.63z" fill="#fff"/><path fill-rule="evenodd" clip-rule="evenodd" d="M43.34 6.334L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953c0 0 6.544-.59 8.793 3.064 0 0 6.066 1.157 6.884 1.157.818 0 2.59-.68 4.226-1.225 1.636-.545 2.727-.549 2.727-.549s1.09.004 2.726.549 3.408 1.225 4.226 1.225c.818 0 6.885-1.157 6.885-1.157 2.249-3.654 8.792-3.064 8.792-3.064-2.317-2.315-7.225-.953-7.225-.953z" fill="url(#paint1_linear)"/><defs><linearGradient id="paint0_linear" x1=".671" y1="64.319" x2="55.2" y2="64.319" gradientUnits="userSpaceOnUse"><stop stop-color="#F50"/><stop offset=".41" stop-color="#F50"/><stop offset=".582" stop-color="#FF2000"/><stop offset="1" 
stop-color="#FF2000"/></linearGradient><linearGradient id="paint1_linear" x1="6.278" y1="11.466" x2="50.565" y2="11.466" gradientUnits="userSpaceOnUse"><stop stop-color="#FF452A"/><stop offset="1" stop-color="#FF2000"/></linearGradient></defs></svg>


+4
View File
@@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
<rect width="48" height="48" rx="8" fill="#1E40AF"/>
<text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
</svg>


+4
View File
@@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
<rect width="48" height="48" rx="8" fill="#1E40AF"/>
<text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
</svg>


+63 -7
View File
@@ -14,6 +14,7 @@
*
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/129
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/321
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/426
*/
import { existsSync, copyFileSync, mkdirSync } from "node:fs";
@@ -80,8 +81,54 @@ if (existsSync(rootBinary)) {
}
}
// Strategy 1.5: Use node-pre-gyp to download the correct prebuilt binary
// This works on Windows without requiring node-gyp, Python, or MSVC.
// better-sqlite3 ships prebuilts for win32-x64, win32-arm64, darwin-x64/arm64.
console.log(" 📥 Attempting to download prebuilt binary via node-pre-gyp...");
try {
const { execSync } = await import("node:child_process");
// better-sqlite3 bundles @mapbox/node-pre-gyp — use it directly
const preGypBin = join(
ROOT,
"app",
"node_modules",
".bin",
process.platform === "win32" ? "node-pre-gyp.cmd" : "node-pre-gyp"
);
const preGypFallback = join(
ROOT,
"app",
"node_modules",
"@mapbox",
"node-pre-gyp",
"bin",
"node-pre-gyp"
);
const preGypCmd = existsSync(preGypBin) ? preGypBin : preGypFallback;
if (existsSync(preGypCmd)) {
execSync(`"${process.execPath}" "${preGypCmd}" install --fallback-to-build=false`, {
cwd: join(ROOT, "app", "node_modules", "better-sqlite3"),
stdio: "inherit",
timeout: 60_000,
});
mkdirSync(dirname(appBinary), { recursive: true });
try {
process.dlopen({ exports: {} }, appBinary);
console.log(" ✅ Prebuilt binary downloaded and loaded successfully!\n");
process.exit(0);
} catch (loadErr) {
console.warn(` ⚠️ Downloaded binary failed to load: ${loadErr.message}`);
}
} else {
console.warn(" ⚠️ node-pre-gyp not found, skipping prebuilt download.");
}
} catch (err) {
console.warn(` ⚠️ node-pre-gyp download failed: ${err.message.split("\n")[0]}`);
}
// Strategy 2: Fall back to npm rebuild (may work if build tools are available)
- console.log(" ⚠️ Root binary not available or incompatible, attempting npm rebuild...");
+ console.log(" ⚠️ Attempting npm rebuild (requires build tools)...");
try {
const { execSync } = await import("node:child_process");
@@ -103,14 +150,23 @@ try {
}
}
- // If nothing worked, warn but don't fail the install — let the package stay
- // installed so users can fix manually or use the pre-flight check in the CLI
- console.warn(" ⚠️ Could not fix better-sqlite3 native module automatically.");
+ // If nothing worked, warn but don't fail the install
+ console.warn("\n ⚠️ Could not fix better-sqlite3 native module automatically.");
console.warn(" The server may not start correctly.");
- console.warn(" Try manually:");
- console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
- if (process.platform === "darwin") {
+ console.warn(" Manual fix options:");
+ if (process.platform === "win32") {
+ console.warn(" Option A (easiest — no build tools needed):");
+ console.warn(` cd "${join(ROOT, "app", "node_modules", "better-sqlite3")}"`);
+ console.warn(" npx @mapbox/node-pre-gyp install --fallback-to-build=false");
+ console.warn(" Option B (requires Build Tools for Visual Studio):");
+ console.warn(` cd "${join(ROOT, "app")}" && npm rebuild better-sqlite3`);
+ console.warn(" Install from: https://visualstudio.microsoft.com/visual-cpp-build-tools/");
+ console.warn(" Also ensure Python is installed: https://python.org");
+ } else if (process.platform === "darwin") {
+ console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
console.warn(" If build tools are missing: xcode-select --install");
} else {
console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
}
console.warn("");
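Review note: the script above tries its repair strategies in a fixed order (copy the root binary, download a prebuilt, then `npm rebuild`) and stops at the first one that yields a loadable module. A minimal sketch of that control flow follows; `Strategy` and `runStrategies` are hypothetical names that do not appear in the script, and the real script exits the process on success instead of returning a label.

```typescript
// Hypothetical helper, not part of the postinstall script: run repair
// strategies in order and stop at the first one that reports success.
type Strategy = { label: string; run: () => boolean };

function runStrategies(strategies: Strategy[]): string | null {
  for (const strategy of strategies) {
    try {
      if (strategy.run()) return strategy.label; // first success wins
    } catch (err) {
      // A failing strategy is reported and skipped; it never fails the install.
      const message = err instanceof Error ? err.message : String(err);
      console.warn(`strategy "${strategy.label}" failed: ${message.split("\n")[0]}`);
    }
  }
  return null; // nothing worked: the caller prints manual fix instructions
}
```

Returning `null` rather than throwing matches the script's stated intent of warning without failing the install.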
+13
@@ -278,6 +278,19 @@ if (existsSync(swcHelpersSrc) && !existsSync(swcHelpersDst)) {
console.log(" ✅ @swc/helpers included in standalone build.");
}
// ── Step 10.6: Remove large binaries from standalone build ──
// These directories contain platform-native binaries (.node, .asar) that
// trigger Z_DATA_ERROR during npm pack. They are not needed in the npm package.
const binaryDirsToRemove = ["vscode-extension", "electron"];
for (const dir of binaryDirsToRemove) {
const targetDir = join(APP_DIR, dir);
if (existsSync(targetDir)) {
console.log(` 🧹 Removing app/${dir}/ (not needed in npm package)...`);
rmSync(targetDir, { recursive: true, force: true });
console.log(` ✅ app/${dir}/ removed.`);
}
}
// ── Done ───────────────────────────────────────────────────
const appPkg = join(APP_DIR, "package.json");
if (existsSync(appPkg)) {
@@ -33,11 +33,29 @@ export default function APIPageClient({ machineId }) {
const [viewTab, setViewTab] = useState("api");
const [mcpStatus, setMcpStatus] = useState<any>(null);
const [a2aStatus, setA2aStatus] = useState<any>(null);
const [searchProviders, setSearchProviders] = useState<any[]>([]);
const { copied, copy } = useCopyToClipboard();
const fetchSearchProviders = async () => {
try {
const res = await fetch("/v1/search");
if (res.ok) {
const data = await res.json();
setSearchProviders(data.data || []);
}
} catch {
// Search endpoint may not be available
}
};
useEffect(() => {
- Promise.allSettled([loadCloudSettings(), fetchModels(), fetchProtocolStatus()]).finally(() => {
+ Promise.allSettled([
+ loadCloudSettings(),
+ fetchModels(),
+ fetchProtocolStatus(),
+ fetchSearchProviders(),
+ ]).finally(() => {
setLoading(false);
});
}, []);
@@ -575,6 +593,47 @@ export default function APIPageClient({ machineId }) {
</div>
</div>
{/* Search & Discovery */}
{searchProviders.length > 0 && (
<div className="mb-6">
<div className="flex items-center gap-2 mb-3">
<span className="material-symbols-outlined text-sm text-cyan-400">
travel_explore
</span>
<h3 className="text-xs font-semibold text-text-muted uppercase tracking-wider">
{t("categorySearch") || "Search & Discovery"}
</h3>
<div className="flex-1 h-px bg-border/50" />
</div>
<div className="flex flex-col gap-3">
<EndpointSection
icon="search"
iconColor="text-cyan-500"
iconBg="bg-cyan-500/10"
title={t("webSearch") || "Web Search"}
path="/v1/search"
description={
t("webSearchDesc") ||
"Unified web search across multiple providers with automatic failover and caching"
}
models={searchProviders.map((p) => ({
id: p.id,
name: p.name,
owned_by: p.id,
type: "search",
}))}
expanded={expandedEndpoint === "search"}
onToggle={() =>
setExpandedEndpoint(expandedEndpoint === "search" ? null : "search")
}
copy={copy}
copied={copied}
baseUrl={currentEndpoint}
/>
</div>
</div>
)}
{/* Utility & Management */}
<div>
<div className="flex items-center gap-2 mb-3">
@@ -101,6 +101,7 @@ export default function ProviderDetailPage() {
const isOpenAICompatible = isOpenAICompatibleProvider(providerId);
const isAnthropicCompatible = isAnthropicCompatibleProvider(providerId);
const isCompatible = isOpenAICompatible || isAnthropicCompatible;
const isSearchProvider = providerId.endsWith("-search");
const providerStorageAlias = isCompatible ? providerId : providerAlias;
const providerDisplayAlias = isCompatible ? providerNode?.prefix || providerId : providerAlias;
@@ -1060,21 +1061,43 @@ export default function ProviderDetailPage() {
)}
</Card>
- {/* Models */}
- <Card>
- <h2 className="text-lg font-semibold mb-4">{t("availableModels")}</h2>
- {renderModelsSection()}
+ {/* Models — hidden for search providers (they don't have models) */}
+ {!isSearchProvider && (
+ <Card>
+ <h2 className="text-lg font-semibold mb-4">{t("availableModels")}</h2>
+ {renderModelsSection()}
- {/* Custom Models — available for ALL providers */}
- {!isCompatible && (
- <CustomModelsSection
- providerId={providerId}
- providerAlias={providerDisplayAlias}
- copied={copied}
- onCopy={copy}
- />
- )}
- </Card>
+ {/* Custom Models — available for non-compatible, non-search providers */}
+ {!isCompatible && (
+ <CustomModelsSection
+ providerId={providerId}
+ providerAlias={providerDisplayAlias}
+ copied={copied}
+ onCopy={copy}
+ />
+ )}
+ </Card>
+ )}
{/* Search provider info */}
{isSearchProvider && (
<Card>
<h2 className="text-lg font-semibold mb-4">{t("searchProvider") || "Search Provider"}</h2>
<p className="text-sm text-text-muted">
{t("searchProviderDesc") ||
"This provider is used for web search via POST /v1/search. No model configuration needed — search providers are ready to use once an API key is connected."}
</p>
{providerId === "perplexity-search" && (
<div className="mt-3 flex items-center gap-2 px-3 py-2 rounded-lg bg-blue-500/10 border border-blue-500/20">
<span className="material-symbols-outlined text-sm text-blue-400">link</span>
<p className="text-xs text-blue-300">
Uses the same API key as <strong>Perplexity</strong> (chat provider). If you already
have Perplexity configured, no additional setup is needed.
</p>
</div>
)}
</Card>
)}
{/* Modals */}
{providerId === "kiro" ? (
+50
@@ -0,0 +1,50 @@
/**
* GET /api/logs/detail - List detailed request logs
* GET /api/logs/detail/:id - Get specific detailed log
* POST /api/logs/detail/toggle - Enable/disable detailed logging
*/
import { NextRequest, NextResponse } from "next/server";
import { isAuthenticated } from "@/shared/utils/apiAuth";
import {
getRequestDetailLogs,
getRequestDetailLogCount,
isDetailedLoggingEnabled,
} from "@/lib/db/detailedLogs";
import { updateSettings } from "@/lib/db/settings";
export const dynamic = "force-dynamic";
export async function GET(req: NextRequest) {
if (!isAuthenticated(req)) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const url = new URL(req.url);
const limit = Math.min(Number(url.searchParams.get("limit") ?? 50), 200);
const offset = Number(url.searchParams.get("offset") ?? 0);
const logs = getRequestDetailLogs(limit, offset);
const total = getRequestDetailLogCount();
const enabled = await isDetailedLoggingEnabled();
return NextResponse.json({ enabled, total, logs });
}
export async function POST(req: NextRequest) {
if (!isAuthenticated(req)) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const body = await req.json();
const enabled = body.enabled === true || body.enabled === "1";
await updateSettings({ detailed_logs_enabled: enabled });
return NextResponse.json({
success: true,
enabled,
message: enabled
? "Detailed logging enabled. Pipeline bodies will be captured for new requests."
: "Detailed logging disabled.",
});
}
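Review note: the GET handler above converts `limit` and `offset` with `Number(...)`, which yields `NaN` for non-numeric input such as `?limit=abc`, and `Math.min(NaN, 200)` is also `NaN`. Negative values pass through unchanged as well. A defensive parser could look like the sketch below; `parsePaging` is a hypothetical helper, not something this PR defines, and the lower bound of 1 for `limit` is an assumption.

```typescript
// Hypothetical helper (not in the PR): parse limit/offset defensively.
function parsePaging(
  params: URLSearchParams,
  maxLimit = 200
): { limit: number; offset: number } {
  // Fall back to the default when the value is missing, non-numeric or negative.
  const toInt = (raw: string | null, fallback: number): number => {
    const n = Number.parseInt(raw ?? "", 10);
    return Number.isFinite(n) && n >= 0 ? n : fallback;
  };
  return {
    // Clamp into [1, maxLimit]; the handler above only applies the upper bound.
    limit: Math.min(Math.max(toInt(params.get("limit"), 50), 1), maxLimit),
    offset: toInt(params.get("offset"), 0),
  };
}
```

In the route this would be called as `parsePaging(new URL(req.url).searchParams)` before querying the log table.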
+4
@@ -13,6 +13,7 @@ export async function GET() {
const { getAllCircuitBreakerStatuses } = await import("@/shared/utils/circuitBreaker");
const { getAllRateLimitStatus } = await import("@omniroute/open-sse/services/rateLimitManager");
const { getAllModelLockouts } = await import("@omniroute/open-sse/services/accountFallback");
const { getInflightCount } = await import("@omniroute/open-sse/services/requestDedup.ts");
const settings = await getSettings();
const circuitBreakers = getAllCircuitBreakerStatuses();
@@ -50,6 +51,9 @@ export async function GET() {
localProviders: getAllHealthStatuses(),
rateLimitStatus,
lockouts,
dedup: {
inflightRequests: getInflightCount(),
},
setupComplete: settings?.setupComplete || false,
});
} catch (error) {
+115
@@ -0,0 +1,115 @@
/**
* GET /api/system/version - Returns current version and latest available on npm
* POST /api/system/update - Triggers npm install -g omniroute@latest + pm2 restart
*
* Security: Requires admin authentication (same as other management routes).
* Safety: Update only runs if a newer version is available on npm.
*/
import { NextRequest, NextResponse } from "next/server";
import { execFile } from "child_process";
import { promisify } from "util";
import { isAuthenticated } from "@/shared/utils/apiAuth";
const execFileAsync = promisify(execFile);
export const dynamic = "force-dynamic";
/** Fetch latest version from npm registry (no install, just metadata) */
async function getLatestNpmVersion(): Promise<string | null> {
try {
const { stdout } = await execFileAsync("npm", ["info", "omniroute", "version", "--json"], {
timeout: 10000,
});
const parsed = JSON.parse(stdout.trim());
return typeof parsed === "string" ? parsed : null;
} catch {
return null;
}
}
/** Current installed version from package.json */
function getCurrentVersion(): string {
try {
return require("../../../../../package.json").version as string;
} catch {
return "unknown";
}
}
/** Compare semver strings — returns true if a > b */
function isNewer(a: string | null, b: string): boolean {
if (!a) return false;
const parse = (v: string) => v.split(".").map(Number);
const [aMaj, aMin, aPat] = parse(a);
const [bMaj, bMin, bPat] = parse(b);
if (aMaj !== bMaj) return aMaj > bMaj;
if (aMin !== bMin) return aMin > bMin;
return aPat > bPat;
}
export async function GET(req: NextRequest) {
if (!isAuthenticated(req)) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const current = getCurrentVersion();
const latest = await getLatestNpmVersion();
const updateAvailable = isNewer(latest, current);
return NextResponse.json({
current,
latest: latest ?? "unavailable",
updateAvailable,
channel: "npm",
});
}
export async function POST(req: NextRequest) {
if (!isAuthenticated(req)) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const current = getCurrentVersion();
const latest = await getLatestNpmVersion();
if (!latest) {
return NextResponse.json(
{ success: false, error: "Could not reach npm registry" },
{ status: 503 }
);
}
if (!isNewer(latest, current)) {
return NextResponse.json({
success: false,
error: `Already on latest version (${current})`,
current,
latest,
});
}
// Run update in background — client gets immediate acknowledgment
const install = async () => {
try {
await execFileAsync("npm", ["install", "-g", `omniroute@${latest}`, "--ignore-scripts"], {
timeout: 300000, // 5 minutes
});
// Restart PM2 — non-fatal if pm2 not available (Docker/manual setups)
await execFileAsync("pm2", ["restart", "omniroute"]).catch(() => null);
console.log(`[AutoUpdate] Successfully updated to v${latest}`);
} catch (err) {
console.error(`[AutoUpdate] Update failed:`, err);
}
};
// Fire-and-forget
install();
return NextResponse.json({
success: true,
message: `Update to v${latest} started. Restarting in ~30 seconds.`,
from: current,
to: latest,
});
}
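Review note: `isNewer` above splits on `.` and maps each part through `Number`, so a prerelease string such as `2.7.2-beta.1` produces `NaN` for the patch component and the final comparison returns `false` in both directions. The sketch below is one possible variant, not the PR's implementation: `parseVersion` and `isNewerVersion` are hypothetical names, and it deliberately leaves two prereleases of the same core version unordered instead of applying full semver precedence.

```typescript
// Hypothetical variant of isNewer (not the PR's implementation).
function parseVersion(v: string): { core: number[]; pre: string | null } {
  const cleaned = v.trim().replace(/^v/, "").split("+")[0]; // drop build metadata
  const dash = cleaned.indexOf("-");
  const coreText = dash === -1 ? cleaned : cleaned.slice(0, dash);
  const pre = dash === -1 ? null : cleaned.slice(dash + 1);
  const core = coreText.split(".").map((part) => {
    const n = Number.parseInt(part, 10);
    return Number.isFinite(n) ? n : 0; // unparsable parts count as 0
  });
  while (core.length < 3) core.push(0); // "3.0" is treated as "3.0.0"
  return { core, pre };
}

/** Returns true if a is strictly newer than b. */
function isNewerVersion(a: string | null, b: string): boolean {
  if (!a) return false;
  const va = parseVersion(a);
  const vb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (va.core[i] !== vb.core[i]) return va.core[i] > vb.core[i];
  }
  // Same core: a release outranks a prerelease; prereleases are not ordered here.
  return va.pre === null && vb.pre !== null;
}
```

A project that already depends on a semver library may prefer its comparison function over a hand-rolled one.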
+268
@@ -0,0 +1,268 @@
import { CORS_ORIGIN } from "@/shared/utils/cors";
import { handleSearch } from "@omniroute/open-sse/handlers/search.ts";
import { getProviderCredentials, extractApiKey, isValidApiKey } from "@/sse/services/auth";
import {
getAllSearchProviders,
getSearchProvider,
selectProvider,
SEARCH_PROVIDERS,
SEARCH_CREDENTIAL_FALLBACKS,
} from "@omniroute/open-sse/config/searchRegistry.ts";
import { errorResponse } from "@omniroute/open-sse/utils/error.ts";
import { HTTP_STATUS } from "@omniroute/open-sse/config/constants.ts";
import * as log from "@/sse/utils/logger";
import { toJsonErrorPayload } from "@/shared/utils/upstreamError";
import { enforceApiKeyPolicy } from "@/shared/utils/apiKeyPolicy";
import { v1SearchSchema } from "@/shared/validation/schemas";
import { isValidationFailure, validateBody } from "@/shared/validation/helpers";
import { recordCost } from "@/domain/costRules";
import {
computeCacheKey,
getOrCoalesce,
SEARCH_CACHE_DEFAULT_TTL_MS,
} from "@omniroute/open-sse/services/searchCache.ts";
const CORS_HEADERS = {
"Access-Control-Allow-Origin": CORS_ORIGIN,
"Access-Control-Allow-Methods": "GET, POST, OPTIONS",
"Access-Control-Allow-Headers": "*",
};
/**
* Handle CORS preflight
*/
export async function OPTIONS() {
return new Response(null, { headers: CORS_HEADERS });
}
/**
* GET /v1/search list available search providers
*/
export async function GET() {
const providers = getAllSearchProviders();
const timestamp = Math.floor(Date.now() / 1000);
const data = providers.map((p) => ({
id: p.id,
object: "search_provider",
created: timestamp,
name: p.name,
search_types: p.searchTypes,
}));
return new Response(JSON.stringify({ object: "list", data }), {
headers: { "Content-Type": "application/json", ...CORS_HEADERS },
});
}
// Helper: resolve credentials with fallback (e.g., perplexity-search → perplexity)
async function resolveSearchCredentials(providerId: string) {
const creds = await getProviderCredentials(providerId).catch(() => null);
if (creds) return creds;
const fallbackId = SEARCH_CREDENTIAL_FALLBACKS[providerId];
if (fallbackId) return getProviderCredentials(fallbackId).catch(() => null);
return null;
}
// Helper: build domain filter array from filters object
function buildDomainFilter(filters?: {
include_domains?: string[];
exclude_domains?: string[];
}): string[] | undefined {
if (!filters) return undefined;
const parts: string[] = [];
if (filters.include_domains?.length) parts.push(...filters.include_domains);
if (filters.exclude_domains?.length) parts.push(...filters.exclude_domains.map((d) => `-${d}`));
return parts.length > 0 ? parts : undefined;
}
/**
* POST /v1/search execute a web search
*/
export async function POST(request: Request) {
let rawBody: unknown;
try {
rawBody = await request.json();
} catch {
log.warn("SEARCH", "Invalid JSON body");
return errorResponse(HTTP_STATUS.BAD_REQUEST, "Invalid JSON body");
}
const validation = validateBody(v1SearchSchema, rawBody);
if (isValidationFailure(validation)) {
return errorResponse(HTTP_STATUS.BAD_REQUEST, validation.error.message);
}
const body = validation.data;
// Optional API key validation
if (process.env.REQUIRE_API_KEY === "true") {
const apiKey = extractApiKey(request);
if (!apiKey) {
return errorResponse(HTTP_STATUS.UNAUTHORIZED, "Missing API key");
}
const valid = await isValidApiKey(apiKey);
if (!valid) {
return errorResponse(HTTP_STATUS.UNAUTHORIZED, "Invalid API key");
}
}
// Enforce API key policies — use "search" as model identifier for consistent policy config
const policy = await enforceApiKeyPolicy(request, "search");
if (policy.rejection) return policy.rejection;
// Resolve provider and credentials
let providerConfig = selectProvider(body.provider);
if (!providerConfig) {
return errorResponse(
HTTP_STATUS.BAD_REQUEST,
body.provider ? `Unknown search provider: ${body.provider}` : "No search providers available"
);
}
let credentials: Record<string, any> | null = null;
let alternateProviderId: string | undefined;
let alternateCredentials: Record<string, any> | null = null;
if (body.provider) {
// Explicit provider — single credential lookup (with fallback)
credentials = await resolveSearchCredentials(providerConfig.id);
if (!credentials) {
return errorResponse(
HTTP_STATUS.BAD_REQUEST,
`No credentials configured for search provider: ${providerConfig.id}. Add an API key for "${providerConfig.id}" in the dashboard.`
);
}
} else {
// Auto-select — try the resolved provider first, then iterate others by cost
credentials = await resolveSearchCredentials(providerConfig.id);
if (!credentials) {
// Sort by cost to find cheapest with credentials
const sortedIds = Object.values(SEARCH_PROVIDERS)
.sort((a, b) => a.costPerQuery - b.costPerQuery)
.map((p) => p.id);
for (const pid of sortedIds) {
if (pid === providerConfig.id) continue;
const altConfig = getSearchProvider(pid);
const altCreds = await resolveSearchCredentials(pid);
if (altConfig && altCreds) {
providerConfig = altConfig;
credentials = altCreds;
break;
}
}
}
if (!credentials) {
return errorResponse(
HTTP_STATUS.BAD_REQUEST,
`No credentials configured for any search provider. Add an API key for a search provider (${Object.keys(SEARCH_PROVIDERS).join(", ")}) in the dashboard.`
);
}
// Find alternate for failover — must bind credentials to the matched provider
const otherIds = Object.values(SEARCH_PROVIDERS)
.sort((a, b) => a.costPerQuery - b.costPerQuery)
.map((p) => p.id)
.filter((id) => id !== providerConfig.id);
for (const pid of otherIds) {
const creds = await resolveSearchCredentials(pid);
if (creds) {
alternateProviderId = pid;
alternateCredentials = creds;
break;
}
}
}
// Clamp max_results to provider limit
const clampedMaxResults = Math.min(body.max_results, providerConfig.maxMaxResults);
// Cache key — includes all fields that affect results
const cacheKey = computeCacheKey(
body.query,
providerConfig.id,
body.search_type,
clampedMaxResults,
body.country,
body.language,
{ filters: body.filters, offset: body.offset, time_range: body.time_range }
);
const ttl = providerConfig.cacheTTLMs || SEARCH_CACHE_DEFAULT_TTL_MS;
try {
const { data: searchResult, cached } = await getOrCoalesce(cacheKey, ttl, async () => {
const result = await handleSearch({
query: body.query,
provider: providerConfig.id,
maxResults: clampedMaxResults,
searchType: body.search_type,
country: body.country,
language: body.language,
timeRange: body.time_range,
offset: body.offset,
domainFilter: buildDomainFilter(body.filters),
contentOptions: body.content,
strictFilters: body.strict_filters,
providerOptions: body.provider_options,
credentials,
alternateProvider: alternateProviderId,
alternateCredentials,
log,
});
if (!result.success) {
throw new SearchError(result.error || "Search failed", result.status || 502);
}
return result.data!;
});
// Record cost for budget tracking (skip cache hits — no provider cost)
if (!cached && policy.apiKeyInfo?.id && searchResult.usage?.search_cost_usd > 0) {
try {
recordCost(policy.apiKeyInfo.id, searchResult.usage.search_cost_usd);
} catch (e: any) {
log.warn("SEARCH", `Cost recording failed: ${e?.message}`);
}
}
const response = {
id: `search-${crypto.randomUUID()}`,
...searchResult,
cached,
usage: cached ? { queries_used: 0, search_cost_usd: 0 } : searchResult.usage,
};
return new Response(JSON.stringify(response), {
status: 200,
headers: { "Content-Type": "application/json", ...CORS_HEADERS },
});
} catch (err: any) {
if (err instanceof SearchError) {
const errorPayload = toJsonErrorPayload(err.message, "Search provider error");
return new Response(JSON.stringify(errorPayload), {
status: err.statusCode,
headers: { "Content-Type": "application/json", ...CORS_HEADERS },
});
}
log.error("SEARCH", `Unexpected error: ${err.message}`);
const errorPayload = toJsonErrorPayload(err.message, "Internal search error");
return new Response(JSON.stringify(errorPayload), {
status: 500,
headers: { "Content-Type": "application/json", ...CORS_HEADERS },
});
}
}
class SearchError extends Error {
statusCode: number;
constructor(message: string, statusCode: number) {
super(message);
this.statusCode = statusCode;
}
}
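Review note: when no `provider` is given, the POST handler above sorts the registry by `costPerQuery` and walks it to find a provider with credentials, then repeats the walk to find an alternate for failover. The sketch below condenses that idea into one pure function. It is a simplification under stated assumptions: `ProviderInfo` and `pickByCost` are hypothetical names, credential lookup is reduced to a synchronous predicate, and the sketch always prefers the cheapest configured provider, whereas the handler first tries whatever `selectProvider` returned.

```typescript
// Hypothetical types; the real registry lives in searchRegistry.ts.
interface ProviderInfo {
  id: string;
  costPerQuery: number;
}

// Pick the cheapest provider that has credentials, plus the next-cheapest
// one as a failover alternate. Returns nulls when nothing is configured.
function pickByCost(
  providers: ProviderInfo[],
  hasCredentials: (id: string) => boolean
): { primary: string | null; alternate: string | null } {
  const usable = [...providers] // copy so the caller's array is not reordered
    .sort((a, b) => a.costPerQuery - b.costPerQuery)
    .filter((p) => hasCredentials(p.id))
    .map((p) => p.id);
  return { primary: usable[0] ?? null, alternate: usable[1] ?? null };
}
```

Resolving both providers in a single pass would also avoid the second round of credential lookups that the handler performs for the alternate.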
+5
@@ -818,7 +818,12 @@
"settingsApi": "Settings API",
"categoryCore": "Core APIs",
"categoryMedia": "Media & Multi-Modal",
"categorySearch": "Search & Discovery",
"categoryUtility": "Utility & Management",
"webSearch": "Web Search",
"webSearchDesc": "Unified web search across multiple providers with automatic failover and caching",
"searchProvider": "Search Provider",
"searchProviderDesc": "This provider is used for web search via POST /v1/search. No model configuration needed — search providers are ready to use once an API key is connected.",
"enableCloudTitle": "Enable Cloud Proxy",
"whatYouGet": "What you will get",
"cloudBenefitAccess": "Access your API from anywhere in the world",
+35
@@ -143,6 +143,10 @@ const SCHEMA_SQL = `
tokens_cache_creation INTEGER DEFAULT 0,
tokens_reasoning INTEGER DEFAULT 0,
status TEXT,
success INTEGER DEFAULT 1,
latency_ms INTEGER DEFAULT 0,
ttft_ms INTEGER DEFAULT 0,
error_code TEXT,
timestamp TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_uh_timestamp ON usage_history(timestamp);
@@ -327,6 +331,35 @@ function ensureProviderConnectionsColumns(db: SqliteDatabase) {
}
}
function ensureUsageHistoryColumns(db: SqliteDatabase) {
try {
const columns = db.prepare("PRAGMA table_info(usage_history)").all() as Array<{
name?: string;
}>;
const columnNames = new Set(columns.map((column) => String(column.name ?? "")));
if (!columnNames.has("success")) {
db.exec("ALTER TABLE usage_history ADD COLUMN success INTEGER DEFAULT 1");
console.log("[DB] Added usage_history.success column");
}
if (!columnNames.has("latency_ms")) {
db.exec("ALTER TABLE usage_history ADD COLUMN latency_ms INTEGER DEFAULT 0");
console.log("[DB] Added usage_history.latency_ms column");
}
if (!columnNames.has("ttft_ms")) {
db.exec("ALTER TABLE usage_history ADD COLUMN ttft_ms INTEGER DEFAULT 0");
console.log("[DB] Added usage_history.ttft_ms column");
}
if (!columnNames.has("error_code")) {
db.exec("ALTER TABLE usage_history ADD COLUMN error_code TEXT");
console.log("[DB] Added usage_history.error_code column");
}
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
console.warn("[DB] Failed to verify usage_history schema:", message);
}
}
export function getDbInstance(): SqliteDatabase {
if (_db) return _db;
@@ -337,6 +370,7 @@ export function getDbInstance(): SqliteDatabase {
const memoryDb = new Database(":memory:");
memoryDb.pragma("journal_mode = WAL");
memoryDb.exec(SCHEMA_SQL);
ensureUsageHistoryColumns(memoryDb);
_db = memoryDb;
return memoryDb;
}
@@ -420,6 +454,7 @@ export function getDbInstance(): SqliteDatabase {
db.pragma("synchronous = NORMAL");
db.exec(SCHEMA_SQL);
ensureProviderConnectionsColumns(db);
ensureUsageHistoryColumns(db);
// ── Versioned Migrations ──
// Auto-seed 001 as applied (the inline SCHEMA_SQL already created these tables)
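Review note: `ensureUsageHistoryColumns` above repeats the same check-then-`ALTER TABLE` step once per column. A data-driven form is sketched below as a pure function that only computes the statements to run. `USAGE_HISTORY_COLUMNS` and `missingColumnStatements` are hypothetical names; the column definitions are copied from the hunk above.

```typescript
// Hypothetical, data-driven form of the column check; names are illustrative.
const USAGE_HISTORY_COLUMNS: Record<string, string> = {
  success: "INTEGER DEFAULT 1",
  latency_ms: "INTEGER DEFAULT 0",
  ttft_ms: "INTEGER DEFAULT 0",
  error_code: "TEXT",
};

// Given the column names reported by PRAGMA table_info(usage_history),
// return one ALTER TABLE statement per column that is still missing.
function missingColumnStatements(existing: Iterable<string>): string[] {
  const present = new Set(existing);
  return Object.entries(USAGE_HISTORY_COLUMNS)
    .filter(([column]) => !present.has(column))
    .map(([column, ddl]) => `ALTER TABLE usage_history ADD COLUMN ${column} ${ddl}`);
}
```

The caller would pass each returned statement to `db.exec`, which keeps the SQL generation testable without a database handle.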
+101
@@ -0,0 +1,101 @@
/**
* Detailed Request Logs DB Layer (#378)
*
* Saves full request/response bodies at each pipeline stage.
* Ring-buffer of 500 entries enforced by SQL trigger in migration 006.
* Only active when settings.detailed_logs_enabled = "1".
*/
import { v4 as uuidv4 } from "uuid";
import { getDbInstance } from "./core";
import { getSettings } from "./settings";
export interface RequestDetailLog {
id?: string;
call_log_id?: string | null;
timestamp?: string;
client_request?: string | null;
translated_request?: string | null;
provider_response?: string | null;
client_response?: string | null;
provider?: string | null;
model?: string | null;
source_format?: string | null;
target_format?: string | null;
duration_ms?: number;
}
/** Returns true if detailed logging is enabled in settings */
export async function isDetailedLoggingEnabled(): Promise<boolean> {
try {
const settings = await getSettings();
const val = settings.detailed_logs_enabled;
return val === true || val === "1" || val === "true";
} catch {
return false;
}
}
/** Save a detailed log entry — caller must verify isDetailedLoggingEnabled() first */
export function saveRequestDetailLog(entry: RequestDetailLog): void {
const db = getDbInstance();
const id = entry.id ?? uuidv4();
const timestamp = entry.timestamp ?? new Date().toISOString();
// Trim large bodies to avoid excessive disk usage (max 64KB each)
const trim = (s: string | null | undefined, max = 65536): string | null => {
if (!s) return null;
return s.length > max ? s.slice(0, max) + "…[truncated]" : s;
};
db.prepare(
`
INSERT INTO request_detail_logs
(id, call_log_id, timestamp, client_request, translated_request,
provider_response, client_response, provider, model, source_format, target_format, duration_ms)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`
).run(
id,
entry.call_log_id ?? null,
timestamp,
trim(entry.client_request),
trim(entry.translated_request),
trim(entry.provider_response),
trim(entry.client_response),
entry.provider ?? null,
entry.model ?? null,
entry.source_format ?? null,
entry.target_format ?? null,
entry.duration_ms ?? 0
);
}
/** Fetch detailed logs (latest first) */
export function getRequestDetailLogs(limit = 50, offset = 0): RequestDetailLog[] {
const db = getDbInstance();
return db
.prepare(
`
SELECT * FROM request_detail_logs
ORDER BY timestamp DESC
LIMIT ? OFFSET ?
`
)
.all(limit, offset) as RequestDetailLog[];
}
/** Get a single detailed log by ID */
export function getRequestDetailLogById(id: string): RequestDetailLog | null {
const db = getDbInstance();
return (db.prepare("SELECT * FROM request_detail_logs WHERE id = ?").get(id) ??
null) as RequestDetailLog | null;
}
/** Get total count of detailed logs */
export function getRequestDetailLogCount(): number {
const db = getDbInstance();
const row = db.prepare("SELECT COUNT(*) as cnt FROM request_detail_logs").get() as {
cnt: number;
};
return row?.cnt ?? 0;
}
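Review note: the inline `trim` helper in `saveRequestDetailLog` has a few edge cases that are easy to miss. It compares `s.length`, which counts UTF-16 code units rather than bytes, so the "64KB" limit is approximate for non-ASCII bodies; it stores empty strings as `NULL`; and a truncated JSON body will no longer parse. The standalone copy below exists only to make that behavior checkable; `trimBody` and `TRUNCATION_MARKER` are hypothetical names.

```typescript
// Standalone copy of the trimming logic, for illustration only.
const TRUNCATION_MARKER = "…[truncated]";

function trimBody(s: string | null | undefined, max = 65536): string | null {
  if (!s) return null; // null, undefined and "" are all stored as NULL
  // The limit is measured in UTF-16 code units, not bytes.
  return s.length > max ? s.slice(0, max) + TRUNCATION_MARKER : s;
}
```

Consumers that parse stored bodies as JSON would need to check for the marker first, since a truncated body is not valid JSON.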
@@ -98,6 +98,10 @@ CREATE TABLE IF NOT EXISTS usage_history (
tokens_cache_creation INTEGER DEFAULT 0,
tokens_reasoning INTEGER DEFAULT 0,
status TEXT,
success INTEGER DEFAULT 1,
latency_ms INTEGER DEFAULT 0,
ttft_ms INTEGER DEFAULT 0,
error_code TEXT,
timestamp TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_uh_timestamp ON usage_history(timestamp);
@@ -0,0 +1,19 @@
-- 005_combo_agent_fields.sql
-- Safe migration for existing users: adds optional agent fields to combos.
-- Uses ADD COLUMN with DEFAULT NULL (SQLite compatible) — existing rows are untouched.
-- New fields are read as NULL by old code versions (backward compatible).
-- System prompt override: when set, injected as the first system message before
-- forwarding to the provider. Overrides any system message from the client.
ALTER TABLE combos ADD COLUMN system_message TEXT DEFAULT NULL;
-- Regex-based tool filter: when set, only tool calls whose "name" matches this
-- regex pattern are forwarded to the provider. Others are stripped silently.
-- Example: "^(gh_|create_file|web_fetch)" — allows only GitHub and web tools.
ALTER TABLE combos ADD COLUMN tool_filter_regex TEXT DEFAULT NULL;
-- Context caching protection: when 1, the proxy tags assistant responses with
-- <omniModel>provider/model</omniModel> and pins the model for the session.
ALTER TABLE combos ADD COLUMN context_cache_protection INTEGER DEFAULT 0;
CREATE INDEX IF NOT EXISTS idx_combos_cache_protection ON combos(context_cache_protection);
@@ -0,0 +1,42 @@
-- 006_detailed_request_logs.sql
-- Stores full request/response bodies at each pipeline stage for debugging.
-- Only populated when detailed_logs_enabled = 1 in settings (off by default).
-- Ring-buffer enforced via trigger: keeps only the last 500 entries.
-- Existing users are not impacted (table is new, feature is opt-in).
CREATE TABLE IF NOT EXISTS request_detail_logs (
id TEXT PRIMARY KEY,
call_log_id TEXT, -- FK to call_logs.id (optional, nullable)
timestamp TEXT NOT NULL,
-- The 4 pipeline stages (all nullable — only populated when available)
client_request TEXT, -- Raw body received from the client (JSON)
translated_request TEXT, -- Body after format translation (JSON)
provider_response TEXT, -- Raw body from the provider (JSON)
client_response TEXT, -- Final body sent to the client (JSON)
-- Metadata
provider TEXT,
model TEXT,
source_format TEXT,
target_format TEXT,
duration_ms INTEGER DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_rdl_timestamp ON request_detail_logs(timestamp);
CREATE INDEX IF NOT EXISTS idx_rdl_call_log_id ON request_detail_logs(call_log_id);
-- Ring-buffer trigger: auto-delete oldest records beyond 500
CREATE TRIGGER IF NOT EXISTS trg_rdl_ring_buffer
AFTER INSERT ON request_detail_logs
BEGIN
DELETE FROM request_detail_logs
WHERE id IN (
SELECT id FROM request_detail_logs
ORDER BY timestamp ASC
LIMIT MAX(0, (SELECT COUNT(*) FROM request_detail_logs) - 500)
);
END;
-- Settings key for enabling/disabling detailed logs (default: disabled)
-- Inserted only if not already present (safe for existing installs)
INSERT OR IGNORE INTO key_value (namespace, key, value)
VALUES ('settings', 'detailed_logs_enabled', '0');
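Review note: the `trg_rdl_ring_buffer` trigger above deletes the oldest rows, ordered by `timestamp`, whenever the table holds more than 500 entries. The in-memory sketch below mirrors that rule so the intended semantics can be checked without SQLite. `LogRow` and `applyRingBuffer` are hypothetical names, and plain string comparison is assumed to match the ordering of ISO-8601 timestamps stored as text.

```typescript
// In-memory model of the ring-buffer rule; not code from the PR.
interface LogRow {
  id: string;
  timestamp: string; // ISO-8601 text, as written by saveRequestDetailLog
}

function applyRingBuffer(rows: LogRow[], capacity = 500): LogRow[] {
  const excess = Math.max(0, rows.length - capacity);
  if (excess === 0) return rows;
  // Collect the ids of the oldest `excess` rows by timestamp.
  const oldestIds = new Set(
    [...rows]
      .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0))
      .slice(0, excess)
      .map((r) => r.id)
  );
  return rows.filter((r) => !oldestIds.has(r.id));
}
```

Because ordering relies on `timestamp` alone, rows that share an identical timestamp have no defined eviction order in either the trigger or this model.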
@@ -0,0 +1,4 @@
-- Add request_type column to call_logs for non-chat request tracking (search, embed, rerank).
-- Backward-compatible: DEFAULT NULL means existing rows are unaffected.
ALTER TABLE call_logs ADD COLUMN request_type TEXT DEFAULT NULL;
CREATE INDEX IF NOT EXISTS idx_call_logs_request_type ON call_logs(request_type);
+73
@@ -440,6 +440,69 @@ async function validateAnthropicCompatibleProvider({ apiKey, providerSpecificDat
}
}
// ── Search provider validators (factored) ──
async function validateSearchProvider(
url: string,
init: RequestInit
): Promise<{ valid: boolean; error: string | null }> {
try {
const response = await fetch(url, init);
if (response.ok) return { valid: true, error: null };
if (response.status === 401 || response.status === 403) {
return { valid: false, error: "Invalid API key" };
}
return { valid: false, error: `Validation failed: ${response.status}` };
} catch (error: any) {
return { valid: false, error: error.message || "Validation failed" };
}
}
const SEARCH_VALIDATOR_CONFIGS: Record<
string,
(apiKey: string) => { url: string; init: RequestInit }
> = {
"serper-search": (apiKey) => ({
url: "https://google.serper.dev/search",
init: {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
body: JSON.stringify({ q: "test", num: 1 }),
},
}),
"brave-search": (apiKey) => ({
url: "https://api.search.brave.com/res/v1/web/search?q=test&count=1",
init: {
method: "GET",
headers: { Accept: "application/json", "X-Subscription-Token": apiKey },
},
}),
"perplexity-search": (apiKey) => ({
url: "https://api.perplexity.ai/search",
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
body: JSON.stringify({ query: "test", max_results: 1 }),
},
}),
"exa-search": (apiKey) => ({
url: "https://api.exa.ai/search",
init: {
method: "POST",
headers: { "Content-Type": "application/json", "x-api-key": apiKey },
body: JSON.stringify({ query: "test", numResults: 1 }),
},
}),
"tavily-search": (apiKey) => ({
url: "https://api.tavily.com/search",
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
body: JSON.stringify({ query: "test", max_results: 1 }),
},
}),
};
export async function validateProviderApiKey({ provider, apiKey, providerSpecificData = {} }: any) {
if (!provider || !apiKey) {
return { valid: false, error: "Provider and API key required", unsupported: false };
@@ -468,6 +531,16 @@ export async function validateProviderApiKey({ provider, apiKey, providerSpecifi
nanobanana: validateNanoBananaProvider,
elevenlabs: validateElevenLabsProvider,
inworld: validateInworldProvider,
// Search providers — use factored validator
...Object.fromEntries(
Object.entries(SEARCH_VALIDATOR_CONFIGS).map(([id, configFn]) => [
id,
({ apiKey }: any) => {
const { url, init } = configFn(apiKey);
return validateSearchProvider(url, init);
},
])
),
};
if (SPECIALTY_VALIDATORS[provider]) {
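The table-driven validators above can be exercised without touching the network. The following is a minimal sketch, assuming a trimmed two-entry copy of `SEARCH_VALIDATOR_CONFIGS`; the names `CONFIGS`, `buildValidationRequest` and `interpretStatus` are hypothetical and exist only for illustration. It shows how a provider ID resolves to a request description and how the status mapping in `validateSearchProvider` behaves.

```typescript
// Local stand-in for RequestInit so the sketch does not depend on DOM typings.
type ValidationRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
};
type ValidationResult = { valid: boolean; error: string | null };

// Trimmed, hypothetical copy of the config table from the diff (two providers only).
const CONFIGS: Record<string, (apiKey: string) => ValidationRequest> = {
  "brave-search": (apiKey) => ({
    url: "https://api.search.brave.com/res/v1/web/search?q=test&count=1",
    init: {
      method: "GET",
      headers: { Accept: "application/json", "X-Subscription-Token": apiKey },
    },
  }),
  "tavily-search": (apiKey) => ({
    url: "https://api.tavily.com/search",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ query: "test", max_results: 1 }),
    },
  }),
};

// Resolve a provider ID to the request that would be sent, or null if unknown.
function buildValidationRequest(provider: string, apiKey: string): ValidationRequest | null {
  const configFn = CONFIGS[provider];
  return configFn ? configFn(apiKey) : null;
}

// Same mapping as validateSearchProvider: 2xx is valid, 401/403 means a bad key,
// anything else is reported with its status code.
function interpretStatus(status: number): ValidationResult {
  if (status >= 200 && status < 300) return { valid: true, error: null };
  if (status === 401 || status === 403) return { valid: false, error: "Invalid API key" };
  return { valid: false, error: `Validation failed: ${status}` };
}

console.log(buildValidationRequest("tavily-search", "tvly-example")?.init.headers);
console.log(interpretStatus(403));
```

Keeping the per-provider differences in data rather than in five separate functions means a new search provider only needs one more table entry.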
+3 -2
@@ -186,6 +186,7 @@ export async function saveCallLog(entry: any) {
duration: entry.duration || 0,
tokensIn: entry.tokens?.prompt_tokens || 0,
tokensOut: entry.tokens?.completion_tokens || 0,
requestType: entry.requestType || null,
sourceFormat: entry.sourceFormat || null,
targetFormat: entry.targetFormat || null,
apiKeyId,
@@ -201,10 +202,10 @@ export async function saveCallLog(entry: any) {
db.prepare(
`
INSERT INTO call_logs (id, timestamp, method, path, status, model, provider,
account, connection_id, duration, tokens_in, tokens_out, source_format, target_format,
account, connection_id, duration, tokens_in, tokens_out, request_type, source_format, target_format,
api_key_id, api_key_name, combo_name, request_body, response_body, error)
VALUES (@id, @timestamp, @method, @path, @status, @model, @provider,
@account, @connectionId, @duration, @tokensIn, @tokensOut, @sourceFormat, @targetFormat,
@account, @connectionId, @duration, @tokensIn, @tokensOut, @requestType, @sourceFormat, @targetFormat,
@apiKeyId, @apiKeyName, @comboName, @requestBody, @responseBody, @error)
`
).run(logEntry);
+11 -4
@@ -24,8 +24,7 @@ export const CALL_LOGS_DIR = isCloud ? null : path.join(DATA_DIR, "call_logs");
// Legacy paths
const LEGACY_DB_FILE =
isCloud || !LEGACY_DATA_DIR ? null : path.join(LEGACY_DATA_DIR, "usage.json");
const LEGACY_LOG_FILE =
isCloud || !LEGACY_DATA_DIR ? null : path.join(LEGACY_DATA_DIR, "log.txt");
const LEGACY_LOG_FILE = isCloud || !LEGACY_DATA_DIR ? null : path.join(LEGACY_DATA_DIR, "log.txt");
const LEGACY_CALL_LOGS_DB_FILE =
isCloud || !LEGACY_DATA_DIR ? null : path.join(LEGACY_DATA_DIR, "call_logs.json");
const LEGACY_CALL_LOGS_DIR =
@@ -82,10 +81,10 @@ export function migrateUsageJsonToSqlite() {
const insert = db.prepare(`
INSERT INTO usage_history (provider, model, connection_id, api_key_id, api_key_name,
tokens_input, tokens_output, tokens_cache_read, tokens_cache_creation, tokens_reasoning,
status, timestamp)
status, success, latency_ms, ttft_ms, error_code, timestamp)
VALUES (@provider, @model, @connectionId, @apiKeyId, @apiKeyName,
@tokensInput, @tokensOutput, @tokensCacheRead, @tokensCacheCreation, @tokensReasoning,
@status, @timestamp)
@status, @success, @latencyMs, @ttftMs, @errorCode, @timestamp)
`);
const tx = db.transaction(() => {
@@ -103,6 +102,14 @@ export function migrateUsageJsonToSqlite() {
entry.tokens?.cacheCreation ?? entry.tokens?.cache_creation_input_tokens ?? 0,
tokensReasoning: entry.tokens?.reasoning ?? entry.tokens?.reasoning_tokens ?? 0,
status: entry.status || null,
success: entry.success === false ? 0 : 1,
latencyMs: Number.isFinite(Number(entry.latencyMs)) ? Number(entry.latencyMs) : 0,
ttftMs: Number.isFinite(Number(entry.timeToFirstTokenMs))
? Number(entry.timeToFirstTokenMs)
: Number.isFinite(Number(entry.latencyMs))
? Number(entry.latencyMs)
: 0,
errorCode: entry.errorCode || null,
timestamp: entry.timestamp || new Date().toISOString(),
});
}
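The new `success`, `latency_ms`, `ttft_ms` and `error_code` columns are filled with the same fallback chain in both the migration and `saveRequestUsage`. As a sketch of that chain in isolation (the helper names `toFiniteOrNull` and `normalizeTiming` are hypothetical, not part of the diff):

```typescript
interface RawUsageEntry {
  success?: boolean;
  latencyMs?: unknown;
  timeToFirstTokenMs?: unknown;
  errorCode?: string | null;
}

// Number(...) coercion followed by a finiteness check, as in the diff.
function toFiniteOrNull(value: unknown): number | null {
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Mirrors the inline expressions: a missing success flag counts as success,
// and TTFT falls back to total latency, then to 0.
function normalizeTiming(entry: RawUsageEntry) {
  const latency = toFiniteOrNull(entry.latencyMs);
  const ttft = toFiniteOrNull(entry.timeToFirstTokenMs);
  return {
    success: entry.success === false ? 0 : 1,
    latencyMs: latency ?? 0,
    ttftMs: ttft ?? latency ?? 0,
    errorCode: entry.errorCode || null,
  };
}

console.log(normalizeTiming({ latencyMs: 250 }));
console.log(normalizeTiming({ success: false, latencyMs: "abc", timeToFirstTokenMs: 40, errorCode: "429" }));
```

One consequence worth noting: legacy rows that never recorded a `success` field are stored as successful, which raises the computed success rate for older data.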
+167 -2
@@ -29,6 +29,20 @@ function toNumber(value: unknown): number {
return 0;
}
function percentile(sortedValues: number[], p: number): number {
if (sortedValues.length === 0) return 0;
if (sortedValues.length === 1) return sortedValues[0];
const bounded = Math.max(0, Math.min(1, p));
const idx = Math.round((sortedValues.length - 1) * bounded);
return sortedValues[idx] ?? sortedValues[sortedValues.length - 1];
}
function stdDev(values: number[], avg: number): number {
if (values.length <= 1) return 0;
const variance = values.reduce((acc, v) => acc + (v - avg) ** 2, 0) / values.length;
return Math.sqrt(Math.max(0, variance));
}
// ──────────────── Pending Requests (in-memory) ────────────────
const pendingRequests: {
@@ -107,6 +121,10 @@ export async function getUsageDb() {
reasoning: toNumber(r.tokens_reasoning),
},
status: toStringOrNull(r.status),
success: toNumber(r.success) === 1,
latencyMs: toNumber(r.latency_ms),
timeToFirstTokenMs: toNumber(r.ttft_ms),
errorCode: toStringOrNull(r.error_code),
timestamp: toStringOrNull(r.timestamp),
};
});
@@ -130,8 +148,8 @@ export async function saveRequestUsage(entry: any) {
`
INSERT INTO usage_history (provider, model, connection_id, api_key_id, api_key_name,
tokens_input, tokens_output, tokens_cache_read, tokens_cache_creation, tokens_reasoning,
status, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
status, success, latency_ms, ttft_ms, error_code, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`
).run(
entry.provider || null,
@@ -145,6 +163,14 @@ export async function saveRequestUsage(entry: any) {
entry.tokens?.cacheCreation ?? entry.tokens?.cache_creation_input_tokens ?? 0,
entry.tokens?.reasoning ?? entry.tokens?.reasoning_tokens ?? 0,
entry.status || null,
entry.success === false ? 0 : 1,
Number.isFinite(Number(entry.latencyMs)) ? Number(entry.latencyMs) : 0,
Number.isFinite(Number(entry.timeToFirstTokenMs))
? Number(entry.timeToFirstTokenMs)
: Number.isFinite(Number(entry.latencyMs))
? Number(entry.latencyMs)
: 0,
entry.errorCode || null,
timestamp
);
} catch (error) {
@@ -202,11 +228,150 @@ export async function getUsageHistory(filter: any = {}) {
reasoning: toNumber(r.tokens_reasoning),
},
status: toStringOrNull(r.status),
success: toNumber(r.success) === 1,
latencyMs: toNumber(r.latency_ms),
timeToFirstTokenMs: toNumber(r.ttft_ms),
errorCode: toStringOrNull(r.error_code),
timestamp: toStringOrNull(r.timestamp),
};
});
}
export interface ModelLatencyStatsEntry {
provider: string;
model: string;
key: string;
totalRequests: number;
successfulRequests: number;
successRate: number; // 0..1
avgLatencyMs: number;
p50LatencyMs: number;
p95LatencyMs: number;
p99LatencyMs: number;
latencyStdDev: number;
windowHours: number;
}
/**
* Aggregate rolling latency stats per provider/model from usage_history.
* Used by auto-combo routing to incorporate real-world latency and reliability.
*/
export async function getModelLatencyStats(
options: { windowHours?: number; minSamples?: number; maxRows?: number } = {}
): Promise<Record<string, ModelLatencyStatsEntry>> {
const windowHours =
Number.isFinite(Number(options.windowHours)) && Number(options.windowHours) > 0
? Number(options.windowHours)
: 24;
const minSamples =
Number.isFinite(Number(options.minSamples)) && Number(options.minSamples) > 0
? Number(options.minSamples)
: 1;
const maxRows =
Number.isFinite(Number(options.maxRows)) && Number(options.maxRows) > 0
? Number(options.maxRows)
: 10000;
const db = getDbInstance();
const sinceIso = new Date(Date.now() - windowHours * 60 * 60 * 1000).toISOString();
type LatencyRow = {
provider: string | null;
model: string | null;
success: number | null;
latency_ms: number | null;
};
const rows = db
.prepare(
`
SELECT provider, model, success, latency_ms
FROM usage_history
WHERE timestamp >= @sinceIso
AND provider IS NOT NULL
AND model IS NOT NULL
ORDER BY timestamp DESC
LIMIT @maxRows
`
)
.all({ sinceIso, maxRows }) as LatencyRow[];
const grouped = new Map<
string,
{
provider: string;
model: string;
totalRequests: number;
successfulRequests: number;
successfulLatencies: number[];
allLatencies: number[];
}
>();
for (const row of rows) {
const provider = toStringOrNull(row.provider);
const model = toStringOrNull(row.model);
if (!provider || !model) continue;
const key = `${provider}/${model}`;
if (!grouped.has(key)) {
grouped.set(key, {
provider,
model,
totalRequests: 0,
successfulRequests: 0,
successfulLatencies: [],
allLatencies: [],
});
}
const bucket = grouped.get(key);
if (!bucket) continue;
bucket.totalRequests += 1;
const isSuccess = toNumber(row.success) !== 0;
if (isSuccess) bucket.successfulRequests += 1;
const latency = toNumber(row.latency_ms);
if (latency > 0) {
bucket.allLatencies.push(latency);
if (isSuccess) bucket.successfulLatencies.push(latency);
}
}
const stats: Record<string, ModelLatencyStatsEntry> = {};
for (const [key, bucket] of grouped.entries()) {
const baseLatencies =
bucket.successfulLatencies.length >= minSamples
? bucket.successfulLatencies
: bucket.allLatencies;
if (baseLatencies.length < minSamples) continue;
const sorted = [...baseLatencies].sort((a, b) => a - b);
const avg = sorted.reduce((acc, n) => acc + n, 0) / sorted.length;
const successRate =
bucket.totalRequests > 0 ? bucket.successfulRequests / bucket.totalRequests : 0;
stats[key] = {
provider: bucket.provider,
model: bucket.model,
key,
totalRequests: bucket.totalRequests,
successfulRequests: bucket.successfulRequests,
successRate,
avgLatencyMs: Math.round(avg),
p50LatencyMs: Math.round(percentile(sorted, 0.5)),
p95LatencyMs: Math.round(percentile(sorted, 0.95)),
p99LatencyMs: Math.round(percentile(sorted, 0.99)),
latencyStdDev: Math.round(stdDev(sorted, avg)),
windowHours,
};
}
return stats;
}
// ──────────────── Request Log (log.txt) ────────────────
import fs from "fs";
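To make the statistics in `getModelLatencyStats` concrete, here is a small worked example using copies of the `percentile` and `stdDev` helpers from this file. The latency samples are invented for illustration. Note that `percentile` uses a rounded nearest-rank index rather than interpolation, and `stdDev` divides by `n` (population form), so small sample sets give coarse results.

```typescript
// Nearest-rank percentile over an ascending array (same logic as the helper in the diff).
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) return 0;
  if (sortedValues.length === 1) return sortedValues[0];
  const bounded = Math.max(0, Math.min(1, p));
  const idx = Math.round((sortedValues.length - 1) * bounded);
  return sortedValues[idx] ?? sortedValues[sortedValues.length - 1];
}

// Population standard deviation around a precomputed mean.
function stdDev(values: number[], avg: number): number {
  if (values.length <= 1) return 0;
  const variance = values.reduce((acc, v) => acc + (v - avg) ** 2, 0) / values.length;
  return Math.sqrt(Math.max(0, variance));
}

// Hypothetical latency samples in milliseconds, including one slow outlier.
const sorted = [120, 80, 100, 400, 90].sort((a, b) => a - b);
const avg = sorted.reduce((acc, n) => acc + n, 0) / sorted.length;

const summary = {
  avgLatencyMs: Math.round(avg), // 158
  p50LatencyMs: Math.round(percentile(sorted, 0.5)), // 100
  p95LatencyMs: Math.round(percentile(sorted, 0.95)), // 400
  latencyStdDev: Math.round(stdDev(sorted, avg)), // 122
};

console.log(summary);
```

With only five samples, p95 and p99 both land on the single outlier, which is one reason the `minSamples` option may deserve a value above its default of 1 when these stats feed routing decisions.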
+2 -6
@@ -23,6 +23,7 @@ export {
getUsageDb,
saveRequestUsage,
getUsageHistory,
getModelLatencyStats,
appendRequestLog,
getRecentLogs,
} from "./usage/usageHistory";
@@ -31,9 +32,4 @@ export { calculateCost } from "./usage/costCalculator";
export { getUsageStats } from "./usage/usageStats";
export {
saveCallLog,
rotateCallLogs,
getCallLogs,
getCallLogById,
} from "./usage/callLogs";
export { saveCallLog, rotateCallLogs, getCallLogs, getCallLogById } from "./usage/callLogs";
+54
@@ -0,0 +1,54 @@
/**
* Kiro IDE MITM Configuration (#336)
*
* Kiro IDE removed the Base URL / API Key configuration UI.
* To route Kiro's traffic through OmniRoute, we intercept it using MITM,
* similar to the existing Antigravity/Claude Code implementation.
*
* Kiro IDE uses the Anthropic API at https://api.anthropic.com:
* - Main endpoint: POST /v1/messages
* - Auth header: x-api-key: <key>
* - User-Agent contains: "kiro" or "Kiro"
*
* To use: Install OmniRoute's MITM certificate, then run:
* omniroute mitm start --targets kiro
*
* The MITM server intercepts requests to api.anthropic.com and forwards
* them to the OmniRoute proxy (localhost:20128) instead.
*/
export interface MitmTarget {
id: string;
name: string;
description: string;
targetHost: string;
targetPort: number;
localPort: number;
userAgentPattern: string | null;
apiEndpoints: string[];
authHeader: string;
instructions: string[];
referenceIde?: string;
}
/** Kiro IDE MITM profile */
export const KIRO_MITM_PROFILE: MitmTarget = {
id: "kiro",
name: "Kiro IDE",
description:
"Intercepts Kiro IDE requests to api.anthropic.com and routes them through OmniRoute.",
targetHost: "api.anthropic.com",
targetPort: 443,
localPort: 20130,
userAgentPattern: null, // Kiro does not expose a stable User-Agent
apiEndpoints: ["/v1/messages"],
authHeader: "x-api-key",
instructions: [
"1. Install OmniRoute's root certificate: run `omniroute cert install` or go to Settings → MITM Certificates",
"2. Start the MITM proxy: `omniroute mitm start --target kiro`",
"3. Set your system HTTP proxy to 127.0.0.1:20130 (or use transparent MITM via DNS override)",
"4. Open Kiro IDE — API calls will be automatically routed through OmniRoute.",
"5. Verify: check the Proxy Logs in OmniRoute dashboard and look for provider=anthropic source=mitm",
],
referenceIde: "antigravity", // Same MITM infrastructure as Antigravity
};
+5 -5
@@ -258,7 +258,7 @@ export default function RequestLoggerV2() {
onClick={() => setRecording(!recording)}
className={`flex items-center gap-2 px-3 py-1.5 rounded-full text-sm font-medium border transition-colors ${
recording
? "bg-red-500/10 border-red-500/30 text-red-400"
? "bg-red-500/10 border-red-500/30 text-red-700 dark:text-red-400"
: "bg-bg-subtle border-border text-text-muted"
}`}
>
@@ -413,11 +413,11 @@ export default function RequestLoggerV2() {
className={`flex items-center gap-1.5 px-3 py-1 rounded-full text-xs font-medium border transition-all ${
activeFilter === f.key
? f.key === "error"
? "bg-red-500/20 text-red-400 border-red-500/40"
? "bg-red-500/20 text-red-700 dark:text-red-400 border-red-500/40"
: f.key === "ok"
? "bg-emerald-500/20 text-emerald-400 border-emerald-500/40"
? "bg-emerald-500/20 text-emerald-700 dark:text-emerald-400 border-emerald-500/40"
: f.key === "combo"
? "bg-violet-500/20 text-violet-300 border-violet-500/40"
? "bg-violet-500/20 text-violet-700 dark:text-violet-300 border-violet-500/40"
: "bg-primary text-white border-primary"
: "bg-bg-subtle border-border text-text-muted hover:border-text-muted"
}`}
@@ -635,7 +635,7 @@ export default function RequestLoggerV2() {
{visibleColumns.combo && (
<td className="px-3 py-2">
{log.comboName ? (
<span className="inline-block px-2 py-0.5 rounded-full text-[9px] font-bold bg-violet-500/20 text-violet-700 dark:text-violet-300 border border-violet-500/30">
<span className="inline-block px-2 py-0.5 rounded-full text-[9px] font-bold bg-violet-500/20 text-violet-800 dark:text-violet-300 border border-violet-500/40">
{log.comboName}
</span>
) : (
+292 -22
@@ -7,6 +7,20 @@ export const DEFAULT_PRICING = {
// Claude Code (cc)
cc: {
"claude-opus-4-6": {
input: 5.0,
output: 25.0,
cached: 2.5,
reasoning: 25.0,
cache_creation: 5.0,
},
"claude-sonnet-4-6": {
input: 3.0,
output: 15.0,
cached: 1.5,
reasoning: 15.0,
cache_creation: 3.0,
},
"claude-opus-4-5-20251101": {
input: 15.0,
output: 75.0,
@@ -115,6 +129,13 @@ export const DEFAULT_PRICING = {
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-3.1-pro-preview": {
input: 2.0,
output: 12.0,
cached: 0.25,
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-2.5-pro": {
input: 2.0,
output: 12.0,
@@ -129,12 +150,13 @@ export const DEFAULT_PRICING = {
reasoning: 3.75,
cache_creation: 0.3,
},
// Gemini 2.5 Flash Lite: price corrected via ClawRouter to $0.10/$0.40 (was $0.15/$1.25)
"gemini-2.5-flash-lite": {
input: 0.15,
output: 1.25,
cached: 0.015,
reasoning: 1.875,
cache_creation: 0.15,
input: 0.1,
output: 0.4,
cached: 0.025,
reasoning: 0.6,
cache_creation: 0.1,
},
},
@@ -202,18 +224,25 @@ export const DEFAULT_PRICING = {
cache_creation: 0.75,
},
"deepseek-v3.2-chat": {
input: 0.5,
output: 2.0,
cached: 0.25,
reasoning: 3.0,
cache_creation: 0.5,
input: 0.28,
output: 0.42,
cached: 0.014,
reasoning: 0.63,
cache_creation: 0.28,
},
"deepseek-v3.2": {
input: 0.28,
output: 0.42,
cached: 0.014,
reasoning: 0.63,
cache_creation: 0.28,
},
"deepseek-v3.2-reasoner": {
input: 0.75,
output: 3.0,
cached: 0.375,
reasoning: 4.5,
cache_creation: 0.75,
input: 0.55,
output: 2.19,
cached: 0.14,
reasoning: 2.19,
cache_creation: 0.55,
},
// Short-form aliases used by decolua/9router catalog (Mar 2026)
"deepseek-3.1": {
@@ -451,10 +480,71 @@ export const DEFAULT_PRICING = {
reasoning: 15.0,
cache_creation: 3.0,
},
// Claude 4.5 Haiku: Anthropic's most recent economy ("eco") model (2025-10)
"claude-haiku-4-5-20251001": {
input: 1.0,
output: 5.0,
cached: 0.5,
reasoning: 7.5,
cache_creation: 1.0,
},
"claude-haiku-4.5": {
input: 1.0,
output: 5.0,
cached: 0.5,
reasoning: 7.5,
cache_creation: 1.0,
},
// Claude Sonnet 4.6 — maxOutput 64k tokens, $3/$15/M
"claude-sonnet-4-6-20251031": {
input: 3.0,
output: 15.0,
cached: 1.5,
reasoning: 22.5,
cache_creation: 3.0,
},
"claude-sonnet-4.6": {
input: 3.0,
output: 15.0,
cached: 1.5,
reasoning: 22.5,
cache_creation: 3.0,
},
// Claude Opus 4.6: cheaper than Opus 4 ($5/$25 vs $15/$75)
"claude-opus-4-6-20251031": {
input: 5.0,
output: 25.0,
cached: 2.5,
reasoning: 37.5,
cache_creation: 5.0,
},
"claude-opus-4.6": {
input: 5.0,
output: 25.0,
cached: 2.5,
reasoning: 37.5,
cache_creation: 5.0,
},
},
// Gemini
gemini: {
// Gemini 3.1 Pro: new Google flagship (2026-03-17)
// Context: 1,050,000 tokens | Max Output: 65,536
"gemini-3.1-pro": {
input: 2.0,
output: 12.0,
cached: 0.25,
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-3-1-pro": {
input: 2.0,
output: 12.0,
cached: 0.25,
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-3-pro-preview": {
input: 2.0,
output: 12.0,
@@ -462,6 +552,13 @@ export const DEFAULT_PRICING = {
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-3.1-pro-preview": {
input: 2.0,
output: 12.0,
cached: 0.25,
reasoning: 18.0,
cache_creation: 2.0,
},
"gemini-2.5-pro": {
input: 2.0,
output: 12.0,
@@ -476,12 +573,53 @@ export const DEFAULT_PRICING = {
reasoning: 3.75,
cache_creation: 0.3,
},
// Gemini 2.5 Flash Lite: price corrected to $0.10/$0.40 (ClawRouter)
"gemini-2.5-flash-lite": {
input: 0.15,
output: 1.25,
cached: 0.015,
reasoning: 1.875,
cache_creation: 0.15,
input: 0.1,
output: 0.4,
cached: 0.025,
reasoning: 0.6,
cache_creation: 0.1,
},
},
// DeepSeek: native API (V3.2 Chat), kept separate from the free providers
// Price: $0.28/$0.42 per million tokens (verified via ClawRouter 2026-03-17)
deepseek: {
"deepseek-chat": {
input: 0.28,
output: 0.42,
cached: 0.014,
reasoning: 0.42,
cache_creation: 0.28,
},
"deepseek-v3": {
input: 0.28,
output: 0.42,
cached: 0.014,
reasoning: 0.42,
cache_creation: 0.28,
},
"deepseek-v3.2": {
input: 0.28,
output: 0.42,
cached: 0.014,
reasoning: 0.42,
cache_creation: 0.28,
},
"deepseek-reasoner": {
input: 0.55,
output: 2.19,
cached: 0.14,
reasoning: 2.19,
cache_creation: 0.55,
},
"deepseek-r1": {
input: 0.55,
output: 2.19,
cached: 0.14,
reasoning: 2.19,
cache_creation: 0.55,
},
},
@@ -498,6 +636,20 @@ export const DEFAULT_PRICING = {
// GLM
glm: {
"glm-5": {
input: 1.0,
output: 3.2,
cached: 0.5,
reasoning: 4.8,
cache_creation: 1.0,
},
"glm-5-turbo": {
input: 1.2,
output: 4.0,
cached: 0.6,
reasoning: 6.0,
cache_creation: 1.2,
},
"glm-4.7": {
input: 0.75,
output: 3.0,
@@ -521,7 +673,7 @@ export const DEFAULT_PRICING = {
},
},
// Kimi
// Kimi (Moonshot)
kimi: {
"kimi-latest": {
input: 1.0,
@@ -530,10 +682,33 @@ export const DEFAULT_PRICING = {
reasoning: 6.0,
cache_creation: 1.0,
},
// Kimi K2.5: direct access via the Moonshot API
// Context: 262,144 tokens | Capabilities: reasoning, vision, agentic, tools
"kimi-k2.5": {
input: 0.6,
output: 3.0,
cached: 0.3,
reasoning: 4.5,
cache_creation: 0.6,
},
"moonshot-kimi-k2.5": {
input: 0.6,
output: 3.0,
cached: 0.3,
reasoning: 4.5,
cache_creation: 0.6,
},
},
// MiniMax
minimax: {
"minimax-m2.1": {
input: 0.5,
output: 2.0,
cached: 0.25,
reasoning: 3.0,
cache_creation: 0.5,
},
"MiniMax-M2.1": {
input: 0.5,
output: 2.0,
@@ -541,6 +716,22 @@ export const DEFAULT_PRICING = {
reasoning: 3.0,
cache_creation: 0.5,
},
// MiniMax M2.5: cheaper than M2.1, reasoning + tools
// Context: 204,800 tokens | Max Output: 16,384 tokens
"minimax-m2.5": {
input: 0.3,
output: 1.2,
cached: 0.15,
reasoning: 1.8,
cache_creation: 0.3,
},
"MiniMax-M2.5": {
input: 0.3,
output: 1.2,
cached: 0.15,
reasoning: 1.8,
cache_creation: 0.3,
},
},
// ─── Free-tier API Key Providers (nominal $0 pricing) ───
@@ -627,6 +818,7 @@ export const DEFAULT_PRICING = {
// Nvidia
nvidia: {
"nvidia/gpt-oss-120b": { input: 0, output: 0, cached: 0, reasoning: 0, cache_creation: 0 },
"openai/gpt-oss-120b": { input: 0, output: 0, cached: 0, reasoning: 0, cache_creation: 0 },
"gpt-oss-120b": { input: 0, output: 0, cached: 0, reasoning: 0, cache_creation: 0 },
"moonshotai/kimi-k2.5": { input: 0, output: 0, cached: 0, reasoning: 0, cache_creation: 0 },
@@ -757,7 +949,85 @@ export const DEFAULT_PRICING = {
},
},
// Kiro (AWS)
// ─────────────────────────────────────────────────────────────────────
// xAI (Grok) — Grok-3 + Grok-4 Family
// Source: ClawRouter benchmarks 2026-03-17
// Grok-4-fast-non-reasoning: 1143ms P50 (fastest in the benchmark)
// ─────────────────────────────────────────────────────────────────────
xai: {
"grok-3": {
input: 3.0,
output: 15.0,
cached: 1.5,
reasoning: 22.5,
cache_creation: 3.0,
},
"grok-3-mini": {
input: 0.3,
output: 0.5,
cached: 0.15,
reasoning: 0.75,
cache_creation: 0.3,
},
// Grok-4 Fast Family: ultra-cheap ($0.20/$0.50/M)
"grok-4-fast-non-reasoning": {
input: 0.2,
output: 0.5,
cached: 0.1,
reasoning: 0.0,
cache_creation: 0.2,
},
"grok-4-fast-reasoning": {
input: 0.2,
output: 0.5,
cached: 0.1,
reasoning: 0.75,
cache_creation: 0.2,
},
"grok-4-1-fast-non-reasoning": {
input: 0.2,
output: 0.5,
cached: 0.1,
reasoning: 0.0,
cache_creation: 0.2,
},
"grok-4-1-fast-reasoning": {
input: 0.2,
output: 0.5,
cached: 0.1,
reasoning: 0.75,
cache_creation: 0.2,
},
"grok-4-0709": {
input: 0.2,
output: 1.5,
cached: 0.1,
reasoning: 2.25,
cache_creation: 0.2,
},
},
// ─────────────────────────────────────────────────────────────────────
// Z.AI / ZhipuAI — GLM-5 Family
// Added via ClawRouter 2026-03-17 | maxOutput: 128k tokens!
// ─────────────────────────────────────────────────────────────────────
zai: {
"glm-5": {
input: 1.0,
output: 3.2,
cached: 0.5,
reasoning: 4.8,
cache_creation: 1.0,
},
"glm-5-turbo": {
input: 1.2,
output: 4.0,
cached: 0.6,
reasoning: 6.0,
cache_creation: 1.2,
},
},
kiro: {
"claude-sonnet-4.5": {
input: 3.0,
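The comments in this pricing table quote rates per million tokens (for example "$0.28/$0.42/M"). As a rough sketch of how such an entry could turn into a dollar figure, the function below multiplies each token bucket by its rate and divides by one million. This is not the project's `calculateCost`, whose exact formula is not shown in this diff; `estimateCostUsd` and `TokenUsage` are hypothetical names, and the sketch assumes the token buckets are disjoint (cached and reasoning tokens are not also counted under input or output).

```typescript
interface ModelPricing {
  input: number;
  output: number;
  cached: number;
  reasoning: number;
  cache_creation: number;
}

// Hypothetical usage shape; buckets are assumed not to overlap.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheCreation?: number;
  reasoning?: number;
}

// Rates are USD per one million tokens, as the table comments suggest.
function estimateCostUsd(pricing: ModelPricing, tokens: TokenUsage): number {
  const weighted =
    tokens.input * pricing.input +
    tokens.output * pricing.output +
    (tokens.cacheRead ?? 0) * pricing.cached +
    (tokens.cacheCreation ?? 0) * pricing.cache_creation +
    (tokens.reasoning ?? 0) * pricing.reasoning;
  return weighted / 1_000_000;
}

// Values copied from the "deepseek-chat" entry in the diff.
const deepseekChat: ModelPricing = {
  input: 0.28,
  output: 0.42,
  cached: 0.014,
  reasoning: 0.42,
  cache_creation: 0.28,
};

// One million input tokens plus half a million output tokens: about $0.49.
console.log(estimateCostUsd(deepseekChat, { input: 1_000_000, output: 500_000 }).toFixed(4));
```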
+60
@@ -390,6 +390,66 @@ export const APIKEY_PROVIDERS = {
website: "https://cloud.google.com/vertex-ai",
authHint: "Provide Service Account JSON or OAuth access_token",
},
zai: {
id: "zai",
alias: "zai",
name: "Z.AI (GLM-5)",
icon: "psychology",
color: "#2563EB",
textIcon: "ZA",
website: "https://open.bigmodel.cn",
apiHint: "API key from https://open.bigmodel.cn/usercenter/apikeys",
},
"perplexity-search": {
id: "perplexity-search",
alias: "pplx-search",
name: "Perplexity Search",
icon: "search",
color: "#20808D",
textIcon: "PS",
website: "https://docs.perplexity.ai/guides/search-quickstart",
authHint: "Same API key as Perplexity (pplx-...)",
},
"serper-search": {
id: "serper-search",
alias: "serper-search",
name: "Serper Search",
icon: "search",
color: "#4285F4",
textIcon: "SP",
website: "https://serper.dev",
authHint: "API key from serper.dev dashboard",
},
"brave-search": {
id: "brave-search",
alias: "brave-search",
name: "Brave Search",
icon: "travel_explore",
color: "#FB542B",
textIcon: "BR",
website: "https://brave.com/search/api",
authHint: "Subscription token from Brave Search API dashboard",
},
"exa-search": {
id: "exa-search",
alias: "exa-search",
name: "Exa Search",
icon: "neurology",
color: "#1E40AF",
textIcon: "EX",
website: "https://exa.ai",
authHint: "API key from dashboard.exa.ai",
},
"tavily-search": {
id: "tavily-search",
alias: "tavily-search",
name: "Tavily Search",
icon: "manage_search",
color: "#5B4FDB",
textIcon: "TV",
website: "https://tavily.com",
authHint: "API key from app.tavily.com (format: tvly-...)",
},
};
export const OPENAI_COMPATIBLE_PREFIX = "openai-compatible-";
+141
@@ -52,6 +52,7 @@ const comboStrategySchema = z.enum([
"least-used",
"cost-optimized",
"strict-random",
"auto",
]);
const comboRuntimeConfigSchema = z
@@ -139,6 +140,12 @@ export const updateSettingsSchema = z.object({
.optional(),
wildcardAliases: z.array(z.object({ pattern: z.string(), target: z.string() })).optional(),
stickyRoundRobinLimit: z.number().int().min(0).max(1000).optional(),
// Auto intent classifier settings (multilingual routing)
intentDetectionEnabled: z.boolean().optional(),
intentSimpleMaxWords: z.number().int().min(1).max(500).optional(),
intentExtraCodeKeywords: z.array(z.string().max(100)).optional(),
intentExtraReasoningKeywords: z.array(z.string().max(100)).optional(),
intentExtraSimpleKeywords: z.array(z.string().max(100)).optional(),
// Protocol toggles (default: disabled)
mcpEnabled: z.boolean().optional(),
a2aEnabled: z.boolean().optional(),
@@ -1074,3 +1081,137 @@ export const guideSettingsSaveSchema = z.object({
apiKey: z.string().optional(),
model: z.string().trim().min(1, "Model is required"),
});
// ── Search Schemas ─────────────────────────────────────────────────────
// Unified search request/response schemas. Final contract — all fields optional
// with defaults. New features add implementations, not new fields.
// Multi-query deferred to POST /v1/search/batch (separate PRD).
export const v1SearchSchema = z
.object({
// Core
query: z
.string()
.trim()
.min(1, "Query is required")
.max(500, "Query must be 500 characters or fewer"),
provider: z
.enum(["serper-search", "brave-search", "perplexity-search", "exa-search", "tavily-search"])
.optional(),
max_results: z.coerce.number().int().min(1).max(100).default(5),
search_type: z.enum(["web", "news"]).default("web"),
offset: z.coerce.number().int().min(0).default(0),
// Locale
country: z.string().max(2).toUpperCase().optional(),
language: z.string().min(2).max(5).optional(),
time_range: z.enum(["any", "day", "week", "month", "year"]).optional(),
// Content control
content: z
.object({
snippet: z.boolean().default(true),
full_page: z.boolean().default(false),
format: z.enum(["text", "markdown"]).default("text"),
max_characters: z.coerce.number().int().min(100).max(100000).optional(),
})
.optional(),
// Filters
filters: z
.object({
include_domains: z.array(z.string().max(253)).max(20).optional(),
exclude_domains: z.array(z.string().max(253)).max(20).optional(),
safe_search: z.enum(["off", "moderate", "strict"]).optional(),
})
.optional(),
// Answer synthesis (Phase 2 — returns null until implemented)
synthesis: z
.object({
strategy: z.enum(["none", "auto", "provider", "internal"]).default("none"),
model: z.string().optional(),
max_tokens: z.coerce.number().int().min(1).max(4000).optional(),
})
.optional(),
// Provider-specific passthrough
provider_options: z.record(z.string(), z.unknown()).optional(),
// Strict mode — reject if provider doesn't support a requested filter
strict_filters: z.boolean().default(false),
})
.catchall(z.unknown());
export const searchResultSchema = z.object({
title: z.string(),
url: z.string(),
display_url: z.string().optional(),
snippet: z.string(),
position: z.number().int().positive(),
score: z.number().min(0).max(1).nullable().optional(),
published_at: z.string().nullable().optional(),
favicon_url: z.string().nullable().optional(),
content: z
.object({
format: z.enum(["text", "markdown"]).optional(),
text: z.string().optional(),
length: z.number().int().optional(),
})
.nullable()
.optional(),
metadata: z
.object({
author: z.string().nullable().optional(),
language: z.string().nullable().optional(),
source_type: z
.enum(["article", "blog", "forum", "video", "academic", "news", "other"])
.nullable()
.optional(),
image_url: z.string().nullable().optional(),
})
.nullable()
.optional(),
citation: z.object({
provider: z.string(),
retrieved_at: z.string(),
rank: z.number().int().positive(),
}),
provider_raw: z.record(z.string(), z.unknown()).nullable().optional(),
});
export const v1SearchResponseSchema = z.object({
id: z.string(),
provider: z.string(),
query: z.string(),
results: z.array(searchResultSchema),
cached: z.boolean(),
answer: z
.object({
source: z.enum(["none", "provider", "internal"]).optional(),
text: z.string().nullable().optional(),
model: z.string().nullable().optional(),
})
.nullable()
.optional(),
usage: z.object({
queries_used: z.number().int().min(0),
search_cost_usd: z.number().min(0),
llm_tokens: z.number().int().min(0).optional(),
}),
metrics: z.object({
response_time_ms: z.number().int().min(0),
upstream_latency_ms: z.number().int().min(0).optional(),
gateway_latency_ms: z.number().int().min(0).optional(),
total_results_available: z.number().int().nullable(),
}),
errors: z
.array(
z.object({
provider: z.string(),
code: z.string(),
message: z.string(),
})
)
.optional(),
});
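For readers without zod at hand, the defaults that `v1SearchSchema` applies to the core fields can be approximated by hand. The sketch below is a simplified, dependency-free stand-in, not the schema itself: it covers only `query`, `provider`, `max_results`, `search_type`, `offset`, `country` and `strict_filters`, skips the string-to-number coercion that `z.coerce` performs, and uses the hypothetical name `normalizeSearchRequest`.

```typescript
interface SearchRequestInput {
  query: string;
  provider?: string;
  max_results?: number;
  search_type?: "web" | "news";
  offset?: number;
  country?: string;
  strict_filters?: boolean;
}

interface NormalizedSearchRequest {
  query: string;
  provider?: string;
  max_results: number;
  search_type: "web" | "news";
  offset: number;
  country?: string;
  strict_filters: boolean;
}

// Hand-rolled approximation of the schema's trimming, bounds and defaults.
function normalizeSearchRequest(input: SearchRequestInput): NormalizedSearchRequest {
  const query = input.query.trim();
  if (query.length < 1) throw new Error("Query is required");
  if (query.length > 500) throw new Error("Query must be 500 characters or fewer");

  const maxResults = input.max_results ?? 5;
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 100) {
    throw new Error("max_results must be an integer between 1 and 100");
  }

  const offset = input.offset ?? 0;
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("offset must be a non-negative integer");
  }

  if (input.country !== undefined && input.country.length > 2) {
    throw new Error("country must be at most 2 characters");
  }

  return {
    query,
    provider: input.provider,
    max_results: maxResults,
    search_type: input.search_type ?? "web",
    offset,
    country: input.country?.toUpperCase(),
    strict_filters: input.strict_filters ?? false,
  };
}

console.log(normalizeSearchRequest({ query: "  rust async  ", country: "br" }));
```

Because the real schema ends in `.catchall(z.unknown())`, unknown top-level keys pass through validation instead of being rejected; the sketch above simply ignores them.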
+6
@@ -30,6 +30,12 @@ export const updateSettingsSchema = z.object({
.optional(),
wildcardAliases: z.array(z.object({ pattern: z.string(), target: z.string() })).optional(),
stickyRoundRobinLimit: z.number().int().min(0).max(1000).optional(),
// Auto intent classifier settings (multilingual routing)
intentDetectionEnabled: z.boolean().optional(),
intentSimpleMaxWords: z.number().int().min(1).max(500).optional(),
intentExtraCodeKeywords: z.array(z.string().max(100)).optional(),
intentExtraReasoningKeywords: z.array(z.string().max(100)).optional(),
intentExtraSimpleKeywords: z.array(z.string().max(100)).optional(),
// Protocol toggles (default: disabled)
mcpEnabled: z.boolean().optional(),
mcpTransport: z.enum(["stdio", "sse", "streamable-http"]).optional(),
+53 -1
@@ -46,6 +46,10 @@ import {
applyTaskAwareRouting,
getTaskRoutingConfig,
} from "@omniroute/open-sse/services/taskAwareRouter.ts";
import {
isFallbackDecision,
shouldUseFallback,
} from "@omniroute/open-sse/services/emergencyFallback.ts";
/**
* Handle chat completion request
@@ -270,7 +274,8 @@ async function handleSingleModelChat(
request: any = null,
comboName: string | null = null,
apiKeyInfo: any = null,
telemetry: any = null
telemetry: any = null,
runtimeOptions: { emergencyFallbackTried?: boolean } = {}
) {
// 1. Resolve model → provider/model
const resolved = await resolveModelOrError(modelStr, body);
@@ -372,6 +377,53 @@ async function handleSingleModelChat(
return result.response;
}
// Emergency fallback for budget exhaustion (402 / billing / quota keywords):
// reroute to a free model (default provider/model: nvidia + openai/gpt-oss-120b) exactly once.
if (!runtimeOptions.emergencyFallbackTried) {
const fallbackDecision = shouldUseFallback(
Number(result.status || 0),
String(result.error || ""),
Array.isArray(body?.tools) && body.tools.length > 0
);
if (isFallbackDecision(fallbackDecision)) {
const fallbackModelStr = `${fallbackDecision.provider}/${fallbackDecision.model}`;
const currentModelStr = `${provider}/${model}`;
if (fallbackModelStr !== currentModelStr) {
const fallbackBody = { ...body, model: fallbackModelStr };
// Cap output on emergency fallback to avoid unexpected long responses.
const maxTokens = Math.min(
Number(
fallbackBody.max_tokens ??
fallbackBody.max_completion_tokens ??
fallbackDecision.maxOutputTokens
) || fallbackDecision.maxOutputTokens,
fallbackDecision.maxOutputTokens
);
fallbackBody.max_tokens = maxTokens;
fallbackBody.max_completion_tokens = maxTokens;
log.warn(
"EMERGENCY_FALLBACK",
`${currentModelStr} -> ${fallbackModelStr} | reason=${fallbackDecision.reason}`
);
return handleSingleModelChat(
fallbackBody,
fallbackModelStr,
clientRawRequest,
request,
comboName,
apiKeyInfo,
telemetry,
{ ...runtimeOptions, emergencyFallbackTried: true }
);
}
}
}
// 6. Mark account as quota-exhausted on 429 response
if (result.status === 429) {
markAccountExhaustedFrom429(credentials.connectionId, provider);
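The output cap applied during emergency fallback is easy to misread inline, so here is the same arithmetic pulled into a standalone function. `capFallbackTokens` is a hypothetical name, and the limit of 4096 is an invented value standing in for `fallbackDecision.maxOutputTokens`, whose real default is not visible in this diff.

```typescript
interface TokenLimits {
  max_tokens?: unknown;
  max_completion_tokens?: unknown;
}

// Same expression as the diff: take the client's requested limit if it is a
// usable number, otherwise the fallback limit, then clamp to the fallback limit.
function capFallbackTokens(body: TokenLimits, maxOutputTokens: number): number {
  const requested =
    Number(body.max_tokens ?? body.max_completion_tokens ?? maxOutputTokens) || maxOutputTokens;
  return Math.min(requested, maxOutputTokens);
}

const FALLBACK_LIMIT = 4096; // hypothetical stand-in for fallbackDecision.maxOutputTokens

console.log(capFallbackTokens({ max_tokens: 8000 }, FALLBACK_LIMIT)); // clamped to 4096
console.log(capFallbackTokens({ max_completion_tokens: 512 }, FALLBACK_LIMIT)); // kept at 512
console.log(capFallbackTokens({}, FALLBACK_LIMIT)); // defaults to 4096
```

Two edge cases follow from the `||` fallback: a requested limit of `0` or a non-numeric string is treated as absent, while a negative number passes through `Math.min` unchanged, so upstream validation still matters.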
+277
@@ -0,0 +1,277 @@
import test from "node:test";
import assert from "node:assert/strict";
// ═══════════════════════════════════════════════════════════════
// Search Registry + Cache Unit Tests
// Tests for searchRegistry, searchCache, and response normalization
// ═══════════════════════════════════════════════════════════════
const { SEARCH_PROVIDERS, getSearchProvider, getAllSearchProviders, selectProvider } =
await import("../../open-sse/config/searchRegistry.ts");
const { computeCacheKey, getOrCoalesce, getCacheStats, SEARCH_CACHE_DEFAULT_TTL_MS } =
await import("../../open-sse/services/searchCache.ts");
// ─── Registry Tests ──────────────────────────────────────────
test("SEARCH_PROVIDERS has all 5 providers", () => {
assert.ok(SEARCH_PROVIDERS["serper-search"], "serper should exist");
assert.ok(SEARCH_PROVIDERS["brave-search"], "brave should exist");
assert.ok(SEARCH_PROVIDERS["perplexity-search"], "perplexity-search should exist");
assert.ok(SEARCH_PROVIDERS["exa-search"], "exa should exist");
assert.ok(SEARCH_PROVIDERS["tavily-search"], "tavily should exist");
assert.equal(Object.keys(SEARCH_PROVIDERS).length, 5);
});
test("serper-search config is correct", () => {
const s = SEARCH_PROVIDERS["serper-search"];
assert.equal(s.id, "serper-search");
assert.equal(s.method, "POST");
assert.equal(s.authHeader, "x-api-key");
assert.equal(s.costPerQuery, 0.001);
assert.equal(s.freeMonthlyQuota, 2500);
assert.deepEqual(s.searchTypes, ["web", "news"]);
});
test("brave-search config is correct", () => {
const b = SEARCH_PROVIDERS["brave-search"];
assert.equal(b.id, "brave-search");
assert.equal(b.method, "GET");
assert.equal(b.authHeader, "x-subscription-token");
assert.equal(b.costPerQuery, 0.005);
assert.equal(b.freeMonthlyQuota, 1000);
});
test("perplexity-search config is correct", () => {
const p = SEARCH_PROVIDERS["perplexity-search"];
assert.equal(p.id, "perplexity-search");
assert.equal(p.method, "POST");
assert.equal(p.authHeader, "bearer");
assert.equal(p.baseUrl, "https://api.perplexity.ai/search");
assert.equal(p.costPerQuery, 0.005);
assert.equal(p.freeMonthlyQuota, 0);
assert.deepEqual(p.searchTypes, ["web"]);
});
test("getSearchProvider returns config for valid ID", () => {
const config = getSearchProvider("serper-search");
assert.ok(config);
assert.equal(config.id, "serper-search");
});
test("getSearchProvider returns null for unknown ID", () => {
assert.equal(getSearchProvider("unknown"), null);
});
test("tavily-search config is correct", () => {
const t = SEARCH_PROVIDERS["tavily-search"];
assert.equal(t.id, "tavily-search");
assert.equal(t.method, "POST");
assert.equal(t.authHeader, "bearer");
assert.equal(t.baseUrl, "https://api.tavily.com/search");
assert.equal(t.costPerQuery, 0.008);
assert.equal(t.freeMonthlyQuota, 1000);
assert.deepEqual(t.searchTypes, ["web", "news"]);
});
test("getAllSearchProviders returns flat list", () => {
const all = getAllSearchProviders();
assert.equal(all.length, 5);
assert.ok(all.some((p) => p.id === "serper-search"));
assert.ok(all.some((p) => p.id === "brave-search"));
assert.ok(all.some((p) => p.id === "perplexity-search"));
assert.ok(all.some((p) => p.id === "exa-search"));
assert.ok(all.some((p) => p.id === "tavily-search"));
// Each entry should have id, name, searchTypes
for (const p of all) {
assert.ok(p.id);
assert.ok(p.name);
assert.ok(Array.isArray(p.searchTypes));
}
});
test("selectProvider with explicit provider returns that provider", () => {
const config = selectProvider("brave-search");
assert.ok(config);
assert.equal(config.id, "brave-search");
});
test("selectProvider with unknown provider returns null", () => {
assert.equal(selectProvider("unknown"), null);
});
test("selectProvider without argument returns cheapest (serper)", () => {
const config = selectProvider();
assert.ok(config);
assert.equal(config.id, "serper-search"); // $0.001 per query, below brave/perplexity ($0.005) and tavily ($0.008)
});
// ─── Cache Key Tests ─────────────────────────────────────────
test("computeCacheKey is deterministic", () => {
const k1 = computeCacheKey("hello world", "auto", "web", 5);
const k2 = computeCacheKey("hello world", "auto", "web", 5);
assert.equal(k1, k2);
});
test("computeCacheKey normalizes query (case, whitespace)", () => {
// Note: only case normalization is exercised here; whitespace handling is not asserted.
const k1 = computeCacheKey("Hello World", "auto", "web", 5);
const k2 = computeCacheKey("hello world", "auto", "web", 5);
assert.equal(k1, k2);
});
test("computeCacheKey differs by provider", () => {
const k1 = computeCacheKey("test", "serper", "web", 5);
const k2 = computeCacheKey("test", "brave", "web", 5);
assert.notEqual(k1, k2);
});
test("computeCacheKey differs by search_type", () => {
const k1 = computeCacheKey("test", "auto", "web", 5);
const k2 = computeCacheKey("test", "auto", "news", 5);
assert.notEqual(k1, k2);
});
test("computeCacheKey differs by max_results", () => {
const k1 = computeCacheKey("test", "auto", "web", 5);
const k2 = computeCacheKey("test", "auto", "web", 10);
assert.notEqual(k1, k2);
});
// ─── Cache + Coalescing Tests ────────────────────────────────
test("getOrCoalesce caches and returns on second call", async () => {
let callCount = 0;
const key = "test-cache-hit-" + Date.now();
const r1 = await getOrCoalesce(key, 60_000, async () => {
callCount++;
return { value: 42 };
});
assert.equal(r1.cached, false);
assert.deepEqual(r1.data, { value: 42 });
const r2 = await getOrCoalesce(key, 60_000, async () => {
callCount++;
return { value: 99 };
});
assert.equal(r2.cached, true);
assert.deepEqual(r2.data, { value: 42 }); // original value, not 99
assert.equal(callCount, 1); // fetchFn called only once
});
test("getOrCoalesce coalesces concurrent requests", async () => {
let callCount = 0;
const key = "test-coalesce-" + Date.now();
const fetchFn = async () => {
callCount++;
await new Promise((r) => setTimeout(r, 50)); // keep the fetch pending for 50 ms so the three calls overlap
return { value: "coalesced" };
};
// Launch 3 concurrent requests with the same key
const [r1, r2, r3] = await Promise.all([
getOrCoalesce(key, 60_000, fetchFn),
getOrCoalesce(key, 60_000, fetchFn),
getOrCoalesce(key, 60_000, fetchFn),
]);
assert.equal(callCount, 1); // Only one fetch executed
assert.deepEqual(r1.data, { value: "coalesced" });
assert.deepEqual(r2.data, { value: "coalesced" });
assert.deepEqual(r3.data, { value: "coalesced" });
});
test("getOrCoalesce respects TTL=0 (no caching)", async () => {
let callCount = 0;
const key = "test-no-cache-" + Date.now();
await getOrCoalesce(key, 0, async () => {
callCount++;
return { value: 1 };
});
await getOrCoalesce(key, 0, async () => {
callCount++;
return { value: 2 };
});
assert.equal(callCount, 2); // Both calls executed
});
test("getCacheStats returns valid stats", () => {
const stats = getCacheStats();
assert.equal(typeof stats.size, "number");
assert.equal(typeof stats.hits, "number");
assert.equal(typeof stats.misses, "number");
});
test("SEARCH_CACHE_DEFAULT_TTL_MS is positive", () => {
assert.ok(SEARCH_CACHE_DEFAULT_TTL_MS > 0);
});
// ─── Validation Schema Tests ────────────────────────────────
test("v1SearchSchema validates correct input", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({
query: "test query",
provider: "serper-search",
max_results: 10,
search_type: "web",
});
assert.ok(result.success);
assert.equal(result.data.query, "test query");
assert.equal(result.data.provider, "serper-search");
assert.equal(result.data.max_results, 10);
});
test("v1SearchSchema rejects empty query", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({ query: "" });
assert.ok(!result.success);
});
test("v1SearchSchema rejects query over 500 chars", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({ query: "a".repeat(501) });
assert.ok(!result.success);
});
test("v1SearchSchema rejects invalid provider", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({ query: "test", provider: "google" });
assert.ok(!result.success);
});
test("v1SearchSchema accepts tavily provider", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({ query: "test", provider: "tavily-search" });
assert.ok(result.success);
assert.equal(result.data.provider, "tavily-search");
});
test("v1SearchSchema applies defaults", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({ query: "test" });
assert.ok(result.success);
assert.equal(result.data.max_results, 5);
assert.equal(result.data.search_type, "web");
assert.equal(result.data.provider, undefined);
});
test("v1SearchSchema allows unknown fields (forward compat)", async () => {
const { v1SearchSchema } = await import("../../src/shared/validation/schemas.ts");
const result = v1SearchSchema.safeParse({
query: "test",
future_field: true,
});
assert.ok(result.success);
});
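The cache and coalescing tests above pin down three behaviors: a second call with the same key is served from the cache, concurrent calls with the same key share one fetch, and a TTL of 0 disables caching. The following is a minimal sketch of a function with that contract, not the code in open-sse/services/searchCache.ts. The name sketchGetOrCoalesce, the two Map stores, and the choice to report cached: false for callers that join an in-flight fetch are all assumptions made for illustration; only the { data, cached } result shape is taken from the tests.

```typescript
// Hypothetical sketch inferred from the tests; not the project's implementation.
type CacheResult<T> = { data: T; cached: boolean };

interface CacheEntry {
  data: unknown;
  expiresAt: number; // epoch milliseconds after which the entry is stale
}

const entries = new Map<string, CacheEntry>();
const inFlight = new Map<string, Promise<unknown>>();

function sketchGetOrCoalesce<T>(
  key: string,
  ttlMs: number,
  fetchFn: () => Promise<T>
): Promise<CacheResult<T>> {
  // 1. Serve a fresh cache entry without calling fetchFn.
  const hit = entries.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return Promise.resolve({ data: hit.data as T, cached: true });
  }

  // 2. Join a fetch that is already running for this key.
  //    Assumption: joiners report cached: false (the tests do not assert this).
  const pending = inFlight.get(key);
  if (pending) {
    return pending.then((data) => ({ data: data as T, cached: false }));
  }

  // 3. Start the fetch and register it before returning, so later callers
  //    in the same tick find it. Store the result only when ttlMs > 0.
  const started = fetchFn().then(
    (data) => {
      inFlight.delete(key);
      if (ttlMs > 0) {
        entries.set(key, { data, expiresAt: Date.now() + ttlMs });
      }
      return data;
    },
    (err) => {
      // A failed fetch is not cached; the next caller retries.
      inFlight.delete(key);
      throw err;
    }
  );
  inFlight.set(key, started);
  return started.then((data) => ({ data, cached: false }));
}
```

Registering the promise in inFlight synchronously, before any await, is what lets the three Promise.all callers in the coalescing test share a single fetch. A real cache would likely also need eviction and hit/miss counters like those reported by getCacheStats, which this sketch leaves out.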