Compare commits


97 Commits

Author SHA1 Message Date
diegosouzapw 659e2b414d feat(release): v2.8.2 — model alias routing fix, log export, 2 merged PRs
2026-03-19 11:13:49 -03:00
diegosouzapw 7bcb58e3db feat(logs): add export button with time range dropdown (1h, 6h, 12h, 24h)
- New API: /api/logs/export?hours=24&type=call-logs
- UI: Export button with dropdown on /dashboard/logs page
- Supports export of request-logs, proxy-logs, and call-logs
- Downloads as JSON file with Content-Disposition header
2026-03-19 11:11:07 -03:00
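A minimal sketch of how the export endpoint might validate its query string and build the download header. It assumes only the dropdown's four hour values and the three listed log types are accepted. `parseExportParams` and `exportContentDisposition` are hypothetical names, not OmniRoute's code, and the defaults and filename pattern are invented for illustration.

```typescript
// Allowed values mirror the commit message: dropdown hours and the three log types.
const ALLOWED_HOURS: readonly number[] = [1, 6, 12, 24];
const ALLOWED_TYPES = ["request-logs", "proxy-logs", "call-logs"] as const;

type LogType = (typeof ALLOWED_TYPES)[number];

interface ExportParams {
  hours: number;
  type: LogType;
  since: Date; // start of the export window
}

// Hypothetical: parse ?hours=<n>&type=<log type>; the 24h / call-logs defaults are assumptions.
function parseExportParams(url: string, now: Date = new Date()): ExportParams {
  const params = new URL(url).searchParams;
  const hours = Number(params.get("hours") ?? "24");
  const type = params.get("type") ?? "call-logs";
  if (!ALLOWED_HOURS.includes(hours)) {
    throw new Error(`unsupported hours value: ${hours}`);
  }
  if (!(ALLOWED_TYPES as readonly string[]).includes(type)) {
    throw new Error(`unsupported log type: ${type}`);
  }
  return { hours, type: type as LogType, since: new Date(now.getTime() - hours * 3_600_000) };
}

// Content-Disposition value that makes the browser save the response as a .json file.
function exportContentDisposition(p: ExportParams): string {
  return `attachment; filename="${p.type}-last-${p.hours}h.json"`;
}
```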
diegosouzapw 2d7d7776a6 fix(routing): model aliases now affect routing, not just format detection (#472)
Previously resolveModelAlias() output was used only for getModelTargetFormat()
but the original model was sent in translatedBody.model and to the executor.
Now effectiveModel is propagated to all downstream operations.
2026-03-19 11:07:29 -03:00
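The shape of the #472 fix, sketched with a toy alias table: the alias is resolved once, and the resulting effectiveModel is written into the translated body and handed to the executor instead of being used only for format detection. This `resolveModelAlias` is a simplified stand-in, and `prepareRequest` and the alias entry are hypothetical.

```typescript
interface ChatBody {
  model: string;
  messages: unknown[];
}

// Hypothetical alias table; real aliases are user-configured.
const MODEL_ALIASES = new Map<string, string>([["my-alias", "provider-x/target-model"]]);

// Toy resolver: unknown names pass through unchanged.
function resolveModelAlias(model: string): string {
  return MODEL_ALIASES.get(model) ?? model;
}

// Resolve once, then use effectiveModel for everything downstream:
// format detection, the translated body, and the executor call.
function prepareRequest(body: ChatBody) {
  const effectiveModel = resolveModelAlias(body.model);
  const translatedBody: ChatBody = { ...body, model: effectiveModel };
  return { effectiveModel, translatedBody, executorModel: effectiveModel };
}
```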
Prakersh Maheshwari c5f429521c fix(pricing): add missing Codex 5.3/5.4 and Anthropic model ID entries (#479)
* fix(pricing): add missing Codex 5.3/5.4 and Anthropic model ID entries

Missing pricing entries cause $0.00 cost for:
- GPT 5.3 Codex family (gpt-5.3-codex, -high, -xhigh, -low, -none)
- GPT 5.4 (with hyphen: gpt-5.4)
- GPT 5.1 Codex Mini High
- Common Anthropic model IDs without dates (claude-opus-4-6,
  claude-sonnet-4-6, claude-opus-4, claude-sonnet-4)
- Dated variants used by Claude Code (claude-opus-4-5-20251101,
  claude-sonnet-4-5-20250929)

* refactor: extract shared pricing constants to reduce duplication

Address review feedback: extract duplicated pricing objects into
named constants (GPT_5_3_CODEX_PRICING, CLAUDE_OPUS_4_PRICING, etc.)
and add clarifying comment about intentional hyphen/dot variant entries.
2026-03-19 11:04:30 -03:00
diegosouzapw 426d8636bc fix(stream): extract usage from remaining buffer in flush handler (#480) 2026-03-19 11:02:13 -03:00
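Why the flush handler matters: SSE chunks are split on newlines, so a final `data:` line carrying `usage` that arrives without a trailing newline is still sitting in the buffer when the stream closes. A minimal sketch, written as a plain class rather than a TransformStream for brevity; the class and method names are hypothetical.

```typescript
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
}

class SseUsageExtractor {
  private buffer = "";
  usage: Usage | null = null;

  // Parse every complete line; keep the trailing partial line buffered.
  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    for (const line of lines) this.parseLine(line);
  }

  // End of stream (the TransformStream flush() equivalent): parse what is left.
  flush(): void {
    if (this.buffer.trim()) this.parseLine(this.buffer);
    this.buffer = "";
  }

  private parseLine(line: string): void {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) return;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") return;
    try {
      const event = JSON.parse(payload);
      if (event?.usage) this.usage = event.usage;
    } catch {
      // ignore data lines that are not JSON
    }
  }
}
```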
diegosouzapw a265c7096e feat(release): v2.8.1 — streaming log fix, Kiro compat, cache tokens, Chinese i18n, configurable tool call ID
2026-03-19 08:45:54 -03:00
diegosouzapw 1c9953b1ba chore: remove ZWS_README_V1.md (internal contributor doc) 2026-03-19 08:43:17 -03:00
diegosouzapw 601cc21a44 feat: call log response content, per-model tool call ID, key PATCH & validation (#470) 2026-03-19 08:41:01 -03:00
Ethan Hunt 102c42dfe4 feat: Improve the Chinese translation (#475)
Co-authored-by: gmw <rorschach1167@qq.com>
2026-03-19 08:37:51 -03:00
Prakersh Maheshwari 4953727aa7 fix(callLogs): support Claude format usage and include cache tokens (#476)
saveCallLog only read prompt_tokens/completion_tokens (OpenAI format).
When sourceFormat=claude, the openai-to-claude translator writes
input_tokens/output_tokens instead, causing all cross-format requests
(Codex-via-Claude, Kiro-via-Claude, etc.) to show 0|0 tokens in
call_logs.

Also includes cache_read and cache_creation tokens in tokens_in total
so heavily-cached requests don't show misleadingly low input counts.

Changes:
- Read prompt_tokens || input_tokens (supports both formats)
- Read completion_tokens || output_tokens (supports both formats)
- Sum cache_read_input_tokens + cache_creation_input_tokens into total
2026-03-19 08:37:49 -03:00
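The token arithmetic from #476 (the same input sum is described in #477 below), as a minimal sketch. `normalizeTokens` is a hypothetical name; the field names are the ones quoted in the commit messages.

```typescript
interface RawUsage {
  prompt_tokens?: number; // OpenAI format
  completion_tokens?: number;
  input_tokens?: number; // Claude format
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Read either format, then fold cache reads and cache creations into the input total.
function normalizeTokens(u: RawUsage): { tokensIn: number; tokensOut: number } {
  const input = u.prompt_tokens || u.input_tokens || 0;
  const output = u.completion_tokens || u.output_tokens || 0;
  const cache = (u.cache_read_input_tokens || 0) + (u.cache_creation_input_tokens || 0);
  return { tokensIn: input + cache, tokensOut: output };
}
```

For a heavily cached Claude request with 12 fresh input tokens, 150,000 cache-read tokens and 2,000 cache-creation tokens, tokensIn is 152,012 rather than 12.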
Prakersh Maheshwari e6af874b47 fix(usage): include cache tokens in usage history input total (#477)
logUsage stored only non-cached input tokens in usage_history.tokens_input.
For heavily-cached Claude requests (common with Claude Code), this shows
near-zero input when the real total is 150K+, causing the analytics
dashboard to severely underreport input token usage.

Now sums: input = prompt_tokens + cache_read + cache_creation
2026-03-19 08:37:46 -03:00
Prakersh Maheshwari 801b4eef4c fix(kiro): strip injected model field from request body (#478)
chatCore.ts injects translatedBody.model for all providers after
translation. Kiro API (AWS CodeWhisperer) has strict schema validation
and rejects unknown top-level fields — only conversationState, profileArn,
and inferenceConfig are valid. This causes 100% of Kiro requests to fail
with "Improperly formed request".

Strip the injected model field in KiroExecutor.transformRequest().
2026-03-19 08:37:44 -03:00
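A minimal sketch of the #478 change: drop the injected `model` key and pass the remaining fields through untouched. `transformKiroRequest` is a hypothetical stand-in for KiroExecutor.transformRequest().

```typescript
// chatCore.ts adds `model` after translation; Kiro rejects unknown top-level fields.
function transformKiroRequest<T extends Record<string, unknown>>(body: T): Omit<T, "model"> {
  const { model: _injectedModel, ...rest } = body;
  return rest;
}
```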
diegosouzapw fe5c20a04e feat(release): v2.8.0 — Bailian Coding Plan, editable provider URLs, 812 tests
2026-03-19 02:28:45 -03:00
diegosouzapw 246fd05fae feat(providers): add Bailian Coding Plan provider with editable base URL (#467) 2026-03-19 02:25:29 -03:00
diegosouzapw a09b298127 feat(release): v2.7.10 — Alibaba Cloud Coding, Kimi Coding API-key, Docker pino fix
2026-03-19 01:50:00 -03:00
Jefferson Nunn f89f40778f feat: add API-key Kimi Coding provider path (#463)
* feat: add api-key Kimi Coding provider support

* fix(kimi-coding): honor apikey auth header in executor

Ensure DefaultExecutor sends x-api-key for kimi-coding-apikey at runtime
and deduplicate shared kimi coding config blocks in registry and models
config to reduce drift between oauth and apikey variants.

---------

Co-authored-by: OmniRoute Agent <agent@omniroute.local>
2026-03-19 01:48:26 -03:00
dtk 3d0c8d8d45 feat: add alibaba cloud coding plan provider support (#465)
Co-authored-by: dtk <git@derzsi.cloud>
2026-03-19 01:48:23 -03:00
diegosouzapw 0e5e8bf14e fix(docker): add missing split2 dependency to container image (#459) 2026-03-19 01:46:26 -03:00
diegosouzapw ce34d329d3 chore(release): v2.7.9
2026-03-18 17:19:42 -03:00
diegosouzapw eaf4a5805c fix: resolved UI combo setting schema strip (#458)
fix: safe crypto fallback for MITM on Windows (#456)
2026-03-18 17:18:31 -03:00
Sergey Morozov 8420e565d4 feat: add responses subpath passthrough for codex (#457) 2026-03-18 17:18:29 -03:00
diegosouzapw 1b68deb0f6 feat(release): v2.7.8 — budget save fix + combo agent UI + omniModel tag strip
- fix(budget): warningThreshold sent as fraction 0-1 not percentage 0-100 (#451)
- feat(combos): Agent Features UI in combo modal (system_message, tool_filter_regex,
  context_cache_protection) — previously server-only (#454)
- fix(combos): strip <omniModel> tags before forwarding to provider (#454)
2026-03-18 15:38:04 -03:00
Diego Rodrigues de Sa e Souza d1497c9ac8 Merge pull request #455 from diegosouzapw/fix/issue-451-454-budget-combo-ui
fix: budget warningThreshold + combo agent UI fields + omniModel tag strip
2026-03-18 15:37:17 -03:00
diegosouzapw 03d4cbf6d5 fix: budget warningThreshold fraction mismatch + combo agent UI fields + omniModel tag strip
- fix(budget): BudgetTab sent integer percentage (80) but schema validated
  fraction (0-1). Now divides by 100 on POST and multiplies by 100 on GET (#451)

- fix(combos): expose Agent Features UI in combo create/edit modal — fields for
  system_message override, tool_filter_regex, and context_cache_protection were
  implemented server-side (#399/#401) but missing from the dashboard UI (#454)

- fix(combos): strip <omniModel> tags from messages before forwarding to provider.
  The internal cache-pinning tag was being sent to the provider, causing cache
  misses as providers treated each tagged request as a new session (#454)
2026-03-18 15:32:47 -03:00
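Sketches of the two conversions described above, under stated assumptions: thresholds round-trip between the UI percentage and the stored fraction, and <omniModel> tags (plus the whitespace before them) are removed from string message content. Array-shaped content, which real chat messages can also use, is ignored here, and all function names are hypothetical.

```typescript
// #451: the UI edits a percentage (80) while the schema validates a fraction (0.8).
function toStoredThreshold(percent: number): number {
  return percent / 100; // applied on POST
}

function toDisplayThreshold(fraction: number): number {
  // Applied on GET; rounding hides float noise such as 0.07 * 100 = 7.000000000000001.
  return Math.round(fraction * 100);
}

// #454: strip the internal cache-pinning tag before the request reaches the provider.
interface ChatMessage {
  role: string;
  content: string;
}

const OMNI_MODEL_TAG = /\s*<omniModel>[\s\S]*?<\/omniModel>/g;

function stripOmniModelTags(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) => ({ ...m, content: m.content.replace(OMNI_MODEL_TAG, "") }));
}
```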
diegosouzapw 718be831af feat(release): v2.7.7 — Docker pino crash fix + Codex responses worker fix
- fix(docker): copy pino-abstract-transport + pino-pretty in standalone (#449)
- fix(responses): remove initTranslators() from /v1/responses route (#450)
- chore(deps): commit package-lock.json with each version bump
2026-03-18 15:13:26 -03:00
Diego Rodrigues de Sa e Souza 9d5ec523be Merge pull request #453 from diegosouzapw/fix/issue-449-450-pino-docker-responses-worker
fix: pino Docker crash + Codex /v1/responses worker exit + package-lock sync
2026-03-18 15:11:38 -03:00
diegosouzapw 81c43b45fb fix: pino-abstract-transport missing in Docker + responses worker crash + lock sync
- fix(docker): copy pino-abstract-transport and pino-pretty explicitly in
  runner-base stage — Next.js standalone trace omits them, causing
  'Cannot find module pino-abstract-transport' crash on startup (#449)

- fix(responses): remove initTranslators() call from /v1/responses route —
  bootstrapping translator registry from a Next.js Route Handler worker
  caused 'the worker has exited' uncaughtException on Codex CLI requests.
  Translators are already bootstrapped server-side via open-sse (#450)

- chore: include package-lock.json in commit (was being left behind on
  version bumps, causing npm ci to install inconsistent deps in Docker)
2026-03-18 15:08:57 -03:00
diegosouzapw 146a491769 feat(release): v2.7.5 — login UX + Windows CLI healthcheck
- fix(ux): show default password hint on login page (#437)
- fix(cli): spawn with shell:true on Windows for .cmd CLI resolution (#447)
2026-03-18 14:52:05 -03:00
Diego Rodrigues de Sa e Souza 4c53388579 Merge pull request #448 from diegosouzapw/fix/issue-437-447-435-login-healthcheck-gemini
fix: login default password hint + Windows CLI healthcheck shell resolution
2026-03-18 14:51:19 -03:00
diegosouzapw 3403ddcc6e fix: login password hint + Windows CLI healthcheck + i18n key
- fix(ux): add default password hint on login page for first-time users (#437)
  The fallback password (123456) is now shown as a hint below the
  password input so users don't get locked out during initial setup.

- fix(cli): add shell:true to spawn on Windows so .cmd wrappers are
  resolved correctly via PATHEXT (#447). Claude, opencode, and other
  npm-installed CLIs show as 'not runnable' on Windows even when
  installed because spawn() cannot find .cmd files without shell:true.

- i18n: add defaultPasswordHint key to en.json auth namespace
2026-03-18 14:44:49 -03:00
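A minimal sketch of the #447 fix using Node's child_process.spawn, whose `shell` option is real. `cliSpawnOptions` and `isRunnable` are hypothetical helpers, and treating a zero exit code from `--version` as "runnable" is an assumption about how the healthcheck works.

```typescript
import { spawn, type SpawnOptions } from "node:child_process";

// On Windows, npm-installed CLIs are .cmd shims that spawn() only resolves
// (via PATHEXT) when it runs the command through a shell.
function cliSpawnOptions(platform: NodeJS.Platform = process.platform): SpawnOptions {
  return { shell: platform === "win32", stdio: "ignore" };
}

// Resolves true if `<cmd> --version` exits with code 0, false otherwise.
function isRunnable(cmd: string): Promise<boolean> {
  return new Promise((resolve) => {
    const child = spawn(cmd, ["--version"], cliSpawnOptions());
    child.on("error", () => resolve(false)); // e.g. ENOENT when the binary is missing
    child.on("exit", (code) => resolve(code === 0));
  });
}
```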
diegosouzapw 684b81d835 feat(release): v2.7.4 — search playground, i18n fixes, Copilot limits, Serper validation
- feat(search): search playground + search tools page + local rerank (#443 @Regis-RCR)
- fix(analytics): localize day/date labels with Intl.DateTimeFormat (#444 @hijak)
- fix(copilot): correct account type display, filter unlimited quotas (#445 @hijak)
- fix(providers): stop rejecting valid Serper API keys on non-4xx (#446 @hijak)
2026-03-18 12:11:00 -03:00
Diego Rodrigues de Sa e Souza 4f32da57fd Merge pull request #443 from Regis-RCR/feat/search-playground
feat(search): add search playground, search tools, and local rerank routing
2026-03-18 12:09:51 -03:00
Diego Rodrigues de Sa e Souza 97265e48b3 Merge pull request #444 from hijak/fix/analytics-day-date-translations
fix: localize analytics day and date labels
2026-03-18 12:07:03 -03:00
Diego Rodrigues de Sa e Souza 64797158e2 Merge pull request #445 from hijak/fix/copilot-account-type-limits
fix: correct GitHub Copilot account type and limits
2026-03-18 12:06:59 -03:00
Diego Rodrigues de Sa e Souza 8359293dcd Merge pull request #446 from hijak/fix/serper-api-key-validation
fix: stop rejecting valid Serper API keys
2026-03-18 12:06:36 -03:00
Jack Cowey b2dc53d18b fix(search): return consistent validation result shape
Keep search provider validation responses consistent with other validators so Serper regression tests and CI assertions can rely on unsupported=false.

Made-with: Cursor
2026-03-18 12:55:25 +00:00
Jack Cowey edf8dd2a12 fix(search): accept authenticated serper validation responses
Treat non-auth Serper validation errors as successful authentication so valid API keys are not rejected during provider setup.

Made-with: Cursor
2026-03-18 12:29:14 +00:00
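The validation rule from these two commits, sketched under an assumption: "auth errors" are taken to mean HTTP 401 and 403, so every other status (including 400 and 5xx) counts as an accepted key. The result always carries `unsupported: false`, matching the consistent shape described above. `interpretSerperStatus` and the error text are hypothetical.

```typescript
interface ValidationResult {
  valid: boolean;
  unsupported: boolean;
  error?: string;
}

// Only an auth rejection proves the key is bad; any other response means Serper
// accepted the key, even if the probe query itself failed.
function interpretSerperStatus(status: number): ValidationResult {
  if (status === 401 || status === 403) {
    return { valid: false, unsupported: false, error: "Invalid API key" };
  }
  return { valid: true, unsupported: false };
}
```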
Jack Cowey 5a777bd598 fix(github): correct copilot plan and quota mapping
Normalize GitHub Copilot account tiers from the usage payload and hide misleading unlimited buckets so account type and limits render correctly in the dashboard.

Made-with: Cursor
2026-03-18 12:25:17 +00:00
Jack Cowey bd39e01ee1 fix(analytics): localize most active day and weekly labels
Use the active app locale for analytics weekday and date formatting so the dashboard no longer shows hardcoded Portuguese labels.

Made-with: Cursor
2026-03-18 12:17:56 +00:00
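Locale-aware labels via the standard Intl.DateTimeFormat API, as a minimal sketch. The `locale` argument is assumed to come from the app's i18n layer, and the fixed UTC time zone only makes the output deterministic; a dashboard would more likely use the viewer's zone.

```typescript
// Format labels with the active app locale instead of hardcoded Portuguese strings.
function formatWeekday(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, { weekday: "long", timeZone: "UTC" }).format(date);
}

function formatShortDate(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, { day: "2-digit", month: "short", timeZone: "UTC" }).format(date);
}
```

Passing "pt-BR" instead of "en-US" returns Portuguese labels, so the text follows the selected app language.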
Regis e3ed29aab6 feat(search): add search playground, search tools, and local rerank routing
Search Playground (Phase 1):
- Web Search as 10th endpoint in Playground with isolated SearchPlayground component
- Endpoint selector moved first; Provider/Model/Send hidden when search selected
- Provider dropdown via GET /api/search/providers, formatted results with cache indicator

Search Tools page (Phase 2) at /dashboard/search-tools:
- Split panel: SearchForm (left) with query, provider, filters + ResultsPanel (right)
- Compare Providers: parallel queries with latency, cost, response size, URL overlap
- Rerank Pipeline: model selector from /v1/models, results with position delta
- Search History: last 10 searches from call_logs with replay
- Sidebar entry under Debug section

Backend:
- GET /api/search/providers — list providers with auth guard + SEARCH_CREDENTIAL_FALLBACKS
- GET /api/search/stats — cache stats, provider aggregates, recent searches (auth guard)
- Add local provider_nodes routing for /v1/rerank (oMLX, vLLM support)

Bug fixes (from F-27 PR #432):
- Fix Brave news normalizer: data.results directly, not data.news.results
- Enforce max_results truncation after normalization for all providers
- Fix EndpointPageClient: use /api/search/providers instead of /api/v1/search
- Add isAuthenticated() guards on /api/search/providers and /api/search/stats

Response size metric in results meta bar and compare table.
i18n: 30+ keys in search namespace (en.json)
2026-03-18 12:43:24 +01:00
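The commit does not define "URL overlap" or "position delta". One plausible reading, sketched with hypothetical helpers: overlap as the Jaccard index of two providers' URL sets, and delta as old rank minus new rank after reranking (positive means the result moved up).

```typescript
// |A ∩ B| / |A ∪ B| over the URLs returned by two providers.
function urlOverlap(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const union = new Set([...setA, ...setB]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const url of setA) if (setB.has(url)) shared++;
  return shared / union.size;
}

// Rank change per URL after reranking; URLs absent from `before` are skipped.
function positionDeltas(before: string[], after: string[]): Map<string, number> {
  const deltas = new Map<string, number>();
  after.forEach((url, newIndex) => {
    const oldIndex = before.indexOf(url);
    if (oldIndex !== -1) deltas.set(url, oldIndex - newIndex);
  });
  return deltas;
}
```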
diegosouzapw 896ce9c0e2 feat(release): v2.7.3 — fix Codex direct API weekly quota fallback
- fix(codex): resolveQuotaWindow() prefix-matches 'weekly' → 'weekly (7d)' cache keys
- fix(codex): applyCodexWindowPolicy() enforces useWeekly/use5h toggles in direct API
- 4 new regression tests, 766 total passing
- Closes #440
2026-03-18 08:41:13 -03:00
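A sketch of the two #440 fixes under assumptions: quota windows are keyed by labels such as "weekly (7d)" and "5h", a window counts as exhausted at 100% usage, and keys are sorted before prefix matching so the chosen window is deterministic. `isBlockedByPolicy` is a hypothetical stand-in for applyCodexWindowPolicy(), and this resolveQuotaWindow is a simplified version.

```typescript
interface QuotaWindow {
  usedPercent: number;
}
type QuotaWindows = Record<string, QuotaWindow>;

// Exact key first, then the alphabetically first key that starts with `name`,
// so "weekly" finds "weekly (7d)" the same way on every call.
function resolveQuotaWindow(windows: QuotaWindows, name: string): QuotaWindow | undefined {
  if (windows[name]) return windows[name];
  const key = Object.keys(windows)
    .sort()
    .find((k) => k.startsWith(name));
  return key === undefined ? undefined : windows[key];
}

interface CodexWindowPolicy {
  useWeekly: boolean;
  use5h: boolean;
}

// True when an enabled window is exhausted, so direct API fallback must skip the account.
function isBlockedByPolicy(windows: QuotaWindows, policy: CodexWindowPolicy): boolean {
  const exhausted = (w: QuotaWindow | undefined) => w !== undefined && w.usedPercent >= 100;
  if (policy.useWeekly && exhausted(resolveQuotaWindow(windows, "weekly"))) return true;
  if (policy.use5h && exhausted(resolveQuotaWindow(windows, "5h"))) return true;
  return false;
}
```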
Diego Rodrigues de Sa e Souza 82934132e9 Merge pull request #441 from rexname/fix/issue-440-direct-api-fallback
fix(codex): block weekly-exhausted accounts in direct API fallback
2026-03-18 08:40:19 -03:00
rexname a2012b70de chore(review): harden window normalization and deterministic quota matching 2026-03-18 14:17:37 +07:00
rexname bcfeba8a57 fix(codex): enforce weekly quota blocking for direct API fallback 2026-03-18 13:57:25 +07:00
diegosouzapw d3dfd9ce57 feat(release): v2.7.2 — fix light mode contrast in logs UI
- fix(logs): text colors in filter buttons + combo badge now have dark: variants
- Bumped version to 2.7.2
- Updated CHANGELOG and openapi.yaml
2026-03-18 00:42:22 -03:00
Diego Rodrigues de Sa e Souza aa06d5d356 Merge pull request #433 from diegosouzapw/fix/issue-378-logs-light-mode-contrast
Merged fix for light mode contrast in filter buttons and combo badge. Thanks @rdself for the great bug report!
2026-03-18 00:41:28 -03:00
diegosouzapw 448c8a29e1 fix(logs): fix light mode contrast in filter buttons and combo badge (#378)
- text-red-400 → text-red-700 dark:text-red-400 (error filter, recording button)
- text-emerald-400 → text-emerald-700 dark:text-emerald-400 (ok filter)
- text-violet-300 → text-violet-700 dark:text-violet-300 (combo filter)
- combo row badge: violet-700 → violet-800 dark:violet-300, stronger border

Fixes #378
2026-03-17 16:46:27 -03:00
diegosouzapw 928b7120f4 feat(release): v2.7.1 — unified web search routing + Next.js 16.1.7 security
- POST /v1/search: 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/mo
- Search analytics dashboard tab + GET /api/v1/search/analytics
- db: request_type column on call_logs (migration 007)
- Next.js 16.1.7: 6 CVEs fixed (critical: CVE-2026-29057 HTTP request smuggling)
- docs/openapi.yaml: bumped to 2.7.1
2026-03-17 16:27:31 -03:00
diegosouzapw a3deacd718 feat: Implement historical model latency and success rate tracking for auto-combo routing and update Claude and Deepseek pricing and model registrations. 2026-03-17 16:18:36 -03:00
diegosouzapw 78959fffbd Merge branch 'main' of https://github.com/diegosouzapw/OmniRoute 2026-03-17 16:18:12 -03:00
Diego Rodrigues de Sa e Souza 1788616e52 Merge pull request #431 from diegosouzapw/dependabot/npm_and_yarn/next-16.1.7
Security update merged: Next.js 16.1.7 fixes 6 CVEs including critical CVE-2026-29057 (HTTP request smuggling). No breaking changes.
2026-03-17 16:18:01 -03:00
Diego Rodrigues de Sa e Souza c61e6d0777 Merge pull request #432 from Regis-RCR/feat/search-provider-routing
Merged with dashboard improvements: SearchAnalyticsTab + /api/v1/search/analytics endpoint. PR review completed by Antigravity.
2026-03-17 16:17:39 -03:00
diegosouzapw a3bc7620b1 feat(integration): integrate ClawRouter services into active pipeline
- intentClassifier → engine.ts selectProvider()
  When taskType is 'default', classifies prompt via multilingual keyword
  detection (9 langs) and uses detected intent (code/reasoning/simple/medium)
  for 6-factor task fitness scoring.

- emergencyFallback → chatCore.ts error path (after T5 intra-family fallback)
  On HTTP 402 or budget-exhaustion keywords, attempts one redirect to
  nvidia/gpt-oss-120b ($0.00/M) before returning error to combo router.
  Skipped for streaming requests and tool-calling requests.

- AutoComboConfig.routerStrategy field added
  Allows per-combo strategy override ('rules' | 'cost' | 'latency')

Note: requestDedup was already integrated in chatCore.ts (lines 387-430)
Branch: feat/clawrouter-improvements
2026-03-17 15:22:12 -03:00
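The emergencyFallback decision rule, sketched as a pure function. The keyword list is a placeholder (the commit does not list the actual phrases), and `alreadyRedirected` models the "one redirect" limit; every name except the target model is hypothetical.

```typescript
const BUDGET_KEYWORDS = ["insufficient credits", "budget exceeded", "quota exceeded"]; // placeholder phrases
const EMERGENCY_MODEL = "nvidia/gpt-oss-120b";

interface FailedRequest {
  status: number;
  errorMessage: string;
  stream: boolean;
  hasTools: boolean;
  alreadyRedirected: boolean;
}

// Returns the model to retry with, or null to pass the error back to the combo router.
function emergencyFallbackModel(req: FailedRequest): string | null {
  // Streaming and tool-calling requests are skipped; only one redirect is allowed.
  if (req.stream || req.hasTools || req.alreadyRedirected) return null;
  const msg = req.errorMessage.toLowerCase();
  const budgetExhausted = req.status === 402 || BUDGET_KEYWORDS.some((k) => msg.includes(k));
  return budgetExhausted ? EMERGENCY_MODEL : null;
}
```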
diegosouzapw 8064c588dc docs(i18n): sync v2.7.0 release notes to 29 language READMEs
New in v2.7.0: pluggable RouterStrategy, multilingual intent detection,
request deduplication, new providers (Grok-4 Fast, GLM-5/Z.AI,
MiniMax M2.5, Kimi K2.5). Native translations for de/es/fr/it/ru/zh-CN/ja/ko/ar/pt-BR/pt.
2026-03-17 15:11:09 -03:00
Regis 564e983c68 feat(search): add unified web search routing with 5 providers
Add POST /v1/search — a unified search endpoint routing queries across
5 providers (Serper, Brave, Perplexity Search, Exa, Tavily) with
automatic failover, in-memory caching, and request coalescing.

No other open-source AI gateway we know of offers unified search routing. Chaining the
providers' free tiers covers 5,500+ searches/month, with failover to the next provider on errors.

Providers: Serper ($0.001/q, 2500/mo free), Brave ($0.005/q, 1000/mo),
Perplexity Search ($0.005/q), Exa ($0.007/q, 1000/mo), Tavily
($0.008/q, 1000/mo). Auto-select picks cheapest with credentials.
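The free-tier total and the auto-select order follow directly from the numbers above. A minimal sketch: the provider ids other than perplexity-search and the `hasCredentials` flag are assumptions, and Perplexity Search is recorded with no free tier because none is listed.

```typescript
interface SearchProvider {
  id: string;
  costPerQuery: number; // USD per query
  freePerMonth: number; // free queries per month
  hasCredentials: boolean; // hypothetical: whether a key is configured
}

const PROVIDERS: SearchProvider[] = [
  { id: "serper", costPerQuery: 0.001, freePerMonth: 2500, hasCredentials: true },
  { id: "brave", costPerQuery: 0.005, freePerMonth: 1000, hasCredentials: true },
  { id: "perplexity-search", costPerQuery: 0.005, freePerMonth: 0, hasCredentials: true },
  { id: "exa", costPerQuery: 0.007, freePerMonth: 1000, hasCredentials: true },
  { id: "tavily", costPerQuery: 0.008, freePerMonth: 1000, hasCredentials: true },
];

// 2500 + 1000 + 0 + 1000 + 1000 = 5500 free searches per month.
const totalFreePerMonth = PROVIDERS.reduce((sum, p) => sum + p.freePerMonth, 0);

// Cheapest configured provider first; Array.prototype.sort is stable, so ties keep list order.
function autoSelectOrder(providers: SearchProvider[]): string[] {
  return providers
    .filter((p) => p.hasCredentials)
    .sort((a, b) => a.costPerQuery - b.costPerQuery)
    .map((p) => p.id);
}

// Statuses that move on to the next provider (timeouts would be caught separately).
function isFailoverStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}
```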

Architecture follows existing patterns:
- searchRegistry.ts (same as embeddingRegistry.ts)
- search.ts handler (same as embeddings.ts)
- route.ts (same as /v1/embeddings/route.ts)
- searchCache.ts (bounded TTL cache + request coalescing)

Schema finalized — all future fields defined as optional with safe
defaults. No breaking changes when implementing content extraction,
answer synthesis, or ranking.

Key features:
- Per-provider request builders and response normalizers
- Enriched response: display_url, score, favicon_url, content block,
  metadata, answer block, errors array, upstream_latency_ms metrics
- Cost-sorted auto-select with failover on 429/5xx/timeout
- Credential fallback (perplexity-search reuses perplexity chat key)
- Cache key includes all result-affecting parameters
- max_results clamped to provider limits, sanitized error responses
- Factored validators (validateSearchProvider factory)
- CORS headers on all responses
- Dashboard: Search & Discovery section, search provider template
- DB migration 007: request_type column in call_logs
- 28 unit tests (registry, cache, coalescing, validation)
2026-03-17 18:28:35 +01:00
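A bounded TTL cache with request coalescing, in the spirit of searchCache.ts but not its actual code: concurrent identical lookups share one in-flight promise, results expire after a TTL, and the oldest entry is evicted when the cache is full. The size limit, TTL and cache-key fields are assumptions.

```typescript
class CoalescingTtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries = 500, ttlMs = 5 * 60_000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return Promise.resolve(hit.value);
    // Coalescing: an identical concurrent request waits on the first one's promise.
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const promise = fetcher()
      .then((value) => {
        this.store(key, value);
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, promise);
    return promise;
  }

  private store(key: string, value: T): void {
    this.entries.delete(key); // re-insert so this key becomes the newest
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Key covering result-affecting parameters (field list is an assumption).
function searchCacheKey(p: { query: string; provider?: string; max_results?: number }): string {
  return JSON.stringify([p.query, p.provider ?? "auto", p.max_results ?? null]);
}
```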
diegosouzapw e1da181740 fix(publish): also remove app/electron/ (contains app.asar binary) to prevent Z_DATA_ERROR 2026-03-17 14:25:48 -03:00
diegosouzapw c63209200e fix(publish): remove app/vscode-extension/ after build to prevent Z_DATA_ERROR in npm pack 2026-03-17 14:13:15 -03:00
diegosouzapw 737808cf53 fix(npm): exclude app/vscode-extension/ from package to prevent Z_DATA_ERROR during publish 2026-03-17 13:50:06 -03:00
diegosouzapw a197bb7736 fix(routerStrategy): use .ts extension in imports for Next.js App Router bundle compatibility 2026-03-17 13:15:47 -03:00
dependabot[bot] f9dd967bc5 deps: bump next from 16.1.6 to 16.1.7
Bumps [next](https://github.com/vercel/next.js) from 16.1.6 to 16.1.7.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v16.1.6...v16.1.7)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.1.7
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-17 16:14:44 +00:00
diegosouzapw 44e4d55a66 feat(release): merge feat/clawrouter-improvements — v2.7.0
2026-03-17 13:12:41 -03:00
diegosouzapw 095c84ac16 fix(providerRegistry): remove duplicate claude-haiku-4-5-20251001 from anthropic provider to prevent ambiguous model resolution 2026-03-17 13:10:23 -03:00
diegosouzapw e063eae727 feat(clawrouter): implement 14 ClawRouter-inspired features
PRICING UPDATES (01-09):
- xAI Grok-4 family: grok-4-fast-non-reasoning ($0.20/$0.50/M, 1143ms),
  grok-4-fast-reasoning, grok-4-1-fast-*, grok-4-0709, grok-3, grok-3-mini
- Z.AI GLM-5 family: glm-5 + glm-5-turbo (128k maxOutput, $1.00/$3.20/M)
- Gemini Flash Lite: price corrected $0.15→$0.10 / $1.25→$0.40 (per ClawRouter)
- Gemini 3.1 Pro: new flagship (1.05M context, aliased as gemini-3.1-pro)
- Anthropic Claude 4.5/4.6: haiku-4.5 ($1/$5), sonnet-4.6 ($3/$15), opus-4.6 ($5/$25)
- DeepSeek native section: deepseek-chat/v3/v3.2 ($0.28/$0.42), deepseek-reasoner ($0.55/$2.19)
- Kimi K2.5 Moonshot: kimi-k2.5 ($0.60/$3.00, 262k ctx), moonshot-kimi-k2.5 alias
- MiniMax M2.5: minimax-m2.5 ($0.30/$1.20, 204k ctx, reasoning+tools)
- NVIDIA free tier: gpt-oss-120b at $0.00/M via emergencyFallback.ts

INFRASTRUCTURE FEATURES (10-14):
- feat(router): add intentClassifier.ts for multilingual intent detection (9 langs)
  Detects code/reasoning/simple in EN, PT-BR, ES, ZH, JA, RU, DE, KO, AR
- feat(dedup): add requestDedup.ts for concurrent request deduplication
  SHA-256 hash, skip streaming, skip high-temperature, 60s failsafe TTL
- feat(autoCombo): add routerStrategy.ts pluggable strategy system
  RouterStrategy interface, RulesStrategy (6-factor) + CostStrategy, registry
- feat(fallback): add emergencyFallback.ts budget-exhaustion detector
  Triggers on HTTP 402 or budget keywords, redirects to nvidia/gpt-oss-120b
- feat(taskFitness): add fitness scores for Grok-4, Kimi K2.5, GLM-5,
  MiniMax M2.5, DeepSeek V3.2, Gemini 3.1 Pro across all task categories

PROVIDERS:
- providers.ts: add Z.AI (zai) provider entry for GLM-5 API key connections

All features on branch: feat/clawrouter-improvements
Source: github.com/BlockRunAI/ClawRouter analysis (2026-03-17)
2026-03-17 10:43:12 -03:00
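Concurrent request deduplication as listed above (SHA-256 key, streaming and high-temperature requests bypass it, 60 s failsafe TTL), as a minimal sketch. The 0.5 temperature cutoff, treating a missing temperature as eligible, and hashing the JSON-serialized body are assumptions; requestDedup.ts may differ.

```typescript
import { createHash } from "node:crypto";

interface ChatRequest {
  model: string;
  messages: unknown[];
  stream?: boolean;
  temperature?: number;
}

const FAILSAFE_TTL_MS = 60_000;
const MAX_DEDUP_TEMPERATURE = 0.5; // assumption: above this, identical prompts should not share one answer
const inFlight = new Map<string, { promise: Promise<unknown>; startedAt: number }>();

// SHA-256 of the serialized request (key order matters in this sketch).
function dedupKey(req: ChatRequest): string {
  return createHash("sha256").update(JSON.stringify(req)).digest("hex");
}

function dedupe<T>(req: ChatRequest, execute: () => Promise<T>): Promise<T> {
  // Streaming and high-temperature requests always run on their own.
  if (req.stream || (req.temperature ?? 0) > MAX_DEDUP_TEMPERATURE) return execute();
  const key = dedupKey(req);
  const existing = inFlight.get(key);
  // The failsafe TTL keeps a hung request from capturing later duplicates forever.
  if (existing && Date.now() - existing.startedAt < FAILSAFE_TTL_MS) {
    return existing.promise as Promise<T>;
  }
  const promise: Promise<T> = execute().finally(() => {
    if (inFlight.get(key)?.promise === promise) inFlight.delete(key);
  });
  inFlight.set(key, { promise, startedAt: Date.now() });
  return promise;
}
```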
diegosouzapw f02c5b5c69 fix(install/v2.6.10): Windows better-sqlite3 prebuilt download (#426)
npm version patch was run BEFORE staging files; this is an ATOMIC commit.

Adds Strategy 1.5 to scripts/postinstall.mjs:
- Uses @mapbox/node-pre-gyp install --fallback-to-build=false
  (bundled within better-sqlite3) to download the correct prebuilt
  binary for the current OS/arch (win32-x64/arm64, darwin-x64/arm64)
  WITHOUT requiring node-gyp, Python, or MSVC build tools.
- Tries node-pre-gyp.cmd (Windows) or node-pre-gyp (Unix) from .bin/
  with fallback to direct path in @mapbox/node-pre-gyp/bin/
- Falls back to npm rebuild only if prebuilt download fails.
- Windows-specific error: shows Option A (npx node-pre-gyp) and
  Option B (rebuild) with Visual Studio Build Tools links.

Fixes: #426 (better_sqlite3.node is not a valid Win32 application)
2026-03-17 10:09:45 -03:00
diegosouzapw 838f1d645c fix(v2.6.9): CI budget checks, #409 file attachments, atomic release workflow
Includes version bump — v2.6.9 — committed ATOMICALLY with all changes:

fixes:
- fix(ci/t11): Remove 'any' from comments in openai-responses.ts + chatCore.ts
  (\bany\b regex counted comment text as explicit any violations)
- fix(chatCore/#409): Normalize unsupported content part types before forwarding
  Cursor sends {type:'file'} for .md attachments; Copilot/OpenAI providers reject
  with 'type has to be either image_url or text'. Now: file/document→text block,
  unknown types dropped with debug log. Fixes claude-* models via github-copilot.

workflow:
- chore(generate-release): ATOMIC COMMIT RULE — npm version patch MUST run before
  feature commits so the release tag always points to a commit with full changes
2026-03-17 09:09:01 -03:00
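A sketch of the #409 normalization. The accepted types come from the quoted provider error; how a file part carries its text (a `text` field, or `file.filename` for a placeholder) is an assumption about Cursor's payload, and `normalizeContentParts` is a hypothetical name.

```typescript
interface ContentPart {
  type: string;
  [key: string]: unknown;
}

// Providers behind github-copilot accept only "text" and "image_url" parts.
function normalizeContentParts(parts: ContentPart[]): ContentPart[] {
  const out: ContentPart[] = [];
  for (const part of parts) {
    if (part.type === "text" || part.type === "image_url") {
      out.push(part);
    } else if (part.type === "file" || part.type === "document") {
      // file/document -> text block; fall back to a placeholder naming the file.
      const filename = (part.file as { filename?: string } | undefined)?.filename;
      const text = typeof part.text === "string" ? part.text : `[attached file: ${filename ?? "unnamed"}]`;
      out.push({ type: "text", text });
    } else {
      console.debug(`dropping unsupported content part type: ${part.type}`);
    }
  }
  return out;
}
```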
diegosouzapw ce2c30c437 chore(release): v2.6.8 — combo agents, auto-update, detailed logs, MITM Kiro
2026-03-17 08:58:03 -03:00
diegosouzapw d56fae0a7b feat: combo agents, auto-update UI, detailed logs, MITM Kiro (#399 #401 #320 #378 #336)
DB Migrations (non-breaking: ADD COLUMN DEFAULT NULL + new table):
- 005_combo_agent_fields.sql: system_message, tool_filter_regex, context_cache_protection on combos
- 006_detailed_request_logs.sql: ring-buffer table (500 entries) for full pipeline body capture

Features:
- #399 System Message Override + Tool Filter Regex per Combo
  - applyComboAgentMiddleware() injected into handleComboChat/handleRoundRobinCombo
  - Supports both OpenAI and Anthropic tool name formats
- #401 Context Caching Protection (Stateless)
  - injectModelTag() appends <omniModel>provider/model</omniModel> to responses
  - extractPinnedModel() reads tag from history and pins model for session
- #320 Auto-Update via Settings
  - GET /api/system/version — current vs latest npm
  - POST /api/system/update — fire-and-forget npm install + pm2 restart
- #378 Detailed Request Logs
  - saveRequestDetailLog() captures bodies at 4 pipeline stages (opt-in toggle)
  - GET/POST /api/logs/detail — list logs + enable/disable toggle
- #336 MITM Kiro IDE
  - src/mitm/targets/kiro.ts: MitmTarget profile for api.anthropic.com interception
2026-03-17 08:53:41 -03:00
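The #399 tool filter has to read tool names from two shapes. OpenAI tools nest the name under `function.name`, while Anthropic tools put it at the top level. The sketch below assumes that an empty pattern or an invalid regex leaves the tool list unchanged, since the commit does not say how those cases are handled. `filterTools` and `toolName` are hypothetical names, not `applyComboAgentMiddleware()` itself.

```typescript
// OpenAI tools:    { type: "function", function: { name, parameters } }
// Anthropic tools: { name, description, input_schema }
interface Tool {
  type?: string;
  name?: string;
  function?: { name?: string };
  [key: string]: unknown;
}

function toolName(tool: Tool): string | undefined {
  return tool.function?.name ?? tool.name;
}

// Keep only tools whose name matches the combo's tool_filter_regex.
// Assumption: no pattern, or a pattern that fails to compile, keeps every tool.
function filterTools(tools: Tool[], pattern: string | null | undefined): Tool[] {
  if (!pattern) return tools;
  let re: RegExp;
  try {
    re = new RegExp(pattern); // no "g" flag, so re.test() keeps no state between calls
  } catch {
    return tools;
  }
  return tools.filter((tool) => {
    const n = toolName(tool);
    return n !== undefined && re.test(n);
  });
}
```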
diegosouzapw e45ef00bef chore(release): v2.6.7 — SSE fixes, local provider_nodes, proxy registry
PRs merged: #414 (deps) #415 #417 #419 #420 #421 (SSE fixes)
            #418 (Claude passthrough) #422 #416 #423 (local nodes)
            #427 (strip empty blocks) #428 (OAuth refreshable)
            #429 (proxy registry)
Contributors: @prakersh, @Regis-RCR, @dependabot
2026-03-17 08:17:11 -03:00
diegosouzapw e9f31f7394 Merge pull request #429 from contributor branch 2026-03-17 08:14:05 -03:00
diegosouzapw 7c10a98eb2 Merge pull request #428 from contributor branch 2026-03-17 08:14:04 -03:00
diegosouzapw f260483101 Merge pull request #427 from contributor branch 2026-03-17 08:14:03 -03:00
diegosouzapw 389e6e5c9e Merge pull request #423 from contributor branch 2026-03-17 08:14:02 -03:00
diegosouzapw 1cfd5866be Merge pull request #422 from contributor branch 2026-03-17 08:14:02 -03:00
diegosouzapw c7ceac7f41 Merge pull request #421 from contributor branch 2026-03-17 08:14:01 -03:00
diegosouzapw cd6eca0424 Merge pull request #420 from contributor branch 2026-03-17 08:14:00 -03:00
diegosouzapw 8c6136fea0 fix(sse): generate fallback call_id for tool calls with missing IDs (#419)
Co-authored-by: Prakersh Maheshwari <prakersh@users.noreply.github.com>
2026-03-17 08:11:53 -03:00
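#419 only says that a fallback `call_id` is generated when a tool call's ID is missing. The sketch below assumes the fallback is a random `call_`-prefixed ID. `ensureToolCallIds` is a hypothetical name, and the prefix and format used in open-sse may differ.

```typescript
import { randomUUID } from "crypto";

interface ToolCall {
  id?: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Fill in an id when the upstream provider sent none (or an empty string).
// The "call_" + 24 hex chars format is an assumption, loosely modeled on OpenAI ids.
function ensureToolCallIds(calls: ToolCall[]): ToolCall[] {
  return calls.map((call) =>
    call.id && call.id.trim() !== ""
      ? call
      : { ...call, id: `call_${randomUUID().replace(/-/g, "").slice(0, 24)}` },
  );
}
```

A streaming translator would also need to reuse a generated ID for every later chunk of the same tool call, so that the client can stitch the deltas together.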
Diego Rodrigues de Sa e Souza 9644444028 Merge pull request #418 from prakersh/fix/claude-to-claude-passthrough
fix(sse): add Claude-to-Claude passthrough for anthropic-compatible providers
2026-03-17 08:09:44 -03:00
Diego Rodrigues de Sa e Souza 9c4154291d Merge pull request #417 from prakersh/fix/orphaned-tool-result-filter
fix(sse): filter orphaned tool results after context compaction
2026-03-17 08:09:41 -03:00
Diego Rodrigues de Sa e Souza 533f5f6da6 Merge pull request #416 from Regis-RCR/feat/audio-provider-nodes
feat(audio): route audio requests to local provider_nodes
2026-03-17 08:09:38 -03:00
Diego Rodrigues de Sa e Souza 1b8de756cd Merge pull request #415 from prakersh/fix/empty-tool-name-loop
fix(sse): skip empty-name tool calls in Responses API translator
2026-03-17 08:09:28 -03:00
Diego Rodrigues de Sa e Souza 650b415537 Merge pull request #414 from diegosouzapw/dependabot/npm_and_yarn/development-cc00f57801
deps: bump the development group with 4 updates
2026-03-17 08:09:25 -03:00
rexname 04b50329fc fix(proxy): address PR review findings for auth, credentials, and health stats 2026-03-17 16:58:44 +07:00
Regis 25aab8c55c feat(audio): route audio requests to local provider_nodes
Audio endpoints (/v1/audio/speech and /v1/audio/transcriptions) only
supported hardcoded providers from audioRegistry.ts. Local inference
backends configured as provider_nodes (e.g., MLX-Audio, oMLX) could
not serve audio through OmniRoute.

This adds a Phase 3 fallback in the audio model parser that consults
provider_nodes from the database. Local providers with api_type=openai
are automatically available for audio routing via their prefix
(e.g., mlx-audio/tts-model, omlx/whisper-large-v3-turbo).

Design: injection pattern — Next.js route handlers load provider_nodes
(async DB query) and pass them to the sync parser as a parameter.
No cross-workspace imports, no breaking changes to existing parsers.

Changes:
- Add buildDynamicAudioProvider() in audioRegistry.ts
- Add Phase 3 (provider_nodes prefix match) to parseAudioModel()
- Extend parseSpeechModel/parseTranscriptionModel with optional
  dynamicProviders parameter (backward compatible)
- Load and inject provider_nodes in speech/transcription route handlers
- Dynamic providers use authType=none (local, no credentials needed)
2026-03-17 09:24:18 +01:00
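The Phase 3 lookup described above comes down to splitting the model ID at the first `/` and matching the prefix against the injected `provider_nodes`. The sketch below assumes a simplified row shape (`prefix`, `baseUrl`, `apiType`). `matchProviderNode` is hypothetical and stands in for the new branch of `parseAudioModel()`, whose real signature is not shown here.

```typescript
// Simplified provider_nodes row (assumed shape).
interface ProviderNode {
  prefix: string;  // e.g. "omlx"
  baseUrl: string; // e.g. "http://localhost:8000/v1"
  apiType: string; // only "openai" nodes are eligible for audio
}

interface ResolvedAudioModel {
  provider: string;
  model: string;
  baseUrl: string;
  authType: "none"; // local backends need no credentials
}

// Phase 3: only reached when the hardcoded audioRegistry phases found no match.
// Nodes are passed in by the route handler, so this stays synchronous.
function matchProviderNode(modelId: string, nodes: ProviderNode[] = []): ResolvedAudioModel | null {
  const slash = modelId.indexOf("/");
  if (slash <= 0) return null; // no "prefix/" part to match on
  const prefix = modelId.slice(0, slash);
  const model = modelId.slice(slash + 1);
  const node = nodes.find((n) => n.prefix === prefix && n.apiType === "openai");
  if (!node || !model) return null;
  return { provider: node.prefix, model, baseUrl: node.baseUrl, authType: "none" };
}
```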
Oleg Saprykin ceda2e70c1 fix(api): add refreshable: true to claude OAuth test config
Claude OAuth tokens are short-lived and require refresh. The runtime
HealthCheck (open-sse) already refreshes them successfully, but the
Dashboard test endpoint was missing `refreshable: true` in its config.

This caused the Dashboard to show "auth failed / Token expired" for
Claude providers even though the tokens were being refreshed correctly
at runtime. The codex provider already had this flag set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 10:47:35 +03:00
Oleg Saprykin 2908303d4b fix(sse): strip empty text content blocks before translation
Anthropic API rejects requests containing {"type":"text","text":""} with
400 "text content blocks must be non-empty". Some clients like LiteLLM
passthrough and @ai-sdk/anthropic may forward empty text blocks as-is.

Filter out empty text content blocks from messages before calling
translateRequest, similar to how empty-name tools are already stripped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 10:46:24 +03:00
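Below is a sketch of the empty-text filter. It assumes Anthropic-style messages whose `content` is either a string or an array of blocks. `stripEmptyTextBlocks` is a hypothetical name. Matching the commit message, it removes only blocks whose `text` is exactly `""`. It does not handle a message that ends up with no blocks at all.

```typescript
interface Block {
  type: string;
  text?: string;
  [key: string]: unknown;
}

interface AnthropicMessage {
  role: string;
  content: string | Block[];
}

// Remove {"type":"text","text":""} blocks, which the Anthropic API rejects with a 400.
function stripEmptyTextBlocks(messages: AnthropicMessage[]): AnthropicMessage[] {
  return messages.map((msg) =>
    typeof msg.content === "string"
      ? msg
      : { ...msg, content: msg.content.filter((b) => !(b.type === "text" && b.text === "")) },
  );
}
```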
rexname 8091b6b508 feat: implement proxy registry, management APIs, docs, and test hardening 2026-03-17 13:05:27 +07:00
Regis 0aede2ef63 feat(health): background health check for local provider_nodes
Local inference backends (oMLX, Ollama, LM Studio) configured as
provider_nodes have no health monitoring. When a local provider is
down, OmniRoute waits the full timeout before failing.

This adds a background health check that polls local provider_nodes:
- GET /models with 5s timeout for each local node (localhost only)
- In-memory health cache (no DB migration needed)
- Promise.allSettled for parallel checks (one slow node doesn't block)
- Exponential backoff on failures: 30s → 60s → 120s → 300s max
- Reset to 30s on first success after failure
- State transition logging (healthy ↔ unhealthy)
- Expose health status via GET /api/monitoring/health (localProviders)
- Auto-init on first import (same pattern as tokenHealthCheck)
- 401 treated as healthy (server up, auth required)
- isNodeHealthy() returns true if never checked (optimistic default)
2026-03-16 22:44:43 +01:00
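The sketch below covers the pure parts of this health check: status classification, the optimistic default, and the backoff schedule. All names are hypothetical. The backoff steps are taken literally from the commit message. Timer scheduling and the `localProviders` monitoring output are left out. `probe` assumes a runtime with global `fetch` and `AbortSignal.timeout` (Node 18+).

```typescript
// Backoff schedule from the commit message: 30s -> 60s -> 120s -> 300s max.
const BACKOFF_STEPS_MS = [30_000, 60_000, 120_000, 300_000];

interface NodeHealth {
  healthy: boolean;
  failures: number;
  nextCheckMs: number;
}

// In-memory only; nothing is persisted to the database.
const healthCache = new Map<string, NodeHealth>();

// 401 means the server is up but wants credentials, so it counts as healthy.
function isHealthyStatus(status: number): boolean {
  return (status >= 200 && status < 300) || status === 401;
}

// Optimistic default: a node that has never been checked is treated as healthy.
function isNodeHealthy(id: string): boolean {
  return healthCache.get(id)?.healthy ?? true;
}

function recordResult(id: string, ok: boolean): NodeHealth {
  const prev = healthCache.get(id);
  const failures = ok ? 0 : (prev?.failures ?? 0) + 1; // success resets to the 30s step
  const step = Math.min(failures, BACKOFF_STEPS_MS.length - 1);
  const state: NodeHealth = { healthy: ok, failures, nextCheckMs: BACKOFF_STEPS_MS[step] };
  const wasHealthy = prev?.healthy ?? true;
  if (wasHealthy !== ok) {
    console.log(`[health] ${id}: ${wasHealthy ? "healthy" : "unhealthy"} -> ${ok ? "healthy" : "unhealthy"}`);
  }
  healthCache.set(id, state);
  return state;
}

// GET {baseUrl}/models with a 5s timeout; network errors and timeouts count as unhealthy.
async function probe(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/models`, { signal: AbortSignal.timeout(5_000) });
    return isHealthyStatus(res.status);
  } catch {
    return false;
  }
}

// Check all nodes in parallel so one slow node does not delay the others.
async function checkAll(nodes: { id: string; baseUrl: string }[]): Promise<void> {
  const results = await Promise.allSettled(nodes.map((n) => probe(n.baseUrl)));
  results.forEach((r, i) => recordResult(nodes[i].id, r.status === "fulfilled" && r.value));
}
```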
Regis 1e3a2e0a27 feat(embeddings): route embedding requests to local provider_nodes
Embedding endpoint (/v1/embeddings) only supports 6 hardcoded cloud
providers. Local inference backends (oMLX, Ollama) serving embeddings
via provider_nodes are inaccessible through OmniRoute.

This adds dynamic provider_node support for embeddings:
- Add EmbeddingProvider interface and buildDynamicEmbeddingProvider()
- Add Phase 2 (provider_nodes prefix match) in parseEmbeddingModel()
- Handler accepts resolvedProvider/resolvedModel from route (injection pattern)
- Handler supports authType=none for local providers (was missing — critical gap)
- Route loads local provider_nodes (localhost only — prevents auth bypass/SSRF)
- Route filters by apiType=chat|responses and localhost hostname
- buildDynamicEmbeddingProvider validates inputs (prefix + baseUrl required)
- Per-node try/catch in map — one bad row doesn't block all providers
- DB errors logged and fall back to hardcoded providers
2026-03-16 22:15:49 +01:00
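Below is a sketch of the route-side filtering for embeddings. Rows without `prefix`/`baseUrl` are rejected, only `apiType` `chat` or `responses` is kept, and only loopback hostnames pass. The row shape, the `buildLocalEmbeddingProviders` and `isLocalUrl` names, and the exact loopback set are assumptions. The real code may accept other local addresses.

```typescript
interface ProviderNodeRow {
  prefix?: string;
  baseUrl?: string;
  apiType?: string;
}

interface EmbeddingProvider {
  id: string;
  baseUrl: string;
  authType: "none"; // local nodes need no credentials
}

// WHATWG URL keeps the brackets on IPv6 hostnames, hence "[::1]".
const LOCAL_HOSTNAMES = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLocalUrl(raw: string): boolean {
  try {
    const { protocol, hostname } = new URL(raw);
    return (protocol === "http:" || protocol === "https:") && LOCAL_HOSTNAMES.has(hostname);
  } catch {
    return false; // unparseable URL
  }
}

function buildLocalEmbeddingProviders(rows: ProviderNodeRow[]): EmbeddingProvider[] {
  const providers: EmbeddingProvider[] = [];
  for (const row of rows) {
    // Per-row try/catch: one bad row is logged and skipped, the rest still load.
    try {
      if (!row.prefix || !row.baseUrl) throw new Error("prefix and baseUrl are required");
      if (row.apiType !== "chat" && row.apiType !== "responses") continue;
      if (!isLocalUrl(row.baseUrl)) continue;
      providers.push({ id: row.prefix, baseUrl: row.baseUrl.replace(/\/+$/, ""), authType: "none" });
    } catch (err) {
      console.warn(`[embeddings] skipping provider_node row: ${(err as Error).message}`);
    }
  }
  return providers;
}
```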
Prakersh Maheshwari 1bdabf43db fix: prevent mutation of original request body in Claude passthrough
Use shallow copy ({ ...body }) instead of direct reference assignment
so that later translatedBody.model = model does not mutate the
caller's original body object.
2026-03-17 02:45:21 +05:30
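The difference between the old reference assignment and the new shallow copy fits in a few lines. `prepareForProvider` and `RequestBody` are hypothetical names. Note that `{ ...body }` copies only the top level, so nested objects such as `messages` are still shared with the caller.

```typescript
interface RequestBody {
  model: string;
  [key: string]: unknown;
}

function prepareForProvider(body: RequestBody, model: string): RequestBody {
  // Before the fix, `const translatedBody = body;` made the next line overwrite the caller's model.
  const translatedBody = { ...body }; // shallow copy of top-level fields only
  translatedBody.model = model;
  return translatedBody;
}
```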
Prakersh Maheshwari 05e568feb0 fix(sse): extract Claude SSE usage in passthrough stream mode 2026-03-17 02:41:54 +05:30
Prakersh Maheshwari 81e2519436 refactor: replace as any casts with explicit inline types
Addresses PR review: use `{ id?: string }[]` and
`{ type?: string; call_id?: string }` instead of `any`.
2026-03-17 02:40:36 +05:30
Prakersh Maheshwari ef623c9bb5 refactor: trim function name consistently in Responses-to-Chat direction
Addresses PR review: both translation directions now trim the function
name the same way, matching the Chat-to-Responses pattern.
2026-03-17 02:35:42 +05:30
Prakersh Maheshwari da581525a6 fix(sse): strip Claude-specific fields in OpenAI format cleanup 2026-03-17 02:16:26 +05:30
Prakersh Maheshwari 6ff7b6570c fix(sse): add Claude-to-Claude passthrough for anthropic-compatible providers
When both source and target formats are Claude, skip all request
modification and forward the body untouched. This prevents
prepareClaudeRequest from corrupting valid Claude-native requests
destined for anthropic-compatible provider nodes.
2026-03-17 02:03:45 +05:30
Prakersh Maheshwari 8b2081837e fix(sse): filter orphaned tool results after context compaction
When Claude Code compacts conversation context to fit within token
limits, it may remove assistant messages containing tool_use/tool_calls
while leaving the corresponding tool_result/function_call_output
messages intact. This creates orphaned tool results that cause
providers to reject requests with errors like "tool result's tool id
not found" or "No tool call found for function call output".
2026-03-17 01:59:40 +05:30
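Below is a sketch of the orphan filter for the OpenAI Chat Completions shape only. The commit also covers Responses API `function_call_output` items and Claude `tool_result` blocks, which apply the same idea with different field names. The message type and `dropOrphanedToolResults` are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content?: string | null;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
}

// Keep a tool message only if some assistant message still declares its tool_call_id.
function dropOrphanedToolResults(messages: ChatMessage[]): ChatMessage[] {
  const declared = new Set<string>();
  for (const m of messages) {
    if (m.role === "assistant") {
      for (const call of m.tool_calls ?? []) declared.add(call.id);
    }
  }
  return messages.filter(
    (m) => m.role !== "tool" || (m.tool_call_id !== undefined && declared.has(m.tool_call_id)),
  );
}
```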
Prakersh Maheshwari ce978b602a fix(sse): skip empty-name tool calls in Responses API translator
Prevents infinite retry loops when models generate tool calls with
empty function names. The normalizeToolName function converted these
to "placeholder_tool" which does not exist in any client's tool
registry, causing repeated error-retry cycles.
2026-03-17 01:47:22 +05:30
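Below is a sketch that skips empty tool names, instead of renaming them, when converting Chat Completions `tool_calls` into Responses API `function_call` items. The types are simplified and `toResponsesFunctionCalls` is a hypothetical name. The trim also matches the consistent-trim refactor in ef623c9bb5.

```typescript
interface ChatToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ResponsesFunctionCall {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
}

// Skip calls whose trimmed name is empty instead of mapping them to a placeholder
// name that no client has registered (the source of the retry loop).
function toResponsesFunctionCalls(calls: ChatToolCall[]): ResponsesFunctionCall[] {
  const items: ResponsesFunctionCall[] = [];
  for (const call of calls) {
    const fnName = call.function.name.trim();
    if (!fnName) continue;
    items.push({ type: "function_call", call_id: call.id, name: fnName, arguments: call.function.arguments });
  }
  return items;
}
```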
dependabot[bot] 9b00f5d550 deps: bump the development group with 4 updates
Bumps the development group with 4 updates: [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node), [lint-staged](https://github.com/lint-staged/lint-staged), [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/typescript-eslint) and [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest).


Updates `@types/node` from 25.4.0 to 25.5.0
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

Updates `lint-staged` from 16.3.2 to 16.4.0
- [Release notes](https://github.com/lint-staged/lint-staged/releases)
- [Changelog](https://github.com/lint-staged/lint-staged/blob/main/CHANGELOG.md)
- [Commits](https://github.com/lint-staged/lint-staged/compare/v16.3.2...v16.4.0)

Updates `typescript-eslint` from 8.57.0 to 8.57.1
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/typescript-eslint/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.57.1/packages/typescript-eslint)

Updates `vitest` from 4.0.18 to 4.1.0
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.0/packages/vitest)

---
updated-dependencies:
- dependency-name: "@types/node"
  dependency-version: 25.5.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: lint-staged
  dependency-version: 16.4.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: typescript-eslint
  dependency-version: 8.57.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
- dependency-name: vitest
  dependency-version: 4.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-16 19:04:07 +00:00
196 changed files with 16474 additions and 2232 deletions
+21
@@ -32,6 +32,27 @@ Version format: `2.x.y` — examples:
npm version patch --no-git-tag-version
```
> **⚠️ ATOMIC COMMIT RULE — Version bump MUST happen before committing feature files.**
>
> **CORRECT order:**
>
> 1. `npm version patch --no-git-tag-version` ← bump first
> 2. implement features / fix bugs
> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
>
> **OR if features are already staged:**
>
> 1. implement features (do NOT commit yet)
> 2. `npm version patch --no-git-tag-version` ← bump before committing
> 3. `git add -A && git commit -m "chore(release): v2.x.y — all changes in ONE commit"`
>
> **NEVER do this (creates version mismatch in git history):**
>
> - ~~commit features → then bump version → commit package.json separately~~
>
> This ensures that `git show v2.x.y` always contains both code changes and the version bump together.
> The GitHub release tag will point to a commit that includes ALL changes for that version.
### 2. Regenerate lock file (REQUIRED after version bump)
**Mandatory** — skipping causes `@swc/helpers` lock mismatch and CI failures:
+2
@@ -55,6 +55,8 @@ logs/*
# analysis directories (generated, not tracked)
.analysis/
antigravity-manager-analysis/
.sisyphus/
.plans/
# docs (allow specific tracked files)
docs/*
+5
@@ -3,6 +3,11 @@ data/
**/data/
**/db.json
# VS Code extension test runtime (large binary, not needed in npm package)
app/vscode-extension/
**/data/
**/db.json
# Source code (pre-built app/ is published instead)
src/
open-sse/
+326
@@ -4,6 +4,332 @@
---
## [2.8.2] — 2026-03-19
> Sprint: 2 merged PRs, model aliases routing fix, log export, and issue triage.
### Features
- **Log Export**: New Export button on `/dashboard/logs` with time range dropdown (1h, 6h, 12h, 24h). Downloads JSON of request/proxy/call logs via `/api/logs/export` API (#user-request)
### Bug Fixes
- **Model Aliases Routing** (#472): Settings → Model Aliases now correctly affect provider routing, not just format detection. Previously `resolveModelAlias()` output was only used for `getModelTargetFormat()` but the original model ID was sent to the provider
- **Stream Flush Usage** (#480): Usage data from the last SSE event in the buffer is now correctly extracted during stream flush (merged from @prakersh)
### Merged PRs
- #480 — Extract usage from remaining buffer in flush handler (@prakersh)
- #479 — Add missing Codex 5.3/5.4 and Anthropic model ID pricing entries (@prakersh)
---
## [2.8.1] — 2026-03-19
> Sprint: Five community PRs — streaming call log fixes, Kiro compatibility, cache token analytics, Chinese translation, and configurable tool call IDs.
### ✨ Features
- **feat(logs)**: Call log response content now correctly accumulated from raw provider chunks (OpenAI/Claude/Gemini) before translation, fixing empty response payloads in streaming mode (#470, @zhangqiang8vip)
- **feat(providers)**: Per-model configurable 9-char tool call ID normalization (Mistral-style) — only models with the option enabled get truncated IDs (#470)
- **feat(api)**: Key PATCH API expanded to support `allowedConnections`, `name`, `autoResolve`, `isActive`, and `accessSchedule` fields (#470)
- **feat(dashboard)**: Response-first layout in request log detail UI (#470)
- **feat(i18n)**: Improved Chinese (zh-CN) translation — complete retranslation (#475, @only4copilot)
### 🐛 Bug Fixes
- **fix(kiro)**: Strip injected `model` field from request body — Kiro API rejects unknown top-level fields (#478, @prakersh)
- **fix(usage)**: Include cache read + cache creation tokens in usage history input totals for accurate analytics (#477, @prakersh)
- **fix(callLogs)**: Support Claude format usage fields (`input_tokens`/`output_tokens`) alongside OpenAI format, include all cache token variants (#476, @prakersh)
---
## [2.8.0] — 2026-03-19
> Sprint: Bailian Coding Plan provider with editable base URLs, plus community contributions for Alibaba Cloud and Kimi Coding.
### ✨ Features
- **feat(providers)**: Added Bailian Coding Plan (`bailian-coding-plan`) — Alibaba Model Studio with Anthropic-compatible API. Static catalog of 8 models including Qwen3.5 Plus, Qwen3 Coder, MiniMax M2.5, GLM 5, and Kimi K2.5. Includes custom auth validation (400=valid, 401/403=invalid) (#467, @Mind-Dragon)
- **feat(admin)**: Editable default URL in Provider Admin create/edit flows — users can configure custom base URLs per connection. Persisted in `providerSpecificData.baseUrl` with Zod schema validation rejecting non-http(s) schemes (#467)
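Two checks from the Bailian entry can be sketched without dependencies. The first maps the validation probe's HTTP status: 400 = valid and 401/403 = invalid, as stated above; treating 2xx as valid and everything else as unknown is an assumption. The second rejects non-http(s) base URLs. The changelog says the real check uses a Zod schema. This sketch swaps in the built-in `URL` parser instead, and both function names are hypothetical.

```typescript
type KeyCheck = "valid" | "invalid" | "unknown";

// 400 means the key was accepted but the probe request itself was rejected.
function interpretValidationStatus(status: number): KeyCheck {
  if (status === 401 || status === 403) return "invalid";
  if (status === 400 || (status >= 200 && status < 300)) return "valid"; // 2xx: assumption
  return "unknown";
}

// Stand-in for the Zod refinement: accept only http: and https: base URLs.
function isAllowedBaseUrl(raw: string): boolean {
  try {
    const { protocol } = new URL(raw);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not a parseable URL
  }
}
```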
### 🧪 Tests
- Added 30+ unit tests and 2 e2e scenarios for Bailian Coding Plan provider covering auth validation, schema hardening, route-level behavior, and cross-layer integration
---
## [2.7.10] — 2026-03-19
> Sprint: Two new community-contributed providers (Alibaba Cloud Coding, Kimi Coding API-key) and Docker pino fix.
### ✨ Features
- **feat(providers)**: Added Alibaba Cloud Coding Plan support with two OpenAI-compatible endpoints — `alicode` (China) and `alicode-intl` (International), each with 8 models (#465, @dtk1985)
- **feat(providers)**: Added dedicated `kimi-coding-apikey` provider path — API-key-based Kimi Coding access is no longer forced through OAuth-only `kimi-coding` route. Includes registry, constants, models API, config, and validation test (#463, @Mind-Dragon)
### 🐛 Bug Fixes
- **fix(docker)**: Added missing `split2` dependency to Docker image — `pino-abstract-transport` requires it at runtime but it was not being copied into the standalone container, causing `Cannot find module 'split2'` crashes (#459)
---
## [2.7.9] — 2026-03-18
> Sprint: Codex responses subpath passthrough natively supported, Windows MITM crash fixed, and Combos agent schemas adjusted.
### ✨ Features
- **feat(codex)**: Native responses subpath passthrough for Codex — natively routes `POST /v1/responses/compact` to Codex upstream, maintaining Claude Code compatibility without stripping the `/compact` suffix (#457)
### 🐛 Bug Fixes
- **fix(combos)**: Zod schemas (`updateComboSchema` and `createComboSchema`) now include `system_message`, `tool_filter_regex`, and `context_cache_protection`. Fixes bug where agent-specific settings created via the dashboard were silently discarded by the backend validation layer (#458)
- **fix(mitm)**: Kiro MITM profile crash on Windows fixed — `node-machine-id` failed due to missing `REG.exe` env, and the fallback threw a fatal `crypto is not defined` error. Fallback now safely and correctly imports crypto (#456)
---
## [2.7.8] — 2026-03-18
> Sprint: Budget save bug + combo agent features UI + omniModel tag security fix.
### 🐛 Bug Fixes
- **fix(budget)**: "Save Limits" no longer returns 422 — `warningThreshold` is now correctly sent as a fraction (0–1) instead of a percentage (0–100) (#451)
- **fix(combos)**: `<omniModel>` internal cache tag is now stripped before forwarding requests to providers, preventing cache session breaks (#454)
### ✨ Features
- **feat(combos)**: Agent Features section added to combo create/edit modal — expose `system_message` override, `tool_filter_regex`, and `context_cache_protection` directly from the dashboard (#454)
---
## [2.7.7] — 2026-03-18
> Sprint: Docker pino crash, Codex CLI responses worker fix, package-lock sync.
### 🐛 Bug Fixes
- **fix(docker)**: `pino-abstract-transport` and `pino-pretty` now explicitly copied in Docker runner stage — Next.js standalone trace misses these peer deps, causing `Cannot find module pino-abstract-transport` crash on startup (#449)
- **fix(responses)**: Remove `initTranslators()` from `/v1/responses` route — was crashing Next.js worker with `the worker has exited` uncaughtException on Codex CLI requests (#450)
### 🔧 Maintenance
- **chore(deps)**: `package-lock.json` now committed on every version bump to ensure Docker `npm ci` uses exact dependency versions
---
## [2.7.5] — 2026-03-18
> Sprint: UX improvements and Windows CLI healthcheck fix.
### 🐛 Bug Fixes
- **fix(ux)**: Show default password hint on login page — new users now see `"Default password: 123456"` below the password input (#437)
- **fix(cli)**: Claude CLI and other npm-installed tools now correctly detected as runnable on Windows — spawn uses `shell:true` to resolve `.cmd` wrappers via PATHEXT (#447)
---
## [2.7.4] — 2026-03-18
> Sprint: Search Tools dashboard, i18n fixes, Copilot limits, Serper validation fix.
### 🚀 Features
- **feat(search)**: Add Search Playground (10th endpoint), Search Tools page with Compare Providers/Rerank Pipeline/Search History, local rerank routing, auth guards on search API (#443 by @Regis-RCR)
  - New route: `/dashboard/search-tools`
  - Sidebar entry under Debug section
  - `GET /api/search/providers` and `GET /api/search/stats` with auth guards
  - Local provider_nodes routing for `/v1/rerank`
  - 30+ i18n keys in search namespace
### 🐛 Bug Fixes
- **fix(search)**: Fix Brave news normalizer (was returning 0 results), enforce max_results truncation post-normalization, fix Endpoints page fetch URL (#443 by @Regis-RCR)
- **fix(analytics)**: Localize analytics day/date labels — replace hardcoded Portuguese strings with `Intl.DateTimeFormat(locale)` (#444 by @hijak)
- **fix(copilot)**: Correct GitHub Copilot account type display, filter misleading unlimited quota rows from limits dashboard (#445 by @hijak)
- **fix(providers)**: Stop rejecting valid Serper API keys — treat non-4xx responses as valid authentication (#446 by @hijak)
---
## [2.7.3] — 2026-03-18
> Sprint: Codex direct API quota fallback fix.
### 🐛 Bug Fixes
- **fix(codex)**: Block weekly-exhausted accounts in direct API fallback (#440)
  - `resolveQuotaWindow()` prefix matching: `"weekly"` now matches `"weekly (7d)"` cache keys
  - `applyCodexWindowPolicy()` enforces `useWeekly`/`use5h` toggles correctly
  - 4 new regression tests (766 total)
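Below is a sketch of the prefix lookup and policy check described for #440. The `QuotaWindow` shape, the `"5h"` window key, and `isBlockedByPolicy` are assumptions. Only the `"weekly"` → `"weekly (7d)"` match comes from the changelog. A plain prefix match would also accept a key like `weekly-extra`, so a real implementation might want a stricter separator check.

```typescript
interface QuotaWindow {
  remainingPercent: number; // assumed field
}

// Exact key first, then the first key that starts with the requested name,
// so "weekly" also finds a cache entry stored as "weekly (7d)".
function resolveQuotaWindow(cache: Map<string, QuotaWindow>, windowName: string): QuotaWindow | undefined {
  const exact = cache.get(windowName);
  if (exact) return exact;
  for (const [key, value] of cache) {
    if (key.startsWith(windowName)) return value;
  }
  return undefined;
}

// Block the account if any window enabled by the policy is exhausted.
function isBlockedByPolicy(cache: Map<string, QuotaWindow>, policy: { useWeekly: boolean; use5h: boolean }): boolean {
  const exhausted = (w: QuotaWindow | undefined) => w !== undefined && w.remainingPercent <= 0;
  return (
    (policy.useWeekly && exhausted(resolveQuotaWindow(cache, "weekly"))) ||
    (policy.use5h && exhausted(resolveQuotaWindow(cache, "5h")))
  );
}
```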
---
## [2.7.2] — 2026-03-18
> Sprint: Light mode UI contrast fixes.
### 🐛 Bug Fixes
- **fix(logs)**: Fix light mode contrast in request logs filter buttons and combo badge (#378)
  - Error/Success/Combo filter buttons now readable in light mode
  - Combo row badge uses stronger violet in light mode
---
## [2.7.1] — 2026-03-17
> Sprint: Unified web search routing (POST /v1/search) with 5 providers + Next.js 16.1.7 security fixes (6 CVEs).
### ✨ New Features
- **feat(search)**: Unified web search routing — `POST /v1/search` with 5 providers (Serper, Brave, Perplexity, Exa, Tavily)
  - Auto-failover across providers, 6,500+ free searches/month
  - In-memory cache with request coalescing (configurable TTL)
  - Dashboard: Search Analytics tab in `/dashboard/analytics` with provider breakdown, cache hit rate, cost tracking
  - New API: `GET /api/v1/search/analytics` for search request statistics
  - DB migration: `request_type` column on `call_logs` for non-chat request tracking
  - Zod validation (`v1SearchSchema`), auth-gated, cost recorded via `recordCost()`
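Request coalescing means that concurrent cache misses for the same key share one in-flight upstream call, instead of each call reaching the provider. Below is a minimal sketch. It assumes the cache key is derived elsewhere from the query and options. `CoalescingCache` is hypothetical and is not the actual `searchCache.ts` API.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

// TTL cache that also merges concurrent misses for the same key into one upstream call.
class CoalescingCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly inflight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return Promise.resolve(hit.value);
    const pending = this.inflight.get(key);
    if (pending) return pending; // another caller is already fetching this key
    const request = load()
      .then((value) => {
        this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inflight.delete(key)); // failures are not cached
    this.inflight.set(key, request);
    return request;
  }
}
```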
### 🔒 Security
- **deps**: Next.js 16.1.6 → 16.1.7 — fixes 6 CVEs:
  - **Critical**: CVE-2026-29057 (HTTP request smuggling via http-proxy)
  - **High**: CVE-2026-27977, CVE-2026-27978 (WebSocket + Server Actions)
  - **Medium**: CVE-2026-27979, CVE-2026-27980, CVE-2026-jcc7
### 📁 New Files
| File | Purpose |
| ---------------------------------------------------------------- | ------------------------------------------ |
| `open-sse/handlers/search.ts` | Search handler with 5-provider routing |
| `open-sse/config/searchRegistry.ts` | Provider registry (auth, cost, quota, TTL) |
| `open-sse/services/searchCache.ts` | In-memory cache with request coalescing |
| `src/app/api/v1/search/route.ts` | Next.js route (POST + GET) |
| `src/app/api/v1/search/analytics/route.ts` | Search stats API |
| `src/app/(dashboard)/dashboard/analytics/SearchAnalyticsTab.tsx` | Analytics dashboard tab |
| `src/lib/db/migrations/007_search_request_type.sql` | DB migration |
| `tests/unit/search-registry.test.mjs` | 277 lines of unit tests |
---
## [2.7.0] — 2026-03-17
> Sprint: ClawRouter-inspired features — toolCalling flag, multilingual intent detection, benchmark-driven fallback, request deduplication, pluggable RouterStrategy, Grok-4 Fast + GLM-5 + MiniMax M2.5 + Kimi K2.5 pricing.
### ✨ New Models & Pricing
- **feat(pricing)**: xAI Grok-4 Fast — `$0.20/$0.50 per 1M tokens`, 1143ms p50 latency, tool calling supported
- **feat(pricing)**: xAI Grok-4 (standard) — `$0.20/$1.50 per 1M tokens`, reasoning flagship
- **feat(pricing)**: GLM-5 via Z.AI — `$0.5/1M`, 128K output context
- **feat(pricing)**: MiniMax M2.5 — `$0.30/1M input`, reasoning + agentic tasks
- **feat(pricing)**: DeepSeek V3.2 — updated pricing `$0.27/$1.10 per 1M`
- **feat(pricing)**: Kimi K2.5 via Moonshot API — direct Moonshot API access
- **feat(providers)**: Z.AI provider added (`zai` alias) — GLM-5 family with 128K output
### 🧠 Routing Intelligence
- **feat(registry)**: `toolCalling` flag per model in provider registry — combos can now prefer/require tool-calling capable models
- **feat(scoring)**: Multilingual intent detection for AutoCombo scoring — PT/ZH/ES/AR script/language patterns influence model selection per request context
- **feat(fallback)**: Benchmark-driven fallback chains — real latency data (p50 from `comboMetrics`) used to re-order fallback priority dynamically
- **feat(dedup)**: Request deduplication via content-hash — 5-second idempotency window prevents duplicate provider calls from retrying clients
- **feat(router)**: Pluggable `RouterStrategy` interface in `autoCombo/routerStrategy.ts` — custom routing logic can be injected without modifying core
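Below is a sketch of latency-ordered fallback. It computes each target's p50 from recorded samples using the nearest-rank method and sorts targets in ascending order of p50. Targets without data go to the end, in their original order. `orderByLatency`, `percentile`, and the `provider/model` key format are assumptions. The changelog does not show the actual `comboMetrics` data model.

```typescript
interface Target {
  provider: string;
  model: string;
}

// Nearest-rank percentile; returns undefined when there are no samples.
function percentile(samples: number[], p: number): number | undefined {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Fastest observed p50 first; targets with no latency data go last, in original order.
function orderByLatency(targets: Target[], samplesByKey: Map<string, number[]>): Target[] {
  return targets
    .map((t, i) => ({ t, i, p50: percentile(samplesByKey.get(`${t.provider}/${t.model}`) ?? [], 50) }))
    .sort((a, b) => {
      if (a.p50 === undefined || b.p50 === undefined) {
        if (a.p50 === b.p50) return a.i - b.i; // both unknown: keep original order
        return a.p50 === undefined ? 1 : -1;
      }
      return a.p50 - b.p50 || a.i - b.i;
    })
    .map((x) => x.t);
}
```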
### 🔧 MCP Server Improvements
- **feat(mcp)**: 2 new advanced tool schemas: `omniroute_get_provider_metrics` (p50/p95/p99 per provider) and `omniroute_explain_route` (routing decision explanation)
- **feat(mcp)**: MCP tool auth scopes updated — `metrics:read` scope added for provider metrics tools
- **feat(mcp)**: `omniroute_best_combo_for_task` now accepts `languageHint` parameter for multilingual routing
### 📊 Observability
- **feat(metrics)**: `comboMetrics.ts` extended with real-time latency percentile tracking per provider/account
- **feat(health)**: Health API (`/api/monitoring/health`) now returns per-provider `p50Latency` and `errorRate` fields
- **feat(usage)**: Usage history migration for per-model latency tracking
### 🗄️ DB Migrations
- **feat(migrations)**: New column `latency_p50` in `combo_metrics` table — zero-breaking, safe for existing users
### 🐛 Bug Fixes / Closures
- **close(#411)**: better-sqlite3 hashed module resolution on Windows — fixed in v2.6.10 (f02c5b5)
- **close(#409)**: GitHub Copilot chat completions fail with Claude models when files attached — fixed in v2.6.9 (838f1d6)
- **close(#405)**: Duplicate of #411 — resolved
## [2.6.10] — 2026-03-17
> Windows fix: better-sqlite3 prebuilt download without node-gyp/Python/MSVC (#426).
### 🐛 Bug Fixes
- **fix(install/#426)**: On Windows, `npm install -g omniroute` used to fail with `better_sqlite3.node is not a valid Win32 application` because the bundled native binary was compiled for Linux. Adds **Strategy 1.5** to `scripts/postinstall.mjs`: uses `@mapbox/node-pre-gyp install --fallback-to-build=false` (bundled within `better-sqlite3`) to download the correct prebuilt binary for the current OS/arch without requiring any build tools (no node-gyp, no Python, no MSVC). Falls back to `npm rebuild` only if the download fails. Adds platform-specific error messages with clear manual fix instructions.
---
## [2.6.9] — 2026-03-17
> CI fixes (t11 any-budget), bug fix #409 (file attachments via Copilot+Claude), release workflow correction.
### 🐛 Bug Fixes
- **fix(ci)**: Remove word "any" from comments in `openai-responses.ts` and `chatCore.ts` that were failing the t11 `\bany\b` budget check (false positive from regex counting comments)
- **fix(chatCore)**: Normalize unsupported content part types before forwarding to providers (#409 — Cursor sends `{type:"file"}` when `.md` files are attached; Copilot and other OpenAI-compat providers reject with "type has to be either 'image_url' or 'text'"; fix converts `file`/`document` blocks to `text` and drops unknown types)
### 🔧 Workflow
- **chore(generate-release)**: Add ATOMIC COMMIT RULE — version bump (`npm version patch`) MUST happen before committing feature files to ensure tag always points to a commit containing all version changes together
---
## [2.6.8] — 2026-03-17
> Sprint: Combo as Agent (system prompt + tool filter), Context Caching Protection, Auto-Update, Detailed Logs, MITM Kiro IDE.
### 🗄️ DB Migrations (zero-breaking — safe for existing users)
- **005_combo_agent_fields.sql**: `ALTER TABLE combos ADD COLUMN system_message TEXT DEFAULT NULL`, `tool_filter_regex TEXT DEFAULT NULL`, `context_cache_protection INTEGER DEFAULT 0`
- **006_detailed_request_logs.sql**: New `request_detail_logs` table with 500-entry ring-buffer trigger, opt-in via settings toggle
### ✨ Features
- **feat(combo)**: System Message Override per Combo (#399: `system_message` field replaces or injects the system prompt before forwarding to the provider)
- **feat(combo)**: Tool Filter Regex per Combo (#399: `tool_filter_regex` keeps only tools matching the pattern; supports OpenAI + Anthropic formats)
- **feat(combo)**: Context Caching Protection (#401: `context_cache_protection` tags responses with `<omniModel>provider/model</omniModel>` and pins the model for session continuity)
- **feat(settings)**: Auto-Update via Settings (#320: `GET /api/system/version` + `POST /api/system/update`; checks the npm registry and updates in the background with a pm2 restart)
- **feat(logs)**: Detailed Request Logs (#378 — captures full pipeline bodies at 4 stages: client request, translated request, provider response, client response — opt-in toggle, 64KB trim, 500-entry ring-buffer)
- **feat(mitm)**: MITM Kiro IDE profile (#336: `src/mitm/targets/kiro.ts` targets api.anthropic.com, reuses existing MITM infrastructure)
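The sketch below shows the `<omniModel>` round trip: tag a response, read the most recent tag back from history to pin the model, and strip tags before forwarding. The stripping corresponds to the #454 fix in 2.7.8. The names `injectModelTag` and `extractPinnedModel` come from the d56fae0a7b commit. The signatures, the newline-separated tag placement, and `stripModelTags` are assumptions.

```typescript
interface Msg {
  role: string;
  content: string;
}

const TAG_RE = /<omniModel>([^<]+)<\/omniModel>/;
const TAG_RE_ALL = /\n?<omniModel>[^<]*<\/omniModel>/g;

// Append the serving provider/model to a response (placement is an assumption).
function injectModelTag(text: string, providerModel: string): string {
  return `${text}\n<omniModel>${providerModel}</omniModel>`;
}

// Newest assistant turn with a tag wins; null means nothing is pinned yet.
function extractPinnedModel(history: Msg[]): string | null {
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.role !== "assistant") continue;
    const match = TAG_RE.exec(m.content);
    if (match) return match[1];
  }
  return null;
}

// Remove tags before the history is sent upstream, so providers never see them.
function stripModelTags(history: Msg[]): Msg[] {
  return history.map((m) => ({ ...m, content: m.content.replace(TAG_RE_ALL, "") }));
}
```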
---
## [2.6.7] — 2026-03-17
> Sprint: SSE improvements, local provider_nodes extensions, proxy registry, Claude passthrough fixes.
### ✨ Features
- **feat(health)**: Background health check for local `provider_nodes` with exponential backoff (30s→300s) and `Promise.allSettled` to avoid blocking (#423, @Regis-RCR)
- **feat(embeddings)**: Route `/v1/embeddings` to local `provider_nodes` via `buildDynamicEmbeddingProvider()` with hostname validation (#422, @Regis-RCR)
- **feat(audio)**: Route TTS/STT to local `provider_nodes` via `buildDynamicAudioProvider()` with SSRF protection (#416, @Regis-RCR)
- **feat(proxy)**: Proxy registry, management APIs, and quota-limit generalization (#429, @Regis-RCR)
### 🐛 Bug Fixes
- **fix(sse)**: Strip Claude-specific fields (`metadata`, `anthropic_version`) when target is OpenAI-compat (#421, @prakersh)
- **fix(sse)**: Extract Claude SSE usage (`input_tokens`, `output_tokens`, cache tokens) in passthrough stream mode (#420, @prakersh)
- **fix(sse)**: Generate fallback `call_id` for tool calls with missing/empty IDs (#419, @prakersh)
- **fix(sse)**: Claude-to-Claude passthrough — forward body completely untouched, no re-translation (#418, @prakersh)
- **fix(sse)**: Filter orphaned `tool_result` items after Claude Code context compaction to avoid 400 errors (#417, @prakersh)
- **fix(sse)**: Skip empty-name tool calls in Responses API translator to prevent `placeholder_tool` infinite loops (#415, @prakersh)
- **fix(sse)**: Strip empty text content blocks before translation (#427, @prakersh)
- **fix(api)**: Add `refreshable: true` to Claude OAuth test config (#428, @prakersh)
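Fix #417 removes `tool_result` items left behind when context compaction drops the assistant turn that issued the matching `tool_use`. A simplified sketch over Anthropic-style message blocks, assuming a result counts as orphaned when its `tool_use_id` matches no earlier `tool_use` block; the types and `dropOrphanedToolResults` are hypothetical:

```typescript
// Hypothetical sketch: drop tool_result blocks whose tool_use no longer exists.

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content?: unknown };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

function dropOrphanedToolResults(messages: Message[]): Message[] {
  const seenToolUseIds = new Set<string>();
  const out: Message[] = [];
  for (const msg of messages) {
    if (typeof msg.content === "string") {
      out.push(msg);
      continue;
    }
    // Record every tool_use id so later results can be matched against it.
    for (const block of msg.content) {
      if (block.type === "tool_use") seenToolUseIds.add(block.id);
    }
    // Keep a tool_result only if its tool_use survived compaction.
    const content = msg.content.filter(
      (block) => block.type !== "tool_result" || seenToolUseIds.has(block.tool_use_id)
    );
    // Skip messages that became empty after filtering.
    if (content.length > 0) out.push({ ...msg, content });
  }
  return out;
}

const history: Message[] = [
  { role: "user", content: "Summarize the repo" },
  // The assistant turn that issued tool_use "toolu_1" was compacted away.
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_1", content: "..." }] },
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_2", name: "ls", input: {} }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_2", content: "src/" }] },
];
console.log(dropOrphanedToolResults(history).length); // 3
```

A production version may also need to merge adjacent same-role turns left behind when a whole message is dropped.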
### 📦 Dependencies
- Bump `vitest`, `@vitest/*` and related devDependencies (#414, @dependabot)
---
## [2.6.6] — 2026-03-17
> Hotfix: Turbopack/Docker compatibility — remove `node:` protocol from all `src/` imports.
@@ -32,6 +32,11 @@ COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/.next/standalone ./
# Explicitly copy @swc/helpers — not always traced by standalone output but needed at runtime
COPY --from=builder /app/node_modules/@swc/helpers ./node_modules/@swc/helpers
# Explicitly copy pino transport dependencies — pino spawns a worker that requires
# pino-abstract-transport at runtime; Next.js standalone trace does not capture it (#449)
COPY --from=builder /app/node_modules/pino-abstract-transport ./node_modules/pino-abstract-transport
COPY --from=builder /app/node_modules/pino-pretty ./node_modules/pino-pretty
COPY --from=builder /app/node_modules/split2 ./node_modules/split2
COPY --from=builder /app/scripts/run-standalone.mjs ./run-standalone.mjs
COPY --from=builder /app/scripts/runtime-env.mjs ./runtime-env.mjs
COPY --from=builder /app/scripts/bootstrap-env.mjs ./bootstrap-env.mjs
@@ -4,7 +4,7 @@
_Your universal API proxy — one endpoint, 44+ providers, zero downtime. Now with **MCP & A2A** agent orchestration._
**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • MCP Server • A2A Protocol • 100% TypeScript**
**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • Web Search • MCP Server • A2A Protocol • 100% TypeScript**
---
@@ -898,27 +898,44 @@ When minimized, OmniRoute lives in your system tray with quick actions:
## 💰 Pricing at a Glance
| Tier | Provider | Cost | Quota Reset | Best For |
| ------------------- | ----------------- | ---------------------- | ---------------- | ----------------------- |
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek | Pay-per-use | None | Best price/quality |
| | xAI (Grok) | Pay-per-use | None | Grok models |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| **💰 CHEAP** | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| **🆓 FREE** | iFlow | **$0** | Unlimited | 5 models unlimited |
| | Qwen | **$0** | Unlimited | 4 models unlimited |
| | Kiro | **$0** | Unlimited | Claude (AWS Builder ID) |
| Tier | Provider | Cost | Quota Reset | Best For |
| ------------------- | --------------------------- | ------------------------- | ---------------- | --------------------------------- |
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
|                     | xAI Grok-4 Fast             | **$0.20/$0.50 per 1M** 🆕 | None             | Fastest + tool calling, ultra-low |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| **💰 CHEAP** | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| **🆓 FREE** | iFlow | **$0** | Unlimited | 5 models unlimited |
| | Qwen | **$0** | Unlimited | 4 models unlimited |
| | Kiro | **$0** | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
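Paired prices such as `$0.20/$0.50 per 1M` are read here as input/output prices per million tokens; that reading is an assumption, consistent with the MiniMax M2.5 row's explicit "input" label. Under it, cost is linear in token counts, as this hypothetical `estimateCostUsd` helper shows:

```typescript
// Hypothetical helper: USD cost from per-1M-token prices (input/output assumed).
interface Pricing {
  inputPerMTok: number; // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

function estimateCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * p.inputPerMTok + (outputTokens / 1_000_000) * p.outputPerMTok;
}

// Prices taken from the pricing table.
const grok4Fast: Pricing = { inputPerMTok: 0.2, outputPerMTok: 0.5 };
const deepseekV32: Pricing = { inputPerMTok: 0.27, outputPerMTok: 1.1 };

// 2M input + 1M output tokens: 2 * 0.20 + 1 * 0.50 = 0.90 USD.
console.log(estimateCostUsd(grok4Fast, 2_000_000, 1_000_000).toFixed(2)); // 0.90
// Same workload on DeepSeek V3.2: 2 * 0.27 + 1 * 1.10 = 1.64 USD.
console.log(estimateCostUsd(deepseekV32, 2_000_000, 1_000_000).toFixed(2)); // 1.64
```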
**💡 $0 Combo Stack:** Gemini CLI (180K/mo) → iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1) → Kiro (Claude for free) → Qwen (4 models, unlimited) — **Zero cost, never stops coding.** When Gemini quota runs out, OmniRoute auto-falls back to iFlow or Kiro with zero config.
> 🆕 **New models added (Mar 2026):** Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
**💡 $0 Combo Stack — The Complete Free Setup:**
```
Gemini CLI (180K/mo free)
→ iFlow (unlimited: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1)
→ Kiro (Claude Sonnet 4.5 + Haiku — unlimited, via AWS Builder ID)
→ Qwen (4 models — unlimited)
→ Groq (14.4K req/day — ultra-fast)
→ NVIDIA NIM (70+ models — 40 RPM forever)
```
**Zero cost. Never stops coding.** Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.
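Conceptually, the combo above is an ordered fallback chain: each target is tried in turn, and a quota or rate-limit failure moves the request to the next one. The sketch below shows only that control flow; `Target`, `QuotaExceededError`, and `callWithFallback` are hypothetical, and OmniRoute's actual combo engine is not reproduced here.

```typescript
// Hypothetical sketch of an ordered fallback chain; not OmniRoute's combo engine.

class QuotaExceededError extends Error {}

interface Target {
  name: string; // e.g. "gemini-cli", "iflow", "kiro"
  call: (prompt: string) => Promise<string>;
}

async function callWithFallback(targets: Target[], prompt: string): Promise<string> {
  const errors: string[] = [];
  for (const target of targets) {
    try {
      return await target.call(prompt);
    } catch (err) {
      // Only quota/rate-limit failures fall through to the next target;
      // any other error is treated as real and rethrown.
      if (!(err instanceof QuotaExceededError)) throw err;
      errors.push(`${target.name}: ${err.message}`);
    }
  }
  throw new Error(`All targets exhausted: ${errors.join("; ")}`);
}

const chain: Target[] = [
  {
    name: "gemini-cli",
    call: async () => {
      throw new QuotaExceededError("monthly quota used");
    },
  },
  { name: "iflow", call: async (p) => `iflow answered: ${p}` },
];
void callWithFallback(chain, "hello").then(console.log); // iflow answered: hello
```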
---
@@ -1027,7 +1044,20 @@ Then in `/dashboard/media` → **Transcription** tab: upload any audio or video
OmniRoute v2.0 is built as an operational platform, not just a relay proxy.
### 🚀 New in v2.0.9+: Playground, CLI Fingerprints & ACP
### 🆕 New — ClawRouter-Inspired Improvements (Mar 2026)
| Feature | What It Does |
| ------------------------------------ | ------------------------------------------------------------------------------------------- |
| ⚡ **Grok-4 Fast Family** | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash) |
| 🧠 **GLM-5 via Z.AI** | 128K output context, $0.5/1M — newest flagship from the GLM family |
| 🔮 **MiniMax M2.5** | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1 |
| 🎯 **toolCalling Flag per Model** | Per-model `toolCalling: true/false` in registry — AutoCombo skips non-tool-capable models |
| 🌍 **Multilingual Intent Detection** | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for non-English content |
| 📊 **Benchmark-Driven Fallbacks** | Real p95 latency from live requests feeds combo scoring — AutoCombo learns from actual data |
| 🔁 **Request Deduplication** | Content-hash based dedup window — multi-agent safe, prevents duplicate charges |
| 🔌 **Pluggable RouterStrategy** | Extensible `RouterStrategy` interface — add custom routing logic as plugins |
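The Request Deduplication row describes a content-hash window. One way to realize it, sketched under assumptions (SHA-256 over key-sorted JSON, a fixed 2-second window, and sharing the in-flight promise; `RequestDeduper` and `stableStringify` are hypothetical), is to let identical requests inside the window reuse a single upstream call:

```typescript
import { createHash } from "crypto";

// Hypothetical sketch: content-hash request deduplication within a short window.
// Not OmniRoute's implementation; the window length and key scheme are assumptions.

// Serialize with sorted object keys so semantically identical bodies hash the same.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined serializes as null here
}

class RequestDeduper<T> {
  private readonly entries = new Map<string, { promise: Promise<T>; expiresAt: number }>();
  private readonly windowMs: number;

  constructor(windowMs = 2_000) {
    this.windowMs = windowMs;
  }

  // Identical bodies within windowMs share one upstream call (and its result or error).
  run(body: unknown, send: () => Promise<T>): Promise<T> {
    const now = Date.now();
    // Drop expired entries so the map does not grow without bound.
    for (const [k, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(k);
    }
    const key = createHash("sha256").update(stableStringify(body)).digest("hex");
    const hit = this.entries.get(key);
    if (hit) return hit.promise;
    const promise = send();
    this.entries.set(key, { promise, expiresAt: now + this.windowMs });
    return promise;
  }
}

let upstreamCalls = 0;
const dedup = new RequestDeduper<string>();
const send = async () => {
  upstreamCalls++;
  return "ok";
};
const messages = [{ role: "user", content: "hi" }];
void Promise.all([
  dedup.run({ model: "example-model", messages }, send),
  dedup.run({ messages, model: "example-model" }, send), // same content, different key order
]).then(() => console.log(upstreamCalls)); // 1
```

Sharing the promise means a failed upstream call is also shared within the window; a real implementation might evict failed entries immediately.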
### 🚀 Previous v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -1075,16 +1105,17 @@ OmniRoute v2.0 is built as an operational platform, not just a relay proxy.
### 🎵 Multi-Modal APIs
| Feature | What It Does |
| -------------------------- | ------------------------------------------------------------- |
| 🖼️ **Image Generation** | `/v1/images/generations` with cloud and local backends |
| 📐 **Embeddings** | `/v1/embeddings` for search and RAG pipelines |
| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers) |
| 🔊 **Text-to-Speech** | `/v1/audio/speech` (multiple engines/providers) |
| 🎬 **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) |
| 🎵 **Music Generation** | `/v1/music/generations` (ComfyUI workflows) |
| 🛡️ **Moderations** | `/v1/moderations` safety checks |
| 🔀 **Reranking** | `/v1/rerank` for relevance scoring |
| Feature | What It Does |
| -------------------------- | ------------------------------------------------------------------------------------------------------------ |
| 🖼️ **Image Generation** | `/v1/images/generations` with cloud and local backends |
| 📐 **Embeddings** | `/v1/embeddings` for search and RAG pipelines |
| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` (Whisper and additional providers) |
| 🔊 **Text-to-Speech** | `/v1/audio/speech` (multiple engines/providers) |
| 🎬 **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) |
| 🎵 **Music Generation** | `/v1/music/generations` (ComfyUI workflows) |
| 🛡️ **Moderations** | `/v1/moderations` safety checks |
| 🔀 **Reranking** | `/v1/rerank` for relevance scoring |
| 🔍 **Web Search** 🆕 | `/v1/search` — 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache |
### 🛡️ Resilience, Security & Governance
@@ -0,0 +1,46 @@
# ADR-0001: Proxy Registry + Usage Control Generalization
Date: 2026-03-17
Status: Accepted
## Context
OmniRoute already has:
- Config-map-based proxy assignment (`global`, `providers`, `combos`, `keys`).
- Quota-aware selection for specific providers only (notably `codex`).
Main gaps:
- Proxies are not yet reusable assets that can be managed as entities (metadata, where-used, safe delete).
- Usage policy is not yet consistent across providers.
- The API error contract is not yet uniform across management endpoints.
## Decision
1. Add a **Proxy Registry** as a new domain in the DB (`proxy_registry`, `proxy_assignments`).
2. Keep compatibility with legacy assignments (fall back to the legacy `proxyConfig`).
3. The runtime resolver uses this priority:
   - account -> provider -> global (registry)
   - fall back to the legacy resolver if the registry has no assignment yet
4. Credentials must be redacted in the default registry list output.
5. Standardize the JSON error format for proxy management endpoints so that it is consistent and includes a `requestId`.
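The resolver priority in this decision (account, then provider, then global from the registry, with a fallback to the legacy config-map resolver when the registry has no assignment) can be sketched as a first-match lookup. Only the priority order comes from this ADR; `ProxyEntry`, `RegistryAssignments`, `resolveProxy`, and the choice to skip assignments that point at a missing registry entry are assumptions.

```typescript
// Hypothetical sketch of the ADR-0001 proxy resolution order.

interface ProxyEntry {
  id: string;
  url: string;
}

// Registry assignments per scope; a missing key means "not assigned".
interface RegistryAssignments {
  account: Map<string, string>; // accountId -> proxy id
  provider: Map<string, string>; // providerId -> proxy id
  global?: string; // proxy id
}

function resolveProxy(
  accountId: string,
  providerId: string,
  assignments: RegistryAssignments,
  registry: Map<string, ProxyEntry>,
  legacyResolve: (providerId: string) => string | null
): string | null {
  // Highest priority first: account -> provider -> global.
  const candidates = [
    assignments.account.get(accountId),
    assignments.provider.get(providerId),
    assignments.global,
  ];
  for (const proxyId of candidates) {
    const entry = proxyId ? registry.get(proxyId) : undefined;
    if (entry) return entry.url; // assignments to unknown proxies are skipped
  }
  // No usable registry assignment: keep the legacy proxyConfig behaviour.
  return legacyResolve(providerId);
}

const registry = new Map<string, ProxyEntry>([["p1", { id: "p1", url: "http://proxy-a:8080" }]]);
const assignments: RegistryAssignments = { account: new Map(), provider: new Map([["codex", "p1"]]) };
const legacy = () => "http://legacy:3128";
console.log(resolveProxy("acct-1", "codex", assignments, registry, legacy)); // http://proxy-a:8080
console.log(resolveProxy("acct-1", "claude", assignments, registry, legacy)); // http://legacy:3128
```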
## Consequences
Positive:
- Proxies are reusable and their usage can be tracked.
- Safe delete can be enforced (409 while still in use).
- Gradual migration without runtime breaking changes.
Negative:
- A temporary dual source (registry + legacy config) until migration is complete.
- Requires additional assignment endpoints and consistent scope mapping.
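The safe-delete consequence (HTTP 409 while a proxy is still assigned) comes down to a where-used check before deletion. A minimal sketch, assuming assignments are rows of `{ scope, scopeId, proxyId }` and that a forced delete also removes those rows; `deleteProxy` and its result shape are hypothetical:

```typescript
// Hypothetical sketch of safe delete for the proxy registry.

interface ProxyAssignment {
  scope: "account" | "provider" | "global";
  scopeId: string;
  proxyId: string;
}

type DeleteResult =
  | { status: 200; deleted: string }
  | { status: 404; message: string }
  | { status: 409; message: string; usedBy: ProxyAssignment[] };

function deleteProxy(
  proxyId: string,
  registry: Map<string, unknown>,
  assignments: ProxyAssignment[],
  force = false
): DeleteResult {
  if (!registry.has(proxyId)) return { status: 404, message: `Proxy ${proxyId} not found` };
  // Where-used check: refuse to delete a proxy that is still assigned somewhere.
  const usedBy = assignments.filter((a) => a.proxyId === proxyId);
  if (usedBy.length > 0 && !force) {
    return { status: 409, message: `Proxy ${proxyId} is still assigned`, usedBy };
  }
  // Forced delete (assumption): also drop the assignments so nothing points at a missing proxy.
  // Mutates the array in place to keep the sketch short.
  for (let i = assignments.length - 1; i >= 0; i--) {
    if (assignments[i].proxyId === proxyId) assignments.splice(i, 1);
  }
  registry.delete(proxyId);
  return { status: 200, deleted: proxyId };
}

const reg = new Map<string, unknown>([["p1", {}]]);
const asg: ProxyAssignment[] = [{ scope: "provider", scopeId: "codex", proxyId: "p1" }];
console.log(deleteProxy("p1", reg, asg).status); // 409
console.log(deleteProxy("p1", reg, asg, true).status); // 200
```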
## Follow-up
- Migrate the provider/account UI from raw proxy input to a registry selector.
- Add per-proxy health telemetry and alerting.
- Generalize usage control to other providers through the same policy interface.
@@ -0,0 +1,32 @@
# ADR-0002: Error Contract for Management Endpoints
Date: 2026-03-17
Status: Accepted
## Decision
Management endpoints (proxy config, proxy registry, and proxy assignments) return a uniform error body:
```json
{
"error": {
"message": "Human-readable summary",
"type": "invalid_request | not_found | conflict | server_error",
"details": {}
},
"requestId": "uuid"
}
```
## Status Mapping
- 400: invalid request / validation failure
- 404: resource not found
- 409: resource conflict (for example, proxy still assigned)
- 500: unexpected server error
## Notes
- `requestId` is mandatory for log correlation.
- `details` is optional and only used for safe validation details.
- Sensitive secrets (proxy credentials, tokens) must never appear in `message` or `details`.
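A small helper can keep every management endpoint on this contract. The sketch below maps each `type` to the status code listed above and generates a `requestId` with Node's `crypto.randomUUID()`; `managementError` is a hypothetical name, and callers remain responsible for keeping secrets out of `message` and `details`.

```typescript
import { randomUUID } from "crypto";

type ErrorType = "invalid_request" | "not_found" | "conflict" | "server_error";

// Status mapping from ADR-0002.
const STATUS_BY_TYPE: Record<ErrorType, number> = {
  invalid_request: 400,
  not_found: 404,
  conflict: 409,
  server_error: 500,
};

interface ErrorBody {
  error: { message: string; type: ErrorType; details?: Record<string, unknown> };
  requestId: string;
}

// Build the status and body; `details` is omitted entirely when not provided.
function managementError(
  type: ErrorType,
  message: string,
  details?: Record<string, unknown>,
  requestId: string = randomUUID()
): { status: number; body: ErrorBody } {
  const error: ErrorBody["error"] = { message, type };
  if (details) error.details = details;
  return { status: STATUS_BY_TYPE[type], body: { error, requestId } };
}

const res = managementError("conflict", "Proxy is still assigned", { assignments: 2 });
console.log(res.status, JSON.stringify(res.body));
```

In a route handler, the pair could then be returned with something like `Response.json(res.body, { status: res.status })`, with the same `requestId` written to the server log for correlation.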
@@ -0,0 +1,16 @@
# ADR-0003: Security Checklist for Proxy Registry and Usage Controls
Date: 2026-03-17
Status: Accepted
## Checklist
- Validate all management payloads with Zod.
- Reject malformed scope assignment updates with status 400.
- Reject deleting an in-use proxy with status 409 unless forced.
- Never expose proxy username/password in list responses by default.
- Never log raw credentials or token values.
- Keep error responses free from internal stack traces.
- Protect management endpoints with existing auth middleware policy.
- Audit mutating operations: create/update/delete/assign/migrate.
- Ensure resolver fallback to legacy config while migration is in transition.
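For the "never expose proxy username/password in list responses" item, one option is to mask the userinfo part of the proxy URL before it leaves the server. A sketch using the WHATWG `URL` class; `redactProxyUrl` and `tryParseUrl` are hypothetical helpers, and how OmniRoute actually stores proxy credentials is not shown in this ADR.

```typescript
// Hypothetical helpers: mask credentials in a proxy URL for list responses and logs.

function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function redactProxyUrl(raw: string): string {
  const url = tryParseUrl(raw);
  // Unparseable input: return a placeholder rather than risk echoing a secret.
  if (!url) return "[invalid proxy url]";
  if (url.username) url.username = "***";
  if (url.password) url.password = "***";
  return url.toString();
}

console.log(redactProxyUrl("http://alice:s3cret@10.0.0.5:3128")); // http://***:***@10.0.0.5:3128/
console.log(redactProxyUrl("socks5://proxy.local:1080"));
```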
@@ -8,6 +8,16 @@ _Your universal API proxy - one endpoint,
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoid duplicate API calls via content hashing
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, and Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Your universal API proxy — one end
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime. Now
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -8,6 +8,16 @@ _Your universal API proxy: one endpoint, more than 36 providers, no
---
### 🆕 New in v2.7.0
- **Extensible RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoid duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
<div align="center">
[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute)
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Universal API proxy: one endpoint, over 36 providers
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API server - one endpoint, 36+
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _The universal API proxy: one endpoint, 36+ providers, zero
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, no downtime
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -13,6 +13,16 @@ _Your universal API
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime._
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Universal API proxy — one endpoint, 36
---
### 🆕 New Features in v2.7.0
- **Pluggable RouterStrategy** — supports rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevents duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Universal API proxy — one endpoint, 36+ providers,
---
### 🆕 New Features in v2.7.0
- **Pluggable RouterStrategy** — supports rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevents duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime.
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy: one endpoint, more than 36 providers, no downtime._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy: one endpoint, 36+ providers, zero downtime._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime._
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy server — one endpoint, over 36
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, more than 36 providers,
---
### 🆕 What's New in v2.7.0
- **Extensible RouterStrategy** — rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — avoids duplicate calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy - one endpoint, over 36 providers, zero
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one
---
### 🆕 New in v2.7.0
- **Pluggable RouterStrategy** — rule-, cost-, and latency-based strategies
- **Multilingual intent recognition** — routing in 30+ languages
- **Request deduplication** — eliminates duplicates via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy server: one endpoint, 36+ providers
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime.
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy —
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one end
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, over 36
---
### 🆕 What's New in v2.7.0
- **Pluggable RouterStrategy** — rules, cost, and latency routing strategies
- **Multilingual intent detection** — routing scoring in 30+ languages
- **Request deduplication** — prevent duplicate API calls via content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
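@@ -11,6 +11,16 @@ _Your universal API proxy — one endpoint, 36+ providers, zero downtime._
---
### 🆕 New Features in v2.7.0
- **Pluggable RouterStrategy** — supports rules, cost, and latency strategies
- **Multilingual intent detection** — routing scoring for 30+ languages
- **Request deduplication** — avoids duplicate API calls based on content hash
- **New providers:** Grok-4 Fast (xAI), GLM-5 / Z.AI, MiniMax M2.5, Kimi K2.5
- **Updated pricing:** Grok-4 Fast $0.20/$0.50/M, GLM-5 $0.50/M, MiniMax M2.5 $0.30/M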
---
### 🚀 New in v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
@@ -1,7 +1,7 @@
openapi: 3.1.0
info:
title: OmniRoute API
version: 2.6.6
version: 2.8.2
description: |
OmniRoute is a local-first AI API proxy router. It provides an OpenAI-compatible
endpoint that routes requests to multiple AI providers with load balancing,
@@ -121,6 +121,10 @@ const nextConfig = {
source: "/responses",
destination: "/api/v1/responses",
},
{
source: "/responses/:path*",
destination: "/api/v1/responses/:path*",
},
{
source: "/models",
destination: "/api/v1/models",
@@ -11,7 +11,7 @@ interface AudioModel {
name: string;
}
interface AudioProvider {
export interface AudioProvider {
id: string;
baseUrl: string;
authType: string;
@@ -262,36 +262,74 @@ export function getSpeechProvider(providerId: string): AudioProvider | null {
return AUDIO_SPEECH_PROVIDERS[providerId] || null;
}
export interface ProviderNodeRow {
prefix: string;
name: string;
baseUrl: string;
apiType?: string;
}
/**
* Parse audio model string (format: "provider/model" or just "model")
* Build a dynamic AudioProvider from a provider_node DB entry.
* Only used for local providers (localhost/127.0.0.1); remote nodes are
* excluded by the caller to prevent auth bypass and SSRF.
*/
export function buildDynamicAudioProvider(node: ProviderNodeRow, audioPath: string): AudioProvider {
if (!node.prefix || !node.baseUrl) {
throw new Error(`Invalid provider_node: missing prefix or baseUrl`);
}
const baseUrl = node.baseUrl.replace(/\/+$/, "");
return {
id: node.prefix,
baseUrl: `${baseUrl}${audioPath}`,
authType: "none",
authHeader: "none",
models: [],
};
}
function parseAudioModel(
modelStr: string | null,
registry: Record<string, AudioProvider>
registry: Record<string, AudioProvider>,
dynamicProviders?: AudioProvider[]
): { provider: string | null; model: string | null } {
if (!modelStr) return { provider: null, model: null };
for (const [providerId, config] of Object.entries(registry)) {
// Phase 1: prefix match in hardcoded registry
for (const [providerId] of Object.entries(registry)) {
if (modelStr.startsWith(providerId + "/")) {
return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
}
}
// Phase 2: bare model lookup in hardcoded registry
for (const [providerId, config] of Object.entries(registry)) {
if (config.models.some((m) => m.id === modelStr)) {
return { provider: providerId, model: modelStr };
}
}
// Phase 3: prefix match in dynamic providers (provider_nodes)
if (dynamicProviders) {
for (const dp of dynamicProviders) {
if (modelStr.startsWith(dp.id + "/")) {
return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
}
}
}
return { provider: null, model: modelStr };
}
export function parseTranscriptionModel(modelStr: string | null) {
return parseAudioModel(modelStr, AUDIO_TRANSCRIPTION_PROVIDERS);
export function parseTranscriptionModel(
modelStr: string | null,
dynamicProviders?: AudioProvider[]
) {
return parseAudioModel(modelStr, AUDIO_TRANSCRIPTION_PROVIDERS, dynamicProviders);
}
export function parseSpeechModel(modelStr: string | null) {
return parseAudioModel(modelStr, AUDIO_SPEECH_PROVIDERS);
export function parseSpeechModel(modelStr: string | null, dynamicProviders?: AudioProvider[]) {
return parseAudioModel(modelStr, AUDIO_SPEECH_PROVIDERS, dynamicProviders);
}
/**
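The doc comment on `buildDynamicAudioProvider` leaves the locality check to the caller. Below is a minimal sketch of such a caller. It assumes exact matching against a loopback allow-list. The names `LOOPBACK_HOSTS`, `isLocalNode`, `localAudioProviders` and `matchDynamicProvider` are hypothetical, and so is the `/audio/transcriptions` path used in the usage. The provider construction and the Phase 3 prefix match are copied from the diff:

```typescript
// Simplified local copies of the shapes in the diff (audioRegistry.ts).
interface AudioProvider {
  id: string;
  baseUrl: string;
  authType: string;
  authHeader: string;
  models: { id: string; name: string }[];
}

interface ProviderNodeRow {
  prefix: string;
  name: string;
  baseUrl: string;
  apiType?: string;
}

// Hypothetical loopback allow-list. URL.hostname keeps the brackets on IPv6 literals.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Hypothetical caller-side guard: only nodes on this machine may become auth-less providers.
function isLocalNode(node: ProviderNodeRow): boolean {
  try {
    return LOOPBACK_HOSTS.has(new URL(node.baseUrl).hostname);
  } catch {
    return false; // an unparsable baseUrl is never treated as local
  }
}

// Same construction as buildDynamicAudioProvider in the diff.
function buildDynamicAudioProvider(node: ProviderNodeRow, audioPath: string): AudioProvider {
  if (!node.prefix || !node.baseUrl) {
    throw new Error("Invalid provider_node: missing prefix or baseUrl");
  }
  const baseUrl = node.baseUrl.replace(/\/+$/, "");
  return {
    id: node.prefix,
    baseUrl: `${baseUrl}${audioPath}`,
    authType: "none",
    authHeader: "none",
    models: [],
  };
}

// Filter first, then build, so remote or malformed nodes never reach the builder.
function localAudioProviders(nodes: ProviderNodeRow[], audioPath: string): AudioProvider[] {
  return nodes.filter(isLocalNode).map((node) => buildDynamicAudioProvider(node, audioPath));
}

// Phase 3 of parseAudioModel: "prefix/model" matched against the dynamic providers only.
function matchDynamicProvider(
  modelStr: string,
  dynamicProviders: AudioProvider[]
): { provider: string | null; model: string | null } {
  for (const dp of dynamicProviders) {
    if (modelStr.startsWith(dp.id + "/")) {
      return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
    }
  }
  return { provider: null, model: modelStr };
}
```

Matching on the parsed `hostname` rather than on a substring of the URL keeps hosts such as `localhost.example.com` out of the auth-less set.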
@@ -8,7 +8,43 @@
* keyed by provider ID (e.g. "nebius", "openai").
*/
export const EMBEDDING_PROVIDERS = {
export interface EmbeddingProvider {
id: string;
baseUrl: string;
authType: string;
authHeader: string;
models: { id: string; name: string; dimensions?: number }[];
}
export interface EmbeddingProviderNodeRow {
prefix: string;
name: string;
baseUrl: string;
apiType?: string;
}
/**
* Build a dynamic EmbeddingProvider from a local provider_node.
* Only used for local providers (localhost); the caller must filter by hostname.
*/
export function buildDynamicEmbeddingProvider(node: EmbeddingProviderNodeRow): EmbeddingProvider {
if (!node.prefix || !node.baseUrl) {
throw new Error(`Invalid provider_node: missing prefix or baseUrl`);
}
if (node.prefix.includes("/") || node.prefix.includes(" ")) {
throw new Error(`Invalid provider_node prefix "${node.prefix}": must not contain / or spaces`);
}
const baseUrl = node.baseUrl.replace(/\/+$/, "");
return {
id: node.prefix,
baseUrl: `${baseUrl}/embeddings`,
authType: "none",
authHeader: "none",
models: [],
};
}
export const EMBEDDING_PROVIDERS: Record<string, EmbeddingProvider> = {
nebius: {
id: "nebius",
baseUrl: "https://api.tokenfactory.nebius.com/v1/embeddings",
@@ -70,7 +106,7 @@ export const EMBEDDING_PROVIDERS = {
/**
* Get embedding provider config by ID
*/
export function getEmbeddingProvider(providerId) {
export function getEmbeddingProvider(providerId: string): EmbeddingProvider | null {
return EMBEDDING_PROVIDERS[providerId] || null;
}
@@ -78,26 +114,36 @@ export function getEmbeddingProvider(providerId) {
* Parse embedding model string (format: "provider/model" or just "model")
* Returns { provider, model }
*/
export function parseEmbeddingModel(modelStr) {
export function parseEmbeddingModel(
modelStr: string | null,
dynamicProviders?: EmbeddingProvider[]
): { provider: string | null; model: string | null } {
if (!modelStr) return { provider: null, model: null };
// Check for "provider/model" format
const slashIdx = modelStr.indexOf("/");
if (slashIdx > 0) {
// Handle nested model IDs like "nebius/Qwen/Qwen3-Embedding-8B"
// Try each provider prefix
for (const [providerId, config] of Object.entries(EMBEDDING_PROVIDERS)) {
// Phase 1: Try each hardcoded provider prefix
for (const [providerId] of Object.entries(EMBEDDING_PROVIDERS)) {
if (modelStr.startsWith(providerId + "/")) {
return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
}
}
// Fallback: first segment is provider
// Phase 2: Try dynamic provider_nodes prefix
if (dynamicProviders) {
for (const dp of dynamicProviders) {
if (modelStr.startsWith(dp.id + "/")) {
return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
}
}
}
// Phase 3: Fallback — first segment is provider
const provider = modelStr.slice(0, slashIdx);
const model = modelStr.slice(slashIdx + 1);
return { provider, model };
}
// No provider prefix — search all providers for the model
// No provider prefix — search hardcoded providers for the model
for (const [providerId, config] of Object.entries(EMBEDDING_PROVIDERS)) {
if (config.models.some((m) => m.id === modelStr)) {
return { provider: providerId, model: modelStr };
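Below is a self-contained sketch of the three-phase `parseEmbeddingModel` lookup above. It uses a one-entry stand-in for `EMBEDDING_PROVIDERS`; only the nebius URL comes from the diff, and the auth fields and model list are placeholders. The diff is cut off before the no-match return, so that result is assumed to be `{ provider: null, model: modelStr }`:

```typescript
interface EmbeddingProvider {
  id: string;
  baseUrl: string;
  authType: string;
  authHeader: string;
  models: { id: string; name: string; dimensions?: number }[];
}

type ParsedModel = { provider: string | null; model: string | null };

// Trimmed stand-in for EMBEDDING_PROVIDERS; auth fields and models are placeholders.
const EMBEDDING_PROVIDERS: Record<string, EmbeddingProvider> = {
  nebius: {
    id: "nebius",
    baseUrl: "https://api.tokenfactory.nebius.com/v1/embeddings",
    authType: "apikey",
    authHeader: "bearer",
    models: [{ id: "Qwen/Qwen3-Embedding-8B", name: "Qwen3 Embedding 8B" }],
  },
};

function parseEmbeddingModel(
  modelStr: string | null,
  dynamicProviders: EmbeddingProvider[] = []
): ParsedModel {
  if (!modelStr) return { provider: null, model: null };
  const slashIdx = modelStr.indexOf("/");
  if (slashIdx > 0) {
    // Phase 1: hardcoded prefixes, so "nebius/Qwen/..." keeps the nested model ID intact.
    for (const providerId of Object.keys(EMBEDDING_PROVIDERS)) {
      if (modelStr.startsWith(providerId + "/")) {
        return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
      }
    }
    // Phase 2: local provider_nodes (their prefixes are rejected if they contain "/").
    for (const dp of dynamicProviders) {
      if (modelStr.startsWith(dp.id + "/")) {
        return { provider: dp.id, model: modelStr.slice(dp.id.length + 1) };
      }
    }
    // Phase 3: the first path segment is taken as the provider.
    return { provider: modelStr.slice(0, slashIdx), model: modelStr.slice(slashIdx + 1) };
  }
  // Bare ID: search the hardcoded model lists.
  for (const [providerId, config] of Object.entries(EMBEDDING_PROVIDERS)) {
    if (config.models.some((m) => m.id === modelStr)) {
      return { provider: providerId, model: modelStr };
    }
  }
  // Assumed no-match result (not shown in the diff).
  return { provider: null, model: modelStr };
}
```

The ordering has one consequence. Any ID that contains a slash never reaches the bare-model search, so `Qwen/Qwen3-Embedding-8B` without the `nebius/` prefix resolves to a provider named `Qwen`.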
@@ -11,6 +11,7 @@
export interface RegistryModel {
id: string;
name: string;
toolCalling?: boolean;
targetFormat?: string;
unsupportedParams?: readonly string[];
}
@@ -77,6 +78,22 @@ interface LegacyProvider {
clientVersion?: string;
}
const KIMI_CODING_SHARED = {
format: "claude",
executor: "default",
baseUrl: "https://api.kimi.com/coding/v1/messages",
authHeader: "x-api-key",
headers: {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "kimi-k2.5", name: "Kimi K2.5" },
{ id: "kimi-k2.5-thinking", name: "Kimi K2.5 Thinking" },
{ id: "kimi-latest", name: "Kimi Latest" },
] as RegistryModel[],
} as const;
// ── Registry ──────────────────────────────────────────────────────────────
export const REGISTRY: Record<string, RegistryEntry> = {
@@ -114,6 +131,7 @@ export const REGISTRY: Record<string, RegistryEntry> = {
},
models: [
{ id: "claude-opus-4-6", name: "Claude Opus 4.6" },
{ id: "claude-sonnet-4-6", name: "Claude 4.6 Sonnet" },
{ id: "claude-opus-4-5-20251101", name: "Claude 4.5 Opus" },
{ id: "claude-sonnet-4-5-20250929", name: "Claude 4.5 Sonnet" },
{ id: "claude-haiku-4-5-20251001", name: "Claude 4.5 Haiku" },
@@ -139,6 +157,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
clientSecretDefault: "",
},
models: [
{ id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
{ id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
{ id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
{ id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
{ id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
{ id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -168,6 +189,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
clientSecretDefault: "",
},
models: [
{ id: "gemini-3.1-pro", name: "Gemini 3.1 Pro" },
{ id: "gemini-3-1-pro", name: "Gemini 3.1 Pro (Alt ID)" },
{ id: "gemini-3.1-pro-preview", name: "Gemini 3.1 Pro Preview" },
{ id: "gemini-2.5-pro", name: "Gemini 2.5 Pro" },
{ id: "gemini-2.5-flash", name: "Gemini 2.5 Flash" },
{ id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" },
@@ -460,8 +484,13 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Version": "2023-06-01",
},
models: [
{ id: "claude-haiku-4.5", name: "Claude Haiku 4.5" },
{ id: "claude-sonnet-4-20250514", name: "Claude Sonnet 4" },
{ id: "claude-sonnet-4-6-20251031", name: "Claude Sonnet 4.6 (Dated)" },
{ id: "claude-sonnet-4.6", name: "Claude Sonnet 4.6" },
{ id: "claude-opus-4-20250514", name: "Claude Opus 4" },
{ id: "claude-opus-4-6-20251031", name: "Claude Opus 4.6 (Dated)" },
{ id: "claude-opus-4.6", name: "Claude Opus 4.6" },
{ id: "claude-3-5-sonnet-20241022", name: "Claude 3.5 Sonnet" },
],
},
@@ -495,6 +524,8 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "glm-5", name: "GLM 5" },
{ id: "glm-5-turbo", name: "GLM 5 Turbo" },
{ id: "glm-4.7-flash", name: "GLM 4.7 Flash" },
{ id: "glm-4.7", name: "GLM 4.7" },
{ id: "glm-4.6v", name: "GLM 4.6V (Vision)" },
@@ -506,6 +537,51 @@ export const REGISTRY: Record<string, RegistryEntry> = {
],
},
"bailian-coding-plan": {
id: "bailian-coding-plan",
alias: "bcp",
format: "claude",
executor: "default",
baseUrl: "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1/messages",
chatPath: "/messages",
urlSuffix: "?beta=true",
authType: "apikey",
authHeader: "x-api-key",
headers: {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
{ id: "qwen3-max-2026-01-23", name: "Qwen3 Max (2026-01-23)" },
{ id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
{ id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5" },
{ id: "glm-5", name: "GLM 5" },
{ id: "glm-4.7", name: "GLM 4.7" },
{ id: "kimi-k2.5", name: "Kimi K2.5" },
],
},
zai: {
id: "zai",
alias: "zai",
format: "claude",
executor: "default",
baseUrl: "https://api.z.ai/api/anthropic/v1/messages",
urlSuffix: "?beta=true",
authType: "apikey",
authHeader: "x-api-key",
headers: {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [
{ id: "glm-5", name: "GLM 5" },
{ id: "glm-5-turbo", name: "GLM 5 Turbo" },
],
},
kimi: {
id: "kimi",
alias: "kimi",
@@ -525,16 +601,9 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"kimi-coding": {
id: "kimi-coding",
alias: "kmc",
format: "claude",
executor: "default",
baseUrl: "https://api.kimi.com/coding/v1/messages",
...KIMI_CODING_SHARED,
urlSuffix: "?beta=true",
authType: "oauth",
authHeader: "x-api-key",
headers: {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
oauth: {
clientIdEnv: "KIMI_CODING_OAUTH_CLIENT_ID",
clientIdDefault: "17e5f671-d194-4dfb-9706-5516cb48c098",
@@ -542,11 +611,13 @@ export const REGISTRY: Record<string, RegistryEntry> = {
refreshUrl: "https://auth.kimi.com/api/oauth/token",
authUrl: "https://auth.kimi.com/api/oauth/device_authorization",
},
models: [
{ id: "kimi-k2.5", name: "Kimi K2.5" },
{ id: "kimi-k2.5-thinking", name: "Kimi K2.5 Thinking" },
{ id: "kimi-latest", name: "Kimi Latest" },
],
},
"kimi-coding-apikey": {
id: "kimi-coding-apikey",
alias: "kmca",
...KIMI_CODING_SHARED,
authType: "apikey",
},
kilocode: {
@@ -637,7 +708,11 @@ export const REGISTRY: Record<string, RegistryEntry> = {
"Anthropic-Version": "2023-06-01",
"Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
},
models: [{ id: "MiniMax-M2.1", name: "MiniMax M2.1" }],
models: [
{ id: "minimax-m2.5", name: "MiniMax M2.5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
{ id: "MiniMax-M2.1", name: "MiniMax M2.1" },
],
},
"minimax-cn": {
@@ -655,10 +730,52 @@ export const REGISTRY: Record<string, RegistryEntry> = {
},
models: [
// Keep parity with minimax to ensure model discovery works for minimax-cn connections.
{ id: "minimax-m2.5", name: "MiniMax M2.5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5 (Legacy Alias)" },
{ id: "MiniMax-M2.1", name: "MiniMax M2.1" },
],
},
alicode: {
id: "alicode",
alias: "alicode",
format: "openai",
executor: "default",
baseUrl: "https://coding.dashscope.aliyuncs.com/v1/chat/completions",
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
{ id: "kimi-k2.5", name: "Kimi K2.5" },
{ id: "glm-5", name: "GLM 5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5" },
{ id: "qwen3-max-2026-01-23", name: "Qwen3 Max" },
{ id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
{ id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
{ id: "glm-4.7", name: "GLM 4.7" },
],
},
"alicode-intl": {
id: "alicode-intl",
alias: "alicode-intl",
format: "openai",
executor: "default",
baseUrl: "https://coding-intl.dashscope.aliyuncs.com/v1/chat/completions",
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "qwen3.5-plus", name: "Qwen3.5 Plus" },
{ id: "kimi-k2.5", name: "Kimi K2.5" },
{ id: "glm-5", name: "GLM 5" },
{ id: "MiniMax-M2.5", name: "MiniMax M2.5" },
{ id: "qwen3-max-2026-01-23", name: "Qwen3 Max" },
{ id: "qwen3-coder-next", name: "Qwen3 Coder Next" },
{ id: "qwen3-coder-plus", name: "Qwen3 Coder Plus" },
{ id: "glm-4.7", name: "GLM 4.7" },
],
},
deepseek: {
id: "deepseek",
alias: "ds",
@@ -717,10 +834,14 @@ export const REGISTRY: Record<string, RegistryEntry> = {
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "grok-4", name: "Grok 4" },
{ id: "grok-4-fast-non-reasoning", name: "Grok 4 Fast" },
{ id: "grok-4-fast-reasoning", name: "Grok 4 Fast Reasoning" },
{ id: "grok-code-fast-1", name: "Grok Code Fast" },
{ id: "grok-4-1-fast-non-reasoning", name: "Grok 4.1 Fast" },
{ id: "grok-4-1-fast-reasoning", name: "Grok 4.1 Fast Reasoning" },
{ id: "grok-4-0709", name: "Grok 4 (0709)" },
{ id: "grok-4", name: "Grok 4" },
{ id: "grok-3", name: "Grok 3" },
{ id: "grok-3-mini", name: "Grok 3 Mini" },
],
},
@@ -849,7 +970,10 @@ export const REGISTRY: Record<string, RegistryEntry> = {
authType: "apikey",
authHeader: "bearer",
models: [
{ id: "gpt-oss-120b", name: "GPT OSS 120B", toolCalling: false },
{ id: "openai/gpt-oss-120b", name: "GPT OSS 120B (OpenAI Prefix)", toolCalling: false },
{ id: "meta/llama-3.3-70b-instruct", name: "Llama 3.3 70B" },
{ id: "nvidia/llama-3.3-70b-instruct", name: "Llama 3.3 70B (NVIDIA Prefix)" },
{ id: "meta/llama-4-maverick-17b-128e-instruct", name: "Llama 4 Maverick" },
{ id: "moonshotai/kimi-k2.5", name: "Kimi K2.5" },
{ id: "z-ai/glm4.7", name: "GLM 4.7" },
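With the `KIMI_CODING_SHARED` refactor, `kimi-coding` and `kimi-coding-apikey` share one base object and differ only in identity and auth. Below is a minimal sketch that mirrors the two entries in the diff. It uses a trimmed `RegistryEntry`; the real interface has more fields, such as `oauth`, which are omitted here:

```typescript
interface RegistryModel {
  id: string;
  name: string;
}

// Trimmed stand-in for the real RegistryEntry.
interface RegistryEntry {
  id: string;
  alias: string;
  format: string;
  executor: string;
  baseUrl: string;
  urlSuffix?: string;
  authType: string;
  authHeader: string;
  headers?: Record<string, string>;
  models: RegistryModel[];
}

// Fields common to both Kimi Coding entries, copied from the diff.
const KIMI_CODING_SHARED = {
  format: "claude",
  executor: "default",
  baseUrl: "https://api.kimi.com/coding/v1/messages",
  authHeader: "x-api-key",
  headers: {
    "Anthropic-Version": "2023-06-01",
    "Anthropic-Beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
  },
  models: [
    { id: "kimi-k2.5", name: "Kimi K2.5" },
    { id: "kimi-k2.5-thinking", name: "Kimi K2.5 Thinking" },
    { id: "kimi-latest", name: "Kimi Latest" },
  ] as RegistryModel[],
} as const;

const REGISTRY: Record<string, RegistryEntry> = {
  // OAuth variant: also sets urlSuffix (the real entry adds an oauth block).
  "kimi-coding": {
    id: "kimi-coding",
    alias: "kmc",
    ...KIMI_CODING_SHARED,
    urlSuffix: "?beta=true",
    authType: "oauth",
  },
  // API-key variant: as in the diff, only identity and authType differ.
  "kimi-coding-apikey": {
    id: "kimi-coding-apikey",
    alias: "kmca",
    ...KIMI_CODING_SHARED,
    authType: "apikey",
  },
};
```

Object spread is shallow, so both entries reference the same `models` array and `headers` object. An entry that needs its own list should copy it, for example with `models: [...KIMI_CODING_SHARED.models]`. On the executor side, `DefaultExecutor` sends both IDs through the same `x-api-key` branch.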
@@ -0,0 +1,155 @@
/**
* Search Provider Registry
*
* Defines providers that support the /v1/search endpoint.
* Unlike LLM/embedding providers, search providers don't have "models":
* a provider IS the model (Serper = Google SERP, Brave = Brave index).
*
* API keys are stored in the same provider credentials system,
* keyed by provider ID (e.g. "serper-search", "brave-search").
* perplexity-search reuses credentials from the "perplexity" chat provider.
*/
export interface SearchProviderConfig {
id: string;
name: string;
baseUrl: string;
method: "GET" | "POST";
authType: "apikey";
authHeader: string;
costPerQuery: number;
freeMonthlyQuota: number;
searchTypes: string[];
defaultMaxResults: number;
maxMaxResults: number;
timeoutMs: number;
cacheTTLMs: number;
}
export const SEARCH_PROVIDERS: Record<string, SearchProviderConfig> = {
"serper-search": {
id: "serper-search",
name: "Serper Search",
baseUrl: "https://google.serper.dev",
method: "POST",
authType: "apikey",
authHeader: "x-api-key",
costPerQuery: 0.001,
freeMonthlyQuota: 2500,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 100,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"brave-search": {
id: "brave-search",
name: "Brave Search",
baseUrl: "https://api.search.brave.com/res/v1",
method: "GET",
authType: "apikey",
authHeader: "x-subscription-token",
costPerQuery: 0.005,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"perplexity-search": {
id: "perplexity-search",
name: "Perplexity Search",
baseUrl: "https://api.perplexity.ai/search",
method: "POST",
authType: "apikey",
authHeader: "bearer",
costPerQuery: 0.005,
freeMonthlyQuota: 0,
searchTypes: ["web"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"exa-search": {
id: "exa-search",
name: "Exa Search",
baseUrl: "https://api.exa.ai/search",
method: "POST",
authType: "apikey",
authHeader: "x-api-key",
costPerQuery: 0.007,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 100,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
"tavily-search": {
id: "tavily-search",
name: "Tavily Search",
baseUrl: "https://api.tavily.com/search",
method: "POST",
authType: "apikey",
authHeader: "bearer",
costPerQuery: 0.008,
freeMonthlyQuota: 1000,
searchTypes: ["web", "news"],
defaultMaxResults: 5,
maxMaxResults: 20,
timeoutMs: 10_000,
cacheTTLMs: 5 * 60 * 1000,
},
};
/**
* Credential fallback mapping: search providers that can reuse credentials
* from a related provider (e.g., perplexity-search uses the same API key as perplexity chat).
*/
export const SEARCH_CREDENTIAL_FALLBACKS: Record<string, string> = {
"perplexity-search": "perplexity",
};
/**
* Get search provider config by ID
*/
export function getSearchProvider(providerId: string): SearchProviderConfig | null {
return SEARCH_PROVIDERS[providerId] || null;
}
/**
* Get all search providers as a flat list
*/
export function getAllSearchProviders(): Array<{
id: string;
name: string;
searchTypes: string[];
}> {
return Object.values(SEARCH_PROVIDERS).map((p) => ({
id: p.id,
name: p.name,
searchTypes: p.searchTypes,
}));
}
/**
* Select the cheapest available provider.
* If an explicit provider is given, validate and return it.
* Otherwise, return the cheapest by costPerQuery.
*/
export function selectProvider(explicitProvider?: string): SearchProviderConfig | null {
if (explicitProvider) {
return SEARCH_PROVIDERS[explicitProvider] || null;
}
const providers = Object.values(SEARCH_PROVIDERS);
if (providers.length === 0) return null;
return providers.reduce((cheapest, p) => (p.costPerQuery < cheapest.costPerQuery ? p : cheapest));
}
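The doc comment on `selectProvider` says it picks the cheapest *available* provider. As written, though, it ranks every registered provider by `costPerQuery` and never checks credentials. Below is a sketch of a variant that does check them and honors `SEARCH_CREDENTIAL_FALLBACKS`. The names `selectAvailableProvider`, `credentialKeys` and the `hasCredentials` callback are hypothetical, and the configs are trimmed to the fields that selection needs:

```typescript
// Trimmed stand-in for SearchProviderConfig.
interface SearchProviderConfig {
  id: string;
  name: string;
  costPerQuery: number;
}

// IDs and costs copied from SEARCH_PROVIDERS in the diff; insertion order decides ties.
const SEARCH_PROVIDERS: Record<string, SearchProviderConfig> = {
  "serper-search": { id: "serper-search", name: "Serper Search", costPerQuery: 0.001 },
  "brave-search": { id: "brave-search", name: "Brave Search", costPerQuery: 0.005 },
  "perplexity-search": { id: "perplexity-search", name: "Perplexity Search", costPerQuery: 0.005 },
  "exa-search": { id: "exa-search", name: "Exa Search", costPerQuery: 0.007 },
  "tavily-search": { id: "tavily-search", name: "Tavily Search", costPerQuery: 0.008 },
};

const SEARCH_CREDENTIAL_FALLBACKS: Record<string, string> = {
  "perplexity-search": "perplexity",
};

// Credential keys to try for a provider: its own ID first, then any fallback provider.
function credentialKeys(providerId: string): string[] {
  const fallback = SEARCH_CREDENTIAL_FALLBACKS[providerId];
  return fallback ? [providerId, fallback] : [providerId];
}

// Hypothetical variant of selectProvider that skips providers without credentials.
// `hasCredentials` is an assumed lookup into the credential store.
function selectAvailableProvider(
  hasCredentials: (key: string) => boolean,
  explicitProvider?: string
): SearchProviderConfig | null {
  const isAvailable = (p: SearchProviderConfig) => credentialKeys(p.id).some(hasCredentials);
  if (explicitProvider) {
    const p = SEARCH_PROVIDERS[explicitProvider];
    return p && isAvailable(p) ? p : null;
  }
  const candidates = Object.values(SEARCH_PROVIDERS).filter(isAvailable);
  if (candidates.length === 0) return null;
  // Strict "<" keeps the earlier entry on ties, like the reduce in the diff.
  return candidates.reduce((cheapest, p) => (p.costPerQuery < cheapest.costPerQuery ? p : cheapest));
}
```

With only Brave and Perplexity keys configured, the two tie at 0.005 and the strict `<` keeps Brave because it comes first in insertion order.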
@@ -26,6 +26,7 @@ export type ProviderCredentials = {
expiresAt?: string;
connectionId?: string; // T07: used for API key rotation index
providerSpecificData?: JsonRecord;
requestEndpointPath?: string;
};
export type ExecutorLog = {
@@ -9,6 +9,17 @@ type EffortLevel = (typeof EFFORT_ORDER)[number];
const CODEX_FAST_WIRE_VALUE = "priority";
let defaultFastServiceTierEnabled = false;
function getResponsesSubpath(endpointPath: unknown): string | null {
const normalizedEndpoint = String(endpointPath || "").replace(/\/+$/, "");
const match = normalizedEndpoint.match(/(?:^|\/)responses(?:(\/.*))?$/i);
if (!match) return null;
return match[1] || "";
}
function isCompactResponsesEndpoint(endpointPath: unknown): boolean {
return getResponsesSubpath(endpointPath)?.toLowerCase() === "/compact";
}
function normalizeServiceTierValue(value: unknown): string | undefined {
if (typeof value !== "string") return undefined;
const normalized = value.trim().toLowerCase();
@@ -60,13 +71,31 @@ export class CodexExecutor extends BaseExecutor {
super("codex", PROVIDERS.codex);
}
buildUrl(model, stream, urlIndex = 0, credentials = null) {
void model;
void stream;
void urlIndex;
const responsesSubpath = getResponsesSubpath(credentials?.requestEndpointPath);
if (responsesSubpath !== null) {
const baseUrl = String(this.config.baseUrl || "").replace(/\/$/, "");
if (baseUrl.endsWith("/responses")) {
return `${baseUrl}${responsesSubpath}`;
}
return `${baseUrl}/responses${responsesSubpath}`;
}
return super.buildUrl(model, stream, urlIndex, credentials);
}
/**
* Codex Responses endpoint is SSE-first.
* Always request event-stream from upstream, even when client requested stream=false.
* Includes chatgpt-account-id header for strict workspace binding.
*/
buildHeaders(credentials, stream = true) {
const headers = super.buildHeaders(credentials, true);
const isCompactRequest = isCompactResponsesEndpoint(credentials?.requestEndpointPath);
const headers = super.buildHeaders(credentials, isCompactRequest ? false : true);
// Add workspace binding header if workspaceId is persisted
const workspaceId = credentials?.providerSpecificData?.workspaceId;
@@ -107,9 +136,15 @@ export class CodexExecutor extends BaseExecutor {
*/
transformRequest(model, body, stream, credentials) {
const nativeCodexPassthrough = body?._nativeCodexPassthrough === true;
const isCompactRequest = isCompactResponsesEndpoint(credentials?.requestEndpointPath);
// Codex /responses rejects stream=false; we aggregate SSE back to JSON when needed.
body.stream = true;
// Codex /responses rejects stream=false, but /responses/compact rejects the stream field entirely.
if (isCompactRequest) {
delete body.stream;
delete body.stream_options;
} else {
body.stream = true;
}
delete body._nativeCodexPassthrough;
const requestServiceTier = normalizeServiceTierValue(body.service_tier);
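The helpers below copy `getResponsesSubpath` and `isCompactResponsesEndpoint` from the diff. They also restate the `buildUrl` branch and the stream handling in `transformRequest` as pure functions, so the routing for `/responses` and `/responses/compact` can be checked in isolation. `buildResponsesUrl` and `applyStreamPolicy` are hypothetical names, and the base URLs in the test are placeholders. Unlike the executor, `applyStreamPolicy` copies the body instead of mutating it:

```typescript
// Copied from the diff: "" for /responses, "/compact" for /responses/compact, null otherwise.
function getResponsesSubpath(endpointPath: unknown): string | null {
  const normalizedEndpoint = String(endpointPath || "").replace(/\/+$/, "");
  const match = normalizedEndpoint.match(/(?:^|\/)responses(?:(\/.*))?$/i);
  if (!match) return null;
  return match[1] || "";
}

function isCompactResponsesEndpoint(endpointPath: unknown): boolean {
  return getResponsesSubpath(endpointPath)?.toLowerCase() === "/compact";
}

// Mirrors the responses branch of CodexExecutor.buildUrl.
function buildResponsesUrl(baseUrlRaw: string, endpointPath: unknown): string | null {
  const subpath = getResponsesSubpath(endpointPath);
  if (subpath === null) return null; // the executor falls back to super.buildUrl here
  const baseUrl = baseUrlRaw.replace(/\/$/, "");
  return baseUrl.endsWith("/responses") ? `${baseUrl}${subpath}` : `${baseUrl}/responses${subpath}`;
}

// Mirrors transformRequest: /responses/compact rejects the stream field entirely,
// while plain /responses must always be streamed upstream.
function applyStreamPolicy(body: Record<string, unknown>, endpointPath: unknown): Record<string, unknown> {
  const out = { ...body };
  if (isCompactResponsesEndpoint(endpointPath)) {
    delete out.stream;
    delete out.stream_options;
  } else {
    out.stream = true;
  }
  return out;
}
```

In `chatCore`, `requestEndpointPath` is attached to the credentials only when native Codex passthrough is active, so these paths apply to passthrough requests.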
@@ -54,6 +54,8 @@ export class DefaultExecutor extends BaseExecutor {
break;
case "glm":
case "kimi-coding":
case "bailian-coding-plan":
case "kimi-coding-apikey":
case "minimax":
case "minimax-cn":
headers["x-api-key"] = credentials.apiKey || credentials.accessToken;
@@ -77,10 +77,13 @@ export class KiroExecutor extends BaseExecutor {
}
transformRequest(model: string, body: unknown, stream: boolean, credentials: unknown): unknown {
void model;
void stream;
void credentials;
return body;
// Kiro uses conversationState.currentMessage.userInputMessage.modelId,
// not a top-level "model" field. chatCore injects translatedBody.model
// which Kiro API rejects as unknown top-level field.
const { model: _model, ...rest } = body as Record<string, unknown>;
return rest;
}
/**
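The Kiro change removes the top-level `model` key with rest destructuring and leaves the nested `modelId` untouched. Below is a standalone sketch of that step. The null/array guard and the sample `modelId` value in the test are assumptions, not part of the diff:

```typescript
// Drops only the top-level "model" key; nested fields such as
// conversationState.currentMessage.userInputMessage.modelId are kept as-is.
function stripTopLevelModel(body: unknown): unknown {
  // Assumed guard: destructuring null would throw, and arrays have no "model" key to strip.
  if (body === null || typeof body !== "object" || Array.isArray(body)) return body;
  const { model: _model, ...rest } = body as Record<string, unknown>;
  void _model; // intentionally unused
  return rest; // a new object; the caller's body is not mutated
}
```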
@@ -381,7 +381,12 @@ async function handleTortoiseSpeech(providerConfig, body) {
* @returns {Response}
*/
/** @returns {Promise<unknown>} */
export async function handleAudioSpeech({ body, credentials }) {
export async function handleAudioSpeech({
body,
credentials,
resolvedProvider = null,
resolvedModel = null,
}) {
if (!body.model) {
return errorResponse(400, "model is required");
}
@@ -389,8 +394,15 @@ export async function handleAudioSpeech({ body, credentials }) {
return errorResponse(400, "input is required");
}
const { provider: providerId, model: modelId } = parseSpeechModel(body.model);
const providerConfig = providerId ? getSpeechProvider(providerId) : null;
// Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
// Falls back to hardcoded registry lookup for backward compatibility.
let providerConfig = resolvedProvider;
let modelId = resolvedModel;
if (!providerConfig) {
const parsed = parseSpeechModel(body.model);
providerConfig = parsed.provider ? getSpeechProvider(parsed.provider) : null;
modelId = parsed.model;
}
if (!providerConfig) {
return errorResponse(
@@ -403,7 +415,7 @@ export async function handleAudioSpeech({ body, credentials }) {
const token =
providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
if (providerConfig.authType !== "none" && !token) {
return errorResponse(401, `No credentials for speech provider: ${providerId}`);
return errorResponse(401, `No credentials for speech provider: ${providerConfig.id}`);
}
try {
@@ -13,7 +13,11 @@ import { getCorsOrigin } from "../utils/cors.ts";
* - HuggingFace Inference: POST raw binary to /models/{model_id}
*/
import { getTranscriptionProvider, parseTranscriptionModel } from "../config/audioRegistry.ts";
import {
getTranscriptionProvider,
parseTranscriptionModel,
type AudioProvider,
} from "../config/audioRegistry.ts";
import { buildAuthHeaders } from "../config/registryUtils.ts";
import { errorResponse } from "../utils/error.ts";
@@ -235,9 +239,13 @@ async function handleHuggingFaceTranscription(providerConfig, file, modelId, tok
export async function handleAudioTranscription({
formData,
credentials,
resolvedProvider = null,
resolvedModel = null,
}: {
formData: FormData;
credentials?: TranscriptionCredentials | null;
resolvedProvider?: AudioProvider | null;
resolvedModel?: string | null;
}): Promise<Response> {
const model = formData.get("model");
if (typeof model !== "string" || !model) {
@@ -250,8 +258,14 @@ export async function handleAudioTranscription({
}
const file = fileEntry as Blob & { name?: unknown };
const { provider: providerId, model: modelId } = parseTranscriptionModel(model);
const providerConfig = providerId ? getTranscriptionProvider(providerId) : null;
// Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
let providerConfig = resolvedProvider;
let modelId = resolvedModel;
if (!providerConfig) {
const parsed = parseTranscriptionModel(model);
providerConfig = parsed.provider ? getTranscriptionProvider(parsed.provider) : null;
modelId = parsed.model;
}
if (!providerConfig) {
return errorResponse(
@@ -264,7 +278,7 @@ export async function handleAudioTranscription({
const token =
providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
if (providerConfig.authType !== "none" && !token) {
return errorResponse(401, `No credentials for transcription provider: ${providerId}`);
return errorResponse(401, `No credentials for transcription provider: ${providerConfig.id}`);
}
// Route to provider-specific handler
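Both audio handlers now prefer a provider resolved by the route handler, which can include local `provider_node` entries, and fall back to the static registry only when none is passed. Below is a condensed sketch of that precedence and of the credential check. It uses a simplified prefix-only parser and a placeholder registry entry. `resolveSpeechTarget` and `missingCredentialError` are hypothetical names:

```typescript
interface AudioProvider {
  id: string;
  baseUrl: string;
  authType: string;
  authHeader: string;
  models: { id: string; name: string }[];
}

// Placeholder stand-in for AUDIO_SPEECH_PROVIDERS; the entry contents are illustrative.
const SPEECH_PROVIDERS: Record<string, AudioProvider> = {
  openai: {
    id: "openai",
    baseUrl: "https://api.openai.com/v1/audio/speech",
    authType: "apikey",
    authHeader: "bearer",
    models: [{ id: "tts-1", name: "TTS 1" }],
  },
};

// Simplified parseSpeechModel: "provider/model" prefix match only.
function parseSpeechModel(modelStr: string): { provider: string | null; model: string | null } {
  for (const providerId of Object.keys(SPEECH_PROVIDERS)) {
    if (modelStr.startsWith(providerId + "/")) {
      return { provider: providerId, model: modelStr.slice(providerId.length + 1) };
    }
  }
  return { provider: null, model: modelStr };
}

// Pre-resolved values from the route handler win; otherwise use the static registry.
function resolveSpeechTarget(
  modelStr: string,
  resolvedProvider: AudioProvider | null = null,
  resolvedModel: string | null = null
): { providerConfig: AudioProvider | null; modelId: string | null } {
  let providerConfig = resolvedProvider;
  let modelId = resolvedModel;
  if (!providerConfig) {
    const parsed = parseSpeechModel(modelStr);
    providerConfig = parsed.provider ? SPEECH_PROVIDERS[parsed.provider] ?? null : null;
    modelId = parsed.model;
  }
  return { providerConfig, modelId };
}

// The error names providerConfig.id, which is set for dynamic providers too.
function missingCredentialError(providerConfig: AudioProvider, token: string | null): string | null {
  if (providerConfig.authType !== "none" && !token) {
    return `No credentials for speech provider: ${providerConfig.id}`;
  }
  return null;
}
```

If a caller passes `resolvedProvider` without `resolvedModel`, `modelId` stays `null`, because the fallback parse runs only when no provider was supplied.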
@@ -23,6 +23,7 @@ import {
appendRequestLog,
saveCallLog,
} from "@/lib/usageDb";
import { getModelNormalizeToolCallId } from "@/lib/db/models";
import { getExecutor } from "../executors/index.ts";
import { translateNonStreamingResponse } from "./responseTranslator.ts";
import { extractUsageFromResponse } from "./usageExtractor.ts";
@@ -42,6 +43,12 @@ import {
import { getIdempotencyKey, checkIdempotency, saveIdempotency } from "@/lib/idempotencyLayer";
import { createProgressTransform, wantsProgress } from "../utils/progressTracker.ts";
import { isModelUnavailableError, getNextFamilyFallback } from "../services/modelFamilyFallback.ts";
import { computeRequestHash, deduplicate, shouldDeduplicate } from "../services/requestDedup.ts";
import {
shouldUseFallback,
isFallbackDecision,
EMERGENCY_FALLBACK_CONFIG,
} from "../services/emergencyFallback.ts";
export function shouldUseNativeCodexPassthrough({
provider,
@@ -54,9 +61,8 @@ export function shouldUseNativeCodexPassthrough({
}): boolean {
if (provider !== "codex") return false;
if (sourceFormat !== FORMATS.OPENAI_RESPONSES) return false;
return String(endpointPath || "")
.toLowerCase()
.endsWith("/responses");
const normalizedEndpoint = String(endpointPath || "").replace(/\/+$/, "");
return /(?:^|\/)responses(?:\/.*)?$/i.test(normalizedEndpoint);
}
/**
@@ -89,6 +95,22 @@ export async function handleChatCore({
}) {
const { provider, model, extendedContext } = modelInfo;
const startTime = Date.now();
const persistFailureUsage = (statusCode: number, errorCode?: string | null) => {
saveRequestUsage({
provider: provider || "unknown",
model: model || "unknown",
tokens: { input: 0, output: 0, cacheRead: 0, cacheCreation: 0, reasoning: 0 },
status: String(statusCode),
success: false,
latencyMs: Date.now() - startTime,
timeToFirstTokenMs: 0,
errorCode: errorCode || String(statusCode),
timestamp: new Date().toISOString(),
connectionId: connectionId || undefined,
apiKeyId: apiKeyInfo?.id || undefined,
apiKeyName: apiKeyInfo?.name || undefined,
}).catch(() => {});
};
// ── Phase 9.2: Idempotency check ──
const idempotencyKey = getIdempotencyKey(clientRawRequest?.headers);
@@ -118,8 +140,8 @@ export async function handleChatCore({
}
const sourceFormat = detectFormat(body);
const endpointPath = (clientRawRequest?.endpoint || "").toLowerCase();
const isResponsesEndpoint = endpointPath.endsWith("/responses");
const endpointPath = String(clientRawRequest?.endpoint || "");
const isResponsesEndpoint = /(?:^|\/)responses(?:\/.*)?$/i.test(endpointPath);
const nativeCodexPassthrough = shouldUseNativeCodexPassthrough({
provider,
sourceFormat,
@@ -135,10 +157,16 @@ export async function handleChatCore({
// Detect source format and get target format
// Model-specific targetFormat takes priority over provider default
// Apply custom model aliases (Settings → Model Aliases → Pattern→Target) before routing (#315)
// Apply custom model aliases (Settings → Model Aliases → Pattern→Target) before routing (#315, #472)
// Custom aliases take priority over built-in and must be resolved here so the
// downstream getModelTargetFormat() lookup uses the correct, aliased model ID.
// downstream getModelTargetFormat() lookup AND the actual provider request use
// the correct, aliased model ID. Without this, aliases only affect format detection.
const resolvedModel = resolveModelAlias(model);
// Use resolvedModel for all downstream operations (routing, provider requests, logging)
const effectiveModel = resolvedModel !== model ? resolvedModel : model;
if (resolvedModel !== model) {
log?.info?.("ALIAS", `Model alias applied: ${model} -> ${resolvedModel}`);
}
const alias = PROVIDER_ID_TO_ALIAS[provider] || provider;
const modelTargetFormat = getModelTargetFormat(alias, resolvedModel);
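The #472 fix resolves the alias once and threads the result through `translatedBody.model`, the executor call, and the intra-family fallback set. Below is a sketch of that flow. The alias table is hypothetical: here a trailing `*` means a prefix match, but the diff does not show the real matching rules for Settings → Model Aliases:

```typescript
// Hypothetical alias rule; the real table comes from Settings → Model Aliases.
type AliasRule = { pattern: string; target: string };

// Hypothetical matcher: exact match, or prefix match when the pattern ends with "*".
function resolveModelAliasSketch(model: string, rules: AliasRule[]): string {
  for (const { pattern, target } of rules) {
    const matches = pattern.endsWith("*") ? model.startsWith(pattern.slice(0, -1)) : model === pattern;
    if (matches) return target;
  }
  return model;
}

function prepareRequest(model: string, body: Record<string, unknown>, rules: AliasRule[]) {
  const effectiveModel = resolveModelAliasSketch(model, rules);
  // #472: the resolved ID must reach the provider, not only format detection.
  const translatedBody = { ...body, model: effectiveModel };
  // Fallback bookkeeping starts from the resolved ID as well.
  const triedModels = new Set<string>([effectiveModel]);
  return { effectiveModel, translatedBody, triedModels };
}
```

The ternary `resolvedModel !== model ? resolvedModel : model` always evaluates to `resolvedModel`, so `effectiveModel` is simply the resolved ID. Some calls in the diff still receive the original `model`, such as `getUnsupportedParams(provider, model)` and the dedup key `${provider}/${model}`; these look up the pre-alias ID.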
@@ -185,10 +213,17 @@ export async function handleChatCore({
// Translate request (pass reqLogger for intermediate logging)
let translatedBody = body;
const isClaudePassthrough = sourceFormat === FORMATS.CLAUDE && targetFormat === FORMATS.CLAUDE;
try {
if (nativeCodexPassthrough) {
translatedBody = { ...body, _nativeCodexPassthrough: true };
log?.debug?.("FORMAT", "native codex passthrough enabled");
} else if (isClaudePassthrough) {
// Claude-to-Claude passthrough: forward body completely untouched.
// No translation, no field stripping, no thinking normalization.
// We are just a gateway -- do not interfere with the request in the slightest.
translatedBody = { ...body };
log?.debug?.("FORMAT", "claude->claude passthrough -- forwarding untouched");
} else {
translatedBody = { ...body };
@@ -233,6 +268,56 @@ export async function handleChatCore({
});
}
// Strip empty text content blocks from messages.
// Anthropic API rejects {"type":"text","text":""} with 400 "text content blocks must be non-empty".
// Some clients (LiteLLM passthrough, @ai-sdk/anthropic) may forward these empty blocks as-is.
if (Array.isArray(translatedBody.messages)) {
for (const msg of translatedBody.messages) {
if (Array.isArray(msg.content)) {
msg.content = msg.content.filter(
(block: Record<string, unknown>) =>
block.type !== "text" || (typeof block.text === "string" && block.text.length > 0)
);
}
}
}
// ── #409: Normalize unsupported content part types ──
// Cursor and other clients send {type:"file"} when attaching .md or other files.
// Providers (Copilot, OpenAI) only accept "text" and "image_url" in content arrays.
// Convert: file → text (extract content), drop unrecognized types with a warning.
if (Array.isArray(translatedBody.messages)) {
for (const msg of translatedBody.messages) {
if (msg.role === "user" && Array.isArray(msg.content)) {
msg.content = (msg.content as Record<string, unknown>[]).flatMap(
(block: Record<string, unknown>) => {
if (block.type === "text" || block.type === "image_url" || block.type === "image") {
return [block];
}
// file / document → extract text content
if (block.type === "file" || block.type === "document") {
const fileContent =
(block.file as Record<string, unknown>)?.content ??
(block.file as Record<string, unknown>)?.text ??
block.content ??
block.text;
const fileName =
(block.file as Record<string, unknown>)?.name ?? block.name ?? "attachment";
if (typeof fileContent === "string" && fileContent.length > 0) {
return [{ type: "text", text: `[${fileName}]\n${fileContent}` }];
}
return [];
}
// Unknown types: drop silently
log?.debug?.("CONTENT", `Dropped unsupported content part type="${block.type}"`);
return [];
}
);
}
}
}
const normalizeToolCallId = getModelNormalizeToolCallId(provider || "", model || "");
translatedBody = translateRequest(
sourceFormat,
targetFormat,
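The two passes above can be sketched as one pure function. The first strips empty text blocks from every message; the second applies the #409 content-part normalization to user messages. This version returns new arrays instead of mutating `translatedBody.messages`. The name `normalizeMessages` and the message shape are assumptions:

```typescript
type Block = Record<string, unknown>;
type Message = { role: string; content: unknown };

function normalizeMessages(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (!Array.isArray(msg.content)) return msg; // string content is left alone
    // Pass 1 (all roles): drop {"type":"text","text":""}, which Anthropic rejects.
    let content = (msg.content as Block[]).filter(
      (b) => b.type !== "text" || (typeof b.text === "string" && b.text.length > 0)
    );
    // Pass 2 (user only): keep text/image parts, turn file/document into text, drop the rest.
    if (msg.role === "user") {
      content = content.flatMap((b): Block[] => {
        if (b.type === "text" || b.type === "image_url" || b.type === "image") return [b];
        if (b.type === "file" || b.type === "document") {
          const file = (b.file ?? {}) as Block;
          const text = file.content ?? file.text ?? b.content ?? b.text;
          const name = file.name ?? b.name ?? "attachment";
          return typeof text === "string" && text.length > 0
            ? [{ type: "text", text: `[${String(name)}]\n${text}` }]
            : [];
        }
        return []; // unrecognized part type
      });
    }
    return { ...msg, content };
  });
}
```

As in the diff, the user-message allow-list is `text`, `image_url` and `image`. Any other part type a client sends is dropped rather than passed through; an example would be an Anthropic-style `tool_result`, if such a body reaches this branch.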
@@ -241,7 +326,8 @@ export async function handleChatCore({
stream,
credentials,
provider,
reqLogger
reqLogger,
{ normalizeToolCallId }
);
}
} catch (error) {
@@ -287,8 +373,8 @@ export async function handleChatCore({
delete translatedBody._toolNameMap;
delete translatedBody._disableToolPrefix;
// Update model in body
translatedBody.model = model;
// Update model in body — use resolved alias so the provider gets the correct model ID (#472)
translatedBody.model = effectiveModel;
// Strip unsupported parameters for reasoning models (o1, o3, etc.)
const unsupported = getUnsupportedParams(provider, model);
@@ -307,13 +393,66 @@ export async function handleChatCore({
// Get executor for this provider
const executor = getExecutor(provider);
const getExecutionCredentials = () =>
nativeCodexPassthrough ? { ...credentials, requestEndpointPath: endpointPath } : credentials;
// Create stream controller for disconnect detection
const streamController = createStreamController({ onDisconnect, log, provider, model });
const dedupRequestBody = { ...translatedBody, model: `${provider}/${model}` };
const dedupEnabled = shouldDeduplicate(dedupRequestBody);
const dedupHash = dedupEnabled ? computeRequestHash(dedupRequestBody) : null;
const executeProviderRequest = async (modelToCall = effectiveModel, allowDedup = false) => {
const execute = async () => {
const bodyToSend =
translatedBody.model === modelToCall
? translatedBody
: { ...translatedBody, model: modelToCall };
const rawResult = await withRateLimit(provider, connectionId, modelToCall, () =>
executor.execute({
model: modelToCall,
body: bodyToSend,
stream,
credentials: getExecutionCredentials(),
signal: streamController.signal,
log,
extendedContext,
})
);
if (stream) return rawResult;
// Non-stream responses need cloning for shared dedup consumers.
const status = rawResult.response.status;
const statusText = rawResult.response.statusText;
const headers = Array.from(rawResult.response.headers.entries());
const payload = await rawResult.response.text();
return {
...rawResult,
response: new Response(payload, { status, statusText, headers }),
};
};
if (allowDedup && dedupEnabled && dedupHash) {
const dedupResult = await deduplicate(dedupHash, execute);
if (dedupResult.wasDeduplicated) {
log?.debug?.("DEDUP", `Joined in-flight request hash=${dedupHash}`);
}
return dedupResult.result;
}
return execute();
};
// Track pending request
trackPendingRequest(model, provider, connectionId, true);
// T5: track which models we've tried for intra-family fallback
const triedModels = new Set<string>([model]);
let currentModel = model;
const triedModels = new Set<string>([effectiveModel]);
let currentModel = effectiveModel;
// Log start
appendRequestLog({ model, provider, connectionId, status: "PENDING" }).catch(() => {});
@@ -325,9 +464,6 @@ export async function handleChatCore({
0;
log?.debug?.("REQUEST", `${provider.toUpperCase()} | ${model} | ${msgCount} msgs`);
// Create stream controller for disconnect detection
const streamController = createStreamController({ onDisconnect, log, provider, model });
// Execute request using executor (handles URL building, headers, fallback, transform)
let providerResponse;
let providerUrl;
@@ -335,17 +471,7 @@ export async function handleChatCore({
let finalBody;
try {
const result = await withRateLimit(provider, connectionId, model, () =>
executor.execute({
model,
body: translatedBody,
stream,
credentials,
signal: streamController.signal,
log,
extendedContext,
})
);
const result = await executeProviderRequest(effectiveModel, true);
providerResponse = result.response;
providerUrl = result.url;
@@ -392,6 +518,7 @@ export async function handleChatCore({
streamController.handleError(error);
return createErrorResult(499, "Request aborted");
}
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, error?.name || "upstream_error");
const errMsg = formatProviderError(error, provider, model, HTTP_STATUS.BAD_GATEWAY);
console.log(`${COLORS.red}[ERROR] ${errMsg}${COLORS.reset}`);
return createErrorResult(HTTP_STATUS.BAD_GATEWAY, errMsg);
@@ -428,7 +555,7 @@ export async function handleChatCore({
model,
body: translatedBody,
stream,
credentials,
credentials: getExecutionCredentials(),
signal: streamController.signal,
log,
extendedContext,
@@ -501,17 +628,7 @@ export async function handleChatCore({
log?.info?.("MODEL_FALLBACK", `${model} unavailable (${statusCode}) → trying ${nextModel}`);
// Re-execute with the fallback model
try {
const fallbackResult = await withRateLimit(provider, connectionId, nextModel, () =>
executor.execute({
model: nextModel,
body: translatedBody,
stream,
credentials,
signal: streamController.signal,
log,
extendedContext,
})
);
const fallbackResult = await executeProviderRequest(nextModel, false);
if (fallbackResult.response.ok) {
providerResponse = fallbackResult.response;
providerUrl = fallbackResult.url;
@@ -523,18 +640,79 @@ export async function handleChatCore({
// We fall through by NOT returning here
} else {
// Fallback also failed — return original error
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} catch {
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} else {
persistFailureUsage(statusCode, "model_unavailable");
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
} else {
persistFailureUsage(statusCode, `upstream_${statusCode}`);
return createErrorResult(statusCode, errMsg, retryAfterMs);
}
// ── End T5 ───────────────────────────────────────────────────────────────
// ── Emergency Fallback (ClawRouter Feature #09/017) ────────────────────
// When a non-streaming request fails with a budget-related error (402 or
// budget keywords), redirect to nvidia/gpt-oss-120b ($0.00/M) before
// returning the error to the combo router. This gives one last free-tier
// attempt so the user's session stays alive.
const requestHasTools = Array.isArray(translatedBody.tools) && translatedBody.tools.length > 0;
if (!stream) {
const fbDecision = shouldUseFallback(
statusCode,
message,
requestHasTools,
EMERGENCY_FALLBACK_CONFIG
);
if (isFallbackDecision(fbDecision)) {
log?.info?.("EMERGENCY_FALLBACK", fbDecision.reason);
try {
// Build a minimal fallback request using the original body but with
// the NVIDIA free-tier model and max_tokens capped to avoid overuse.
const fbExecutor = getExecutor(fbDecision.provider);
const fbResult = await fbExecutor.execute({
model: fbDecision.model,
body: {
...translatedBody,
model: fbDecision.model,
max_tokens: Math.min(
typeof translatedBody.max_tokens === "number"
? translatedBody.max_tokens
: fbDecision.maxOutputTokens,
fbDecision.maxOutputTokens
),
},
stream: false,
credentials: credentials,
signal: streamController.signal,
log,
extendedContext,
});
if (fbResult.response.ok) {
providerResponse = fbResult.response;
log?.info?.(
"EMERGENCY_FALLBACK",
`Serving ${fbDecision.provider}/${fbDecision.model} as budget fallback for ${provider}/${model}`
);
// Fall through to non-streaming handler — providerResponse is now OK
} else {
log?.warn?.(
"EMERGENCY_FALLBACK",
`Emergency fallback also failed (${fbResult.response.status})`
);
}
} catch (fbErr) {
log?.warn?.("EMERGENCY_FALLBACK", `Emergency fallback error: ${fbErr?.message}`);
}
}
}
// ── End Emergency Fallback ────────────────────────────────────────────
}
// Non-streaming response
@@ -560,6 +738,7 @@ export async function handleChatCore({
connectionId,
status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
}).catch(() => {});
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_sse_payload");
return createErrorResult(
HTTP_STATUS.BAD_GATEWAY,
"Invalid SSE response for non-streaming request"
@@ -577,6 +756,7 @@ export async function handleChatCore({
connectionId,
status: `FAILED ${HTTP_STATUS.BAD_GATEWAY}`,
}).catch(() => {});
persistFailureUsage(HTTP_STATUS.BAD_GATEWAY, "invalid_json_payload");
return createErrorResult(HTTP_STATUS.BAD_GATEWAY, "Invalid JSON response from provider");
}
}
@@ -619,6 +799,11 @@ export async function handleChatCore({
provider: provider || "unknown",
model: model || "unknown",
tokens: usage,
status: "200",
success: true,
latencyMs: Date.now() - startTime,
timeToFirstTokenMs: Date.now() - startTime,
errorCode: null,
timestamp: new Date().toISOString(),
connectionId: connectionId || undefined,
apiKeyId: apiKeyInfo?.id || undefined,
@@ -695,8 +880,12 @@ export async function handleChatCore({
// Create transform stream with logger for streaming response
let transformStream;
// Callback to save call log when stream completes (streaming calls were never logged before!)
const onStreamComplete = ({ status: streamStatus, usage: streamUsage }) => {
// Callback to save call log when stream completes (include responseBody when provided by stream)
const onStreamComplete = ({
status: streamStatus,
usage: streamUsage,
responseBody: streamResponseBody,
}) => {
saveCallLog({
method: "POST",
path: clientRawRequest?.endpoint || "/v1/chat/completions",
@@ -707,6 +896,7 @@ export async function handleChatCore({
duration: Date.now() - startTime,
tokens: streamUsage || {},
requestBody: body,
responseBody: streamResponseBody ?? undefined,
sourceFormat,
targetFormat,
comboName,
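The first provider attempt in `executeProviderRequest` goes through `deduplicate(dedupHash, execute)`, whose implementation is not part of this diff. Here is a minimal sketch of in-flight request coalescing. It assumes `deduplicate` keeps one pending promise per hash and reports whether the caller joined an existing one. The `inFlight` map and the `DedupResult` shape are hypothetical, inferred only from how the diff uses `dedupResult.result` and `dedupResult.wasDeduplicated`.

```typescript
// Hypothetical sketch of in-flight request deduplication; not the project's actual module.
interface DedupResult<T> {
  result: T;
  wasDeduplicated: boolean;
}

// One pending promise per request hash while the upstream call is in flight.
const inFlight = new Map<string, Promise<unknown>>();

async function deduplicate<T>(
  hash: string,
  execute: () => Promise<T>
): Promise<DedupResult<T>> {
  const existing = inFlight.get(hash) as Promise<T> | undefined;
  if (existing) {
    // Join the request that is already running instead of calling upstream again.
    return { result: await existing, wasDeduplicated: true };
  }
  // Drop the entry once the call settles (success or failure) so later requests run fresh.
  const pending = execute().finally(() => {
    inFlight.delete(hash);
  });
  inFlight.set(hash, pending);
  return { result: await pending, wasDeduplicated: false };
}

// Usage: two identical concurrent calls trigger a single upstream execution.
let upstreamCalls = 0;
const callUpstream = async () => {
  upstreamCalls++;
  return "response-body";
};
const first = deduplicate("hash-1", callUpstream);
const second = deduplicate("hash-1", callUpstream);
```

Every joiner receives the same `result` object. A `Response` shared this way can therefore still have its body consumed only once. If each caller needs to read the body, it would have to call `clone()` on the shared response first.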
@@ -13,18 +13,48 @@
* }
*/
import { getEmbeddingProvider, parseEmbeddingModel } from "../config/embeddingRegistry.ts";
import {
getEmbeddingProvider,
parseEmbeddingModel,
type EmbeddingProvider,
} from "../config/embeddingRegistry.ts";
import { saveCallLog } from "@/lib/usageDb";
/**
* Handle embedding request
* @param {object} options
* @param {object} options.body - Request body
* @param {object} options.credentials - Provider credentials { apiKey, accessToken }
* @param {object} options.log - Logger
* Handle embedding request.
* Supports both hardcoded cloud providers and dynamic local provider_nodes.
* When resolvedProvider is passed, uses it directly (injection pattern from route handler).
* Falls back to hardcoded registry lookup for backward compatibility.
*/
export async function handleEmbedding({ body, credentials, log }) {
const { provider, model } = parseEmbeddingModel(body.model);
export async function handleEmbedding({
body,
credentials,
log,
resolvedProvider = null,
resolvedModel = null,
}: {
body: Record<string, unknown>;
credentials: { apiKey?: string; accessToken?: string } | null;
log?: { info: (...args: unknown[]) => void; error: (...args: unknown[]) => void };
resolvedProvider?: EmbeddingProvider | null;
resolvedModel?: string | null;
}) {
// Use pre-resolved provider/model from route handler if available (supports dynamic provider_nodes).
let provider: string | null;
let model: string | null;
let providerConfig: EmbeddingProvider | null;
if (resolvedProvider) {
provider = resolvedProvider.id;
model = resolvedModel;
providerConfig = resolvedProvider;
} else {
const parsed = parseEmbeddingModel(body.model as string);
provider = parsed.provider;
model = parsed.model;
providerConfig = provider ? getEmbeddingProvider(provider) : null;
}
const startTime = Date.now();
// Summarized request body for call log (avoid storing large embedding input arrays)
@@ -42,7 +72,6 @@ export async function handleEmbedding({ body, credentials, log }) {
};
}
const providerConfig = getEmbeddingProvider(provider);
if (!providerConfig) {
return {
success: false,
@@ -66,11 +95,15 @@ export async function handleEmbedding({ body, credentials, log }) {
"Content-Type": "application/json",
};
const token = credentials.apiKey || credentials.accessToken;
if (providerConfig.authHeader === "bearer") {
headers["Authorization"] = `Bearer ${token}`;
} else if (providerConfig.authHeader === "x-api-key") {
headers["x-api-key"] = token;
// Skip credential injection for local providers (authType: "none")
const token =
providerConfig.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
if (token) {
if (providerConfig.authHeader === "bearer") {
headers["Authorization"] = `Bearer ${token}`;
} else if (providerConfig.authHeader === "x-api-key") {
headers["x-api-key"] = token;
}
}
if (log) {
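In the embedding change, credential injection is skipped when a provider declares `authType: "none"`. For all other providers, the header style is taken from `authHeader`. Below is a minimal standalone sketch of that header logic. It assumes a simplified provider shape: the `buildEmbeddingHeaders` name and the trimmed interfaces are hypothetical and are not the registry's real `EmbeddingProvider` type.

```typescript
// Hypothetical, simplified shapes; the real EmbeddingProvider type has more fields.
interface EmbeddingAuthConfig {
  authType?: string; // "none" marks local providers that need no credentials
  authHeader?: "bearer" | "x-api-key";
}

interface EmbeddingCredentials {
  apiKey?: string;
  accessToken?: string;
}

function buildEmbeddingHeaders(
  config: EmbeddingAuthConfig,
  credentials: EmbeddingCredentials | null
): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local providers never receive a token, even if credentials were passed in.
  const token =
    config.authType === "none" ? null : credentials?.apiKey || credentials?.accessToken;
  if (token) {
    if (config.authHeader === "bearer") {
      headers["Authorization"] = `Bearer ${token}`;
    } else if (config.authHeader === "x-api-key") {
      headers["x-api-key"] = token;
    }
  }
  return headers;
}
```

The credentials are typed as nullable and read through optional chaining. A local provider can therefore be called with `credentials: null` without throwing, which the previous unguarded `credentials.apiKey` access did not allow.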
@@ -0,0 +1,680 @@
/**
* Search Handler
*
* Handles POST /v1/search requests.
* Routes to 5 search providers with automatic failover:
* serper-search, brave-search, perplexity-search, exa-search, tavily-search
*
* Request format:
* {
* "query": "search query",
* "provider": "serper-search" | "brave-search" | ... // optional, auto-selects cheapest
* "max_results": 5,
* "search_type": "web" | "news"
* }
*/
import { getSearchProvider, type SearchProviderConfig } from "../config/searchRegistry.ts";
import { saveCallLog } from "@/lib/usageDb";
// ── Types ────────────────────────────────────────────────────────────────
export interface SearchResult {
title: string;
url: string;
display_url?: string;
snippet: string;
position: number;
score: number | null;
published_at: string | null;
favicon_url: string | null;
content: { format: string; text: string; length: number } | null;
metadata: {
author: string | null;
language: string | null;
source_type: string | null;
image_url: string | null;
} | null;
citation: {
provider: string;
retrieved_at: string;
rank: number;
};
provider_raw: Record<string, unknown> | null;
}
export interface SearchResponse {
provider: string;
query: string;
results: SearchResult[];
answer: { source: string; text: string | null; model: string | null } | null;
usage: { queries_used: number; search_cost_usd: number; llm_tokens?: number };
metrics: {
response_time_ms: number;
upstream_latency_ms: number;
gateway_latency_ms?: number;
total_results_available: number | null;
};
errors: Array<{ provider: string; code: string; message: string }>;
}
interface SearchHandlerResult {
success: boolean;
status?: number;
error?: string;
data?: SearchResponse;
}
interface SearchHandlerOptions {
query: string;
provider: string;
maxResults: number;
searchType: string;
country?: string;
language?: string;
timeRange?: string;
offset?: number;
domainFilter?: string[];
contentOptions?: {
snippet?: boolean;
full_page?: boolean;
format?: string;
max_characters?: number;
};
strictFilters?: boolean;
providerOptions?: Record<string, unknown>;
credentials: Record<string, any>;
alternateProvider?: string;
alternateCredentials?: Record<string, any> | null;
log?: any;
}
// ── Constants ────────────────────────────────────────────────────────────
const GLOBAL_TIMEOUT_MS = 15_000;
// Non-retriable HTTP status codes — fail immediately, don't try alternate
const NON_RETRIABLE = new Set([400, 401, 403, 404]);
// ── Input Sanitization ──────────────────────────────────────────────────
// Control characters that should never appear in search queries
const CONTROL_CHAR_RE = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;
function sanitizeQuery(query: string): { clean: string; error?: string } {
if (CONTROL_CHAR_RE.test(query)) {
return { clean: "", error: "Query contains invalid control characters" };
}
const clean = query.normalize("NFKC").trim().replace(/\s+/g, " ");
if (clean.length === 0) {
return { clean: "", error: "Query is empty after normalization" };
}
return { clean };
}
// ── Response Normalizers ────────────────────────────────────────────────
function makeResult(
providerId: string,
item: {
title?: string;
url?: string;
snippet?: string;
score?: number;
published_at?: string;
favicon_url?: string;
author?: string;
source_type?: string;
image_url?: string;
full_text?: string;
text_format?: string;
},
idx: number,
now: string
): SearchResult {
const url = item.url || "";
return {
title: item.title || "",
url,
display_url: url ? url.replace(/^https?:\/\/(www\.)?/, "").split("?")[0] : undefined,
snippet: item.snippet || "",
position: idx + 1,
score: typeof item.score === "number" ? Math.min(1, Math.max(0, item.score)) : null,
published_at: item.published_at || null,
favicon_url: item.favicon_url || null,
content: item.full_text
? { format: item.text_format || "text", text: item.full_text, length: item.full_text.length }
: null,
metadata: {
author: item.author || null,
language: null,
source_type: item.source_type || null,
image_url: item.image_url || null,
},
citation: { provider: providerId, retrieved_at: now, rank: idx + 1 },
provider_raw: null,
};
}
function normalizeSerperResponse(
data: any,
_query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = searchType === "news" ? data.news : data.organic;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"serper-search",
{
title: item.title,
url: item.link,
snippet: item.snippet || item.description,
published_at: item.date,
},
idx,
now
)
);
return {
results,
totalResults:
typeof data.searchParameters?.totalResults === "number"
? data.searchParameters.totalResults
: null,
};
}
function normalizeBraveResponse(
data: any,
_query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
// Brave news endpoint returns { results: [...] } directly,
// while web endpoint returns { web: { results: [...] } }
const container = searchType === "news" ? data.news || data : data.web;
const items = container?.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"brave-search",
{
title: item.title,
url: item.url,
snippet: item.description,
published_at: item.page_age || item.age,
favicon_url: item.meta_url?.favicon || item.favicon,
},
idx,
now
)
);
return { results, totalResults: container?.totalCount ?? null };
}
// ── Helpers ─────────────────────────────────────────────────────────────
function parseDomainFilter(domainFilter?: string[]): {
includes: string[];
excludes: string[];
} {
if (!domainFilter?.length) return { includes: [], excludes: [] };
const includes = domainFilter.filter((d) => !d.startsWith("-"));
const excludes = domainFilter.filter((d) => d.startsWith("-")).map((d) => d.slice(1));
return { includes, excludes };
}
// ── Provider Request Builders ───────────────────────────────────────────
interface SearchRequestParams {
query: string;
searchType: string;
maxResults: number;
token: string;
country?: string;
language?: string;
domainFilter?: string[];
}
function buildSerperRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const endpoint = params.searchType === "news" ? "/news" : "/search";
const body: Record<string, unknown> = { q: params.query, num: params.maxResults };
if (params.country) body.gl = params.country.toLowerCase();
if (params.language) body.hl = params.language;
return {
url: `${config.baseUrl}${endpoint}`,
init: {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": params.token },
body: JSON.stringify(body),
},
};
}
function buildBraveRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const endpoint = params.searchType === "news" ? "/news/search" : "/web/search";
const qp = new URLSearchParams({ q: params.query, count: String(params.maxResults) });
if (params.country) qp.set("country", params.country);
if (params.language) qp.set("search_lang", params.language);
return {
url: `${config.baseUrl}${endpoint}?${qp}`,
init: {
method: "GET",
headers: { Accept: "application/json", "X-Subscription-Token": params.token },
},
};
}
function buildPerplexityRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const body: Record<string, unknown> = { query: params.query, max_results: params.maxResults };
if (params.country) body.country = params.country;
if (params.language) body.search_language_filter = [params.language];
if (params.domainFilter?.length) body.search_domain_filter = params.domainFilter;
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify(body),
},
};
}
function buildExaRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const { includes, excludes } = parseDomainFilter(params.domainFilter);
const body: Record<string, unknown> = {
query: params.query,
numResults: params.maxResults,
type: "auto",
text: true,
highlights: true,
};
if (includes.length) body.includeDomains = includes;
if (excludes.length) body.excludeDomains = excludes;
if (params.searchType === "news") body.category = "news";
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", "x-api-key": params.token },
body: JSON.stringify(body),
},
};
}
function buildTavilyRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
const { includes, excludes } = parseDomainFilter(params.domainFilter);
const body: Record<string, unknown> = {
query: params.query,
max_results: params.maxResults,
topic: params.searchType === "news" ? "news" : "general",
};
if (includes.length) body.include_domains = includes;
if (excludes.length) body.exclude_domains = excludes;
if (params.country) body.country = params.country;
return {
url: config.baseUrl,
init: {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify(body),
},
};
}
function buildRequest(
config: SearchProviderConfig,
params: SearchRequestParams
): { url: string; init: RequestInit } {
if (config.id === "serper-search") return buildSerperRequest(config, params);
if (config.id === "brave-search") return buildBraveRequest(config, params);
if (config.id === "perplexity-search") return buildPerplexityRequest(config, params);
if (config.id === "exa-search") return buildExaRequest(config, params);
if (config.id === "tavily-search") return buildTavilyRequest(config, params);
// Fallback for future providers: POST with bearer auth
return {
url: config.baseUrl,
init: {
method: config.method,
headers: { "Content-Type": "application/json", Authorization: `Bearer ${params.token}` },
body: JSON.stringify({
query: params.query,
max_results: params.maxResults,
search_type: params.searchType,
}),
},
};
}
function normalizePerplexityResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"perplexity-search",
{
title: item.title,
url: item.url,
snippet: item.snippet,
published_at: item.date || item.last_updated,
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeExaResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"exa-search",
{
title: item.title,
url: item.url,
snippet: item.highlights?.[0] || item.text?.slice(0, 300) || "",
score: item.score,
published_at: item.publishedDate,
favicon_url: item.favicon,
author: item.author,
image_url: item.image,
full_text: item.text,
text_format: "text",
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeTavilyResponse(
data: any,
_query: string,
_searchType: string
): { results: SearchResult[]; totalResults: number | null } {
const now = new Date().toISOString();
const items = data.results;
if (!Array.isArray(items)) return { results: [], totalResults: null };
const results = items.map((item: any, idx: number) =>
makeResult(
"tavily-search",
{
title: item.title,
url: item.url,
snippet: item.content || "",
score: item.score,
published_at: item.published_date,
full_text: item.raw_content,
text_format: "text",
},
idx,
now
)
);
return { results, totalResults: results.length };
}
function normalizeResponse(
providerId: string,
data: any,
query: string,
searchType: string
): { results: SearchResult[]; totalResults: number | null } {
if (providerId === "serper-search") return normalizeSerperResponse(data, query, searchType);
if (providerId === "brave-search") return normalizeBraveResponse(data, query, searchType);
if (providerId === "perplexity-search")
return normalizePerplexityResponse(data, query, searchType);
if (providerId === "exa-search") return normalizeExaResponse(data, query, searchType);
if (providerId === "tavily-search") return normalizeTavilyResponse(data, query, searchType);
return { results: [], totalResults: null };
}
// ── Main Handler ────────────────────────────────────────────────────────
export async function handleSearch(options: SearchHandlerOptions): Promise<SearchHandlerResult> {
const {
query,
provider: providerId,
maxResults,
searchType,
country,
language,
domainFilter,
credentials,
alternateProvider,
alternateCredentials,
log,
} = options;
const startTime = Date.now();
// 1. Sanitize input
const { clean: cleanQuery, error: sanitizeError } = sanitizeQuery(query);
if (sanitizeError) {
return { success: false, status: 400, error: sanitizeError };
}
// 2. Use resolved provider from route (no re-resolution)
const primaryConfig = getSearchProvider(providerId);
if (!primaryConfig) {
return {
success: false,
status: 400,
error: `Unknown search provider: ${providerId}`,
};
}
// 3. Get alternate config for failover (pre-resolved by route)
const alternateConfig = alternateProvider ? getSearchProvider(alternateProvider) : null;
const requestParams = {
query: cleanQuery,
searchType,
maxResults,
country,
language,
domainFilter,
};
// 4. Try primary provider
const result = await tryProvider(primaryConfig, requestParams, credentials, startTime, log);
if (result.success) return result;
// 5. Failover to alternate (only for retriable errors and auto-select mode)
if (
alternateConfig &&
alternateCredentials &&
!NON_RETRIABLE.has(result.status || 0) &&
Date.now() - startTime < GLOBAL_TIMEOUT_MS
) {
if (log) {
log.warn(
"SEARCH",
`${primaryConfig.id} failed (${result.status}), trying ${alternateConfig.id}`
);
}
const fallbackResult = await tryProvider(
alternateConfig,
requestParams,
alternateCredentials,
startTime,
log
);
if (fallbackResult.success) return fallbackResult;
}
return result;
}
async function tryProvider(
config: SearchProviderConfig,
params: Omit<SearchRequestParams, "token">,
credentials: Record<string, any>,
globalStartTime: number,
log?: any
): Promise<SearchHandlerResult> {
const startTime = Date.now();
const token = credentials.apiKey || credentials.accessToken;
if (!token) {
return {
success: false,
status: 401,
error: `No credentials for search provider: ${config.id}`,
};
}
const { query, searchType, maxResults } = params;
const { url, init } = buildRequest(config, { ...params, token });
// Timeout: min of provider timeout and remaining global timeout
const remainingGlobal = GLOBAL_TIMEOUT_MS - (Date.now() - globalStartTime);
const timeout = Math.min(config.timeoutMs, Math.max(remainingGlobal, 1000));
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), timeout);
if (log) {
log.info("SEARCH", `${config.id} | query: "${query.slice(0, 80)}" | type: ${searchType}`);
}
try {
const response = await fetch(url, { ...init, signal: controller.signal });
clearTimeout(timer);
if (!response.ok) {
const errorText = await response.text();
if (log) {
log.error("SEARCH", `${config.id} error ${response.status}: ${errorText.slice(0, 200)}`);
}
saveCallLog({
method: config.method,
path: "/v1/search",
status: response.status,
model: config.id,
provider: config.id,
duration: Date.now() - startTime,
requestType: "search",
error: errorText.slice(0, 500),
requestBody: {
query: query.slice(0, 200),
search_type: searchType,
max_results: maxResults,
},
}).catch(() => {
/* non-critical — logging must not block search response */
});
return {
success: false,
status: response.status,
error: `Search provider ${config.id} returned ${response.status}`,
};
}
const data = await response.json();
const normalized = normalizeResponse(config.id, data, query, searchType);
// Enforce max_results — some providers return more than requested
const results = normalized.results.slice(0, maxResults);
const totalResults = normalized.totalResults;
const duration = Date.now() - startTime;
saveCallLog({
method: config.method,
path: "/v1/search",
status: 200,
model: config.id,
provider: config.id,
duration,
requestType: "search",
tokens: { prompt_tokens: 0, completion_tokens: 0 },
requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
responseBody: { results_count: results.length, cached: false },
}).catch(() => {
/* non-critical — logging must not block search response */
});
return {
success: true,
data: {
provider: config.id,
query,
results,
answer: null,
usage: { queries_used: 1, search_cost_usd: config.costPerQuery },
metrics: {
response_time_ms: duration,
upstream_latency_ms: duration,
total_results_available: totalResults,
},
errors: [],
},
};
} catch (err: any) {
clearTimeout(timer);
const isTimeout = err.name === "AbortError";
if (log) {
log.error("SEARCH", `${config.id} ${isTimeout ? "timeout" : "fetch error"}: ${err.message}`);
}
saveCallLog({
method: config.method,
path: "/v1/search",
status: isTimeout ? 504 : 502,
model: config.id,
provider: config.id,
duration: Date.now() - startTime,
requestType: "search",
error: err.message,
requestBody: { query: query.slice(0, 200), search_type: searchType, max_results: maxResults },
}).catch(() => {
/* non-critical — logging must not block search response */
});
return {
success: false,
status: isTimeout ? 504 : 502,
error: `Search provider ${isTimeout ? "timeout" : "error"}: ${err.message}`,
};
}
}
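`handleSearch` tries the alternate provider only when three conditions hold: an alternate was supplied, the primary's status is not in `NON_RETRIABLE`, and the 15 s global budget has not run out. Each attempt's timeout is the provider timeout, capped by whatever remains of that budget, with a 1 s floor. Below, both decisions are extracted into pure functions for illustration. The names `shouldFailover` and `attemptTimeoutMs` are hypothetical; the constants are copied from the handler.

```typescript
// Constants copied from the search handler.
const GLOBAL_TIMEOUT_MS = 15_000;
const NON_RETRIABLE = new Set([400, 401, 403, 404]);

// Hypothetical helper: mirrors the condition guarding the alternate-provider attempt.
function shouldFailover(
  status: number | undefined,
  hasAlternate: boolean,
  elapsedMs: number
): boolean {
  // A missing status maps to 0, which is treated as retriable.
  return hasAlternate && !NON_RETRIABLE.has(status || 0) && elapsedMs < GLOBAL_TIMEOUT_MS;
}

// Hypothetical helper: mirrors the per-attempt timeout computed in tryProvider.
function attemptTimeoutMs(providerTimeoutMs: number, elapsedMs: number): number {
  const remainingGlobal = GLOBAL_TIMEOUT_MS - elapsedMs;
  // Never wait longer than the provider allows; never less than 1 s.
  return Math.min(providerTimeoutMs, Math.max(remainingGlobal, 1000));
}
```

Two consequences follow from these rules. First, 429 and 5xx responses, as well as fetch failures that the handler maps to 502/504, count as retriable. Second, because of the 1 s floor, a failover that starts close to the deadline can run past the 15 s global budget by up to about one second.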
@@ -0,0 +1,48 @@
import { describe, it, expect } from "vitest";
import {
MCP_TOOLS,
MCP_TOOL_MAP,
setRoutingStrategyInput,
setRoutingStrategyTool,
} from "../schemas/tools.ts";
describe("omniroute_set_routing_strategy MCP tool schema", () => {
it("should be registered in MCP_TOOLS", () => {
const tool = MCP_TOOLS.find((t) => t.name === "omniroute_set_routing_strategy");
expect(tool).toBeDefined();
expect(tool?.phase).toBe(2);
});
it("should be available in MCP_TOOL_MAP", () => {
expect(MCP_TOOL_MAP["omniroute_set_routing_strategy"]).toBeDefined();
});
it("should require write:combos scope", () => {
expect(setRoutingStrategyTool.scopes).toContain("write:combos");
});
it("should validate a standard strategy payload", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "cost-optimized",
});
expect(result.success).toBe(true);
});
it("should validate auto strategy with autoRoutingStrategy", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "auto",
autoRoutingStrategy: "latency",
});
expect(result.success).toBe(true);
});
it("should reject unknown strategy", () => {
const result = setRoutingStrategyInput.safeParse({
comboId: "my-combo",
strategy: "unknown-strategy",
});
expect(result.success).toBe(false);
});
});
@@ -107,6 +107,7 @@ export const listCombosOutput = z.object({
"priority",
"weighted",
"round-robin",
"strict-random",
"random",
"least-used",
"cost-optimized",
@@ -470,7 +471,53 @@ export const setBudgetGuardTool: McpToolDefinition<
sourceEndpoints: ["/api/usage/budget"],
};
// --- Tool 11: omniroute_set_resilience_profile ---
// --- Tool 11: omniroute_set_routing_strategy ---
export const setRoutingStrategyInput = z.object({
comboId: z.string().describe("Combo ID or name to update"),
strategy: z
.enum([
"priority",
"weighted",
"round-robin",
"strict-random",
"random",
"least-used",
"cost-optimized",
"auto",
])
.describe("Routing strategy to apply"),
autoRoutingStrategy: z
.enum(["rules", "cost", "eco", "latency", "fast"])
.optional()
.describe("Optional strategy used by auto mode (only used when strategy='auto')"),
});
export const setRoutingStrategyOutput = z.object({
success: z.boolean(),
combo: z.object({
id: z.string(),
name: z.string(),
strategy: z.string(),
autoRoutingStrategy: z.string().nullable(),
}),
});
export const setRoutingStrategyTool: McpToolDefinition<
typeof setRoutingStrategyInput,
typeof setRoutingStrategyOutput
> = {
name: "omniroute_set_routing_strategy",
description:
"Updates a combo routing strategy (priority/weighted/auto/etc.) at runtime. Supports selecting the sub-strategy used by auto mode (rules/cost/eco/latency/fast).",
inputSchema: setRoutingStrategyInput,
outputSchema: setRoutingStrategyOutput,
scopes: ["write:combos"],
auditLevel: "full",
phase: 2,
sourceEndpoints: ["/api/combos", "/api/combos/{id}"],
};
// --- Tool 12: omniroute_set_resilience_profile ---
export const setResilienceProfileInput = z.object({
profile: z
.enum(["aggressive", "balanced", "conservative"])
@@ -502,7 +549,7 @@ export const setResilienceProfileTool: McpToolDefinition<
sourceEndpoints: ["/api/resilience"],
};
// --- Tool 12: omniroute_test_combo ---
// --- Tool 13: omniroute_test_combo ---
export const testComboInput = z.object({
comboId: z.string().describe("ID of the combo to test"),
testPrompt: z.string().max(500).describe("Short test prompt (max 500 chars)"),
@@ -540,7 +587,7 @@ export const testComboTool: McpToolDefinition<typeof testComboInput, typeof test
sourceEndpoints: ["/api/combos/test", "/v1/chat/completions"],
};
// --- Tool 13: omniroute_get_provider_metrics ---
// --- Tool 14: omniroute_get_provider_metrics ---
export const getProviderMetricsInput = z.object({
provider: z.string().describe("Provider name (e.g., 'claude', 'gemini-cli', 'codex')"),
});
@@ -583,7 +630,7 @@ export const getProviderMetricsTool: McpToolDefinition<
sourceEndpoints: ["/api/provider-metrics", "/api/resilience"],
};
// --- Tool 14: omniroute_best_combo_for_task ---
// --- Tool 15: omniroute_best_combo_for_task ---
export const bestComboForTaskInput = z.object({
taskType: z
.enum(["coding", "review", "planning", "analysis", "debugging", "documentation"])
@@ -628,7 +675,7 @@ export const bestComboForTaskTool: McpToolDefinition<
sourceEndpoints: ["/api/combos", "/api/combos/metrics", "/api/monitoring/health"],
};
// --- Tool 15: omniroute_explain_route ---
// --- Tool 16: omniroute_explain_route ---
export const explainRouteInput = z.object({
requestId: z.string().describe("Request ID from the X-Request-Id header"),
});
@@ -674,7 +721,7 @@ export const explainRouteTool: McpToolDefinition<
sourceEndpoints: [],
};
// --- Tool 16: omniroute_get_session_snapshot ---
// --- Tool 17: omniroute_get_session_snapshot ---
export const getSessionSnapshotInput = z.object({}).describe("No parameters required");
export const getSessionSnapshotOutput = z.object({
@@ -723,7 +770,7 @@ export const getSessionSnapshotTool: McpToolDefinition<
sourceEndpoints: ["/api/usage/analytics", "/api/telemetry/summary"],
};
// --- Tool 17: omniroute_sync_pricing ---
// --- Tool 18: omniroute_sync_pricing ---
export const syncPricingInput = z.object({
sources: z
.array(z.string())
@@ -775,6 +822,7 @@ export const MCP_TOOLS = [
// Phase 2: Advanced
simulateRouteTool,
setBudgetGuardTool,
setRoutingStrategyTool,
setResilienceProfileTool,
testComboTool,
getProviderMetricsTool,
+14
@@ -25,6 +25,7 @@ import {
listModelsCatalogInput,
simulateRouteInput,
setBudgetGuardInput,
setRoutingStrategyInput,
setResilienceProfileInput,
testComboInput,
getProviderMetricsInput,
@@ -45,6 +46,7 @@ import {
import {
handleSimulateRoute,
handleSetBudgetGuard,
handleSetRoutingStrategy,
handleSetResilienceProfile,
handleTestCombo,
handleGetProviderMetrics,
@@ -593,6 +595,18 @@ export function createMcpServer(): McpServer {
)
);
server.registerTool(
"omniroute_set_routing_strategy",
{
description:
"Updates combo routing strategy at runtime (priority/weighted/round-robin/auto/etc.)",
inputSchema: setRoutingStrategyInput,
},
withScopeEnforcement("omniroute_set_routing_strategy", (args) =>
handleSetRoutingStrategy(setRoutingStrategyInput.parse(args))
)
);
server.registerTool(
"omniroute_set_resilience_profile",
{
+111 -7
@@ -1,16 +1,18 @@
/**
* OmniRoute MCP Advanced Tools: 8 intelligence tools that differentiate
* OmniRoute MCP Advanced Tools: 10 intelligence tools that differentiate
* OmniRoute from all other AI gateways.
*
* Tools:
* 1. omniroute_simulate_route: Dry-run routing simulation
* 2. omniroute_set_budget_guard: Session budget with degrade/block/alert
* 3. omniroute_set_resilience_profile: Circuit breaker/retry profiles
* 4. omniroute_test_combo: Live test each provider in a combo
* 5. omniroute_get_provider_metrics: Detailed per-provider metrics
* 6. omniroute_best_combo_for_task: AI-powered combo recommendation
* 7. omniroute_explain_route: Post-hoc routing decision explainer
* 8. omniroute_get_session_snapshot: Full session state snapshot
* 3. omniroute_set_routing_strategy: Runtime strategy switch for combos
* 4. omniroute_set_resilience_profile: Circuit breaker/retry profiles
* 5. omniroute_test_combo: Live test each provider in a combo
* 6. omniroute_get_provider_metrics: Detailed per-provider metrics
* 7. omniroute_best_combo_for_task: AI-powered combo recommendation
* 8. omniroute_explain_route: Post-hoc routing decision explainer
* 9. omniroute_get_session_snapshot: Full session state snapshot
* 10. omniroute_sync_pricing: Sync provider pricing from external source
*/
import { logToolCall } from "../audit.ts";
@@ -335,6 +337,108 @@ export async function handleSetBudgetGuard(args: {
}
}
export async function handleSetRoutingStrategy(args: {
comboId: string;
strategy:
| "priority"
| "weighted"
| "round-robin"
| "strict-random"
| "random"
| "least-used"
| "cost-optimized"
| "auto";
autoRoutingStrategy?: "rules" | "cost" | "eco" | "latency" | "fast";
}) {
const start = Date.now();
try {
const combos = normalizeCombosResponse(await apiFetch("/api/combos"));
const combo = combos.find(
(comboEntry) =>
toString(comboEntry.id) === args.comboId || toString(comboEntry.name) === args.comboId
);
if (!combo) {
const msg = `Combo '${args.comboId}' not found`;
await logToolCall(
"omniroute_set_routing_strategy",
args,
null,
Date.now() - start,
false,
msg
);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
const comboId = toString(combo.id);
if (!comboId) {
const msg = "Matched combo has no id";
await logToolCall(
"omniroute_set_routing_strategy",
args,
null,
Date.now() - start,
false,
msg
);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
const comboData = toRecord(combo.data);
const currentConfig = toRecord(
Object.keys(toRecord(combo.config)).length > 0 ? combo.config : comboData.config
);
let nextConfig: JsonRecord | undefined = undefined;
if (args.strategy === "auto" && args.autoRoutingStrategy) {
const currentAutoConfig = toRecord(currentConfig.auto);
nextConfig = {
...currentConfig,
auto: {
...currentAutoConfig,
routingStrategy: args.autoRoutingStrategy,
},
};
}
const payload: JsonRecord = { strategy: args.strategy };
if (nextConfig && Object.keys(nextConfig).length > 0) {
payload.config = nextConfig;
}
const updatedCombo = toRecord(
await apiFetch(`/api/combos/${encodeURIComponent(comboId)}`, {
method: "PUT",
body: JSON.stringify(payload),
})
);
const updatedConfig = toRecord(updatedCombo.config);
const resolvedAutoStrategy =
toString(toRecord(updatedConfig.auto).routingStrategy) ||
(args.strategy === "auto" ? (args.autoRoutingStrategy ?? "rules") : "");
const result = {
success: true,
combo: {
id: toString(updatedCombo.id, comboId),
name: toString(updatedCombo.name, toString(combo.name, comboId)),
strategy: toString(updatedCombo.strategy, args.strategy),
autoRoutingStrategy:
toString(updatedCombo.strategy, args.strategy) === "auto" ? resolvedAutoStrategy : null,
},
};
await logToolCall("omniroute_set_routing_strategy", args, result, Date.now() - start, true);
return { content: [{ type: "text" as const, text: JSON.stringify(result, null, 2) }] };
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
await logToolCall("omniroute_set_routing_strategy", args, null, Date.now() - start, false, msg);
return { content: [{ type: "text" as const, text: `Error: ${msg}` }], isError: true };
}
}
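The handler only writes a `config` field when the strategy is `auto` and an explicit sub-strategy is given, and it preserves every sibling key under `config` and `config.auto`. A minimal pure-function sketch of that merge, convenient for unit-testing it in isolation; `buildRoutingPayload` and the local `JsonRecord` alias are hypothetical names, not exports of this module:

```typescript
type JsonRecord = Record<string, unknown>;

// Hypothetical helper mirroring the payload logic in handleSetRoutingStrategy:
// only strategy "auto" plus an explicit sub-strategy produces a config update.
function buildRoutingPayload(
  currentConfig: JsonRecord,
  strategy: string,
  autoRoutingStrategy?: string
): JsonRecord {
  const payload: JsonRecord = { strategy };
  if (strategy === "auto" && autoRoutingStrategy) {
    const currentAuto =
      typeof currentConfig.auto === "object" && currentConfig.auto !== null
        ? (currentConfig.auto as JsonRecord)
        : {};
    // Keep all existing config keys and all existing auto.* keys;
    // overwrite only auto.routingStrategy.
    payload.config = {
      ...currentConfig,
      auto: { ...currentAuto, routingStrategy: autoRoutingStrategy },
    };
  }
  return payload;
}

const current = { maxRetries: 2, auto: { explorationRate: 0.1, routingStrategy: "rules" } };
console.log(JSON.stringify(buildRoutingPayload(current, "auto", "latency")));
// prints: {"strategy":"auto","config":{"maxRetries":2,"auto":{"explorationRate":0.1,"routingStrategy":"latency"}}}
console.log(JSON.stringify(buildRoutingPayload(current, "priority")));
// prints: {"strategy":"priority"}
```

For non-`auto` strategies the payload simply omits `config`, and the tool's result reports `autoRoutingStrategy: null` in that case.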
export async function handleSetResilienceProfile(args: {
profile: "aggressive" | "balanced" | "conservative";
}) {
+36 -3
@@ -20,6 +20,7 @@ import {
import { getTaskFitness } from "./taskFitness";
import { getModePack } from "./modePacks";
import { getSelfHealingManager } from "./selfHealing";
import { classifyPromptIntent } from "../intentClassifier";
export interface AutoComboConfig {
id: string;
@@ -30,6 +31,8 @@ export interface AutoComboConfig {
modePack?: string;
budgetCap?: number; // max cost per request in USD
explorationRate: number; // 0.05 = 5% exploratory
/** If set, RouterStrategy name to use for selection ('rules' | 'cost' | 'latency') */
routerStrategy?: string;
}
export interface SelectionResult {
@@ -43,14 +46,44 @@ export interface SelectionResult {
/**
* Select the best provider from an auto-combo pool.
*
* @param config - AutoCombo configuration
* @param candidates - Provider candidates to score
* @param taskType - Task type hint. When "default" or omitted, the engine will attempt
* to infer the intent from `promptMessages` using multilingual classification.
* @param promptMessages - Optional raw messages for intent classification
*/
export function selectProvider(
config: AutoComboConfig,
candidates: ProviderCandidate[],
taskType: string = "default"
taskType: string = "default",
promptMessages?: Array<{ role: string; content: unknown }>
): SelectionResult {
const healer = getSelfHealingManager();
// ── Intent classification (ClawRouter Feature #10/11) ────────────────────
// When taskType is generic ('default'), attempt to classify the prompt intent
// using the multilingual intentClassifier for better task fitness scoring.
let effectiveTaskType = taskType;
if ((taskType === "default" || taskType === "") && promptMessages?.length) {
// Extract text from last user message for classification
const lastUserMsg = [...promptMessages].reverse().find((m) => m.role === "user");
if (lastUserMsg) {
const text =
typeof lastUserMsg.content === "string"
? lastUserMsg.content
: Array.isArray(lastUserMsg.content)
? (lastUserMsg.content as Array<{ type: string; text?: string }>)
.filter((b) => b.type === "text")
.map((b) => b.text || "")
.join(" ")
: "";
if (text.length > 10) {
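const intent = classifyPromptIntent(text); // 'code' | 'reasoning' | 'simple' | 'medium'
// Map intent labels onto the task-type keys used by getTaskFitness
// ('code' -> 'coding', 'reasoning' -> 'analysis', anything else -> 'default')
effectiveTaskType =
intent === "code" ? "coding" : intent === "reasoning" ? "analysis" : "default";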
}
}
}
// Resolve weights from mode pack or config
let weights = config.weights;
if (config.modePack) {
@@ -80,8 +113,8 @@ export function selectProvider(
excluded.length = 0;
}
// Score all providers
const scored = scorePool(pool, taskType, weights, getTaskFitness);
// Score all providers (using classified intent if available)
const scored = scorePool(pool, effectiveTaskType, weights, getTaskFitness);
// Apply self-healing re-evaluation with actual scores
const finalCandidates = scored.filter((s) => {
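The intent override in `selectProvider` reads the newest user message, keeps only the `text` parts of multi-part content, and classifies only when that text is longer than 10 characters. A self-contained sketch of that flow, assuming a pluggable `classify` callback in place of `classifyPromptIntent` (`lastUserText`, `intentToTaskType`, and `resolveTaskType` are hypothetical names). It applies the same label-to-task mapping as `mapIntentToTaskType` in the combo handler, so intent labels line up with `FITNESS_TABLE` keys such as `coding` and `analysis`:

```typescript
type ContentPart = { type: string; text?: string };
type PromptMessage = { role: string; content: unknown };

// Text of the newest user message; multi-part content keeps only "text" parts.
function lastUserText(messages: PromptMessage[]): string {
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser) return "";
  if (typeof lastUser.content === "string") return lastUser.content;
  if (Array.isArray(lastUser.content)) {
    return (lastUser.content as ContentPart[])
      .filter((part) => part.type === "text")
      .map((part) => part.text ?? "")
      .join(" ");
  }
  return "";
}

// Same mapping as mapIntentToTaskType in the combo handler.
function intentToTaskType(intent: string): string {
  if (intent === "code") return "coding";
  if (intent === "reasoning") return "analysis";
  return "default";
}

// Only a generic task type ("default" or "") is overridden by classification.
function resolveTaskType(
  taskType: string,
  messages: PromptMessage[] | undefined,
  classify: (text: string) => string
): string {
  if ((taskType !== "default" && taskType !== "") || !messages?.length) return taskType;
  const text = lastUserText(messages);
  return text.length > 10 ? intentToTaskType(classify(text)) : taskType;
}

const msgs: PromptMessage[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello" },
  {
    role: "user",
    content: [{ type: "text", text: "Refactor this function" }, { type: "image_url" }],
  },
];
console.log(lastUserText(msgs)); // prints: Refactor this function
```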
@@ -0,0 +1,159 @@
/**
* RouterStrategy: Pluggable Routing Strategy System
*
* Inspired by ClawRouter commit 14c83c258 "refactor: extract routing into pluggable RouterStrategy system".
* Provides a RouterStrategy interface and three built-in implementations:
* - RulesStrategy (default, "rules"): wraps the existing 6-factor scoring engine
* - CostStrategy ("cost", alias "eco"): always picks the cheapest available model
* - LatencyStrategy ("latency", alias "fast"): lowest p95 latency with reliability weighting
*/
import type { ProviderCandidate, ScoredProvider } from "./scoring.ts";
import { scorePool } from "./scoring.ts";
import { getTaskFitness } from "./taskFitness.ts";
export interface RoutingContext {
taskType: string;
requestHasTools?: boolean;
requestHasVision?: boolean;
estimatedInputTokens?: number;
}
export interface RoutingDecision {
provider: string;
model: string;
strategy: string;
reason: string;
candidatesConsidered: number;
finalScore: number;
}
export interface RouterStrategy {
readonly name: string;
readonly description: string;
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision;
}
// ── RulesStrategy: wraps 6-factor scoring engine ────────────────────────────
class RulesStrategyImpl implements RouterStrategy {
readonly name = "rules";
readonly description =
"6-factor weighted scoring: quota, health, cost, latency, taskFit, stability";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const eligible = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const ranked: ScoredProvider[] = scorePool(
eligible.length > 0 ? eligible : pool,
context.taskType,
undefined,
getTaskFitness
);
const best = ranked[0];
if (!best) throw new Error("[RulesStrategy] No candidates to score");
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `RulesStrategy: score=${best.score.toFixed(3)} (quota=${best.factors.quota.toFixed(2)}, health=${best.factors.health.toFixed(2)}, cost=${best.factors.costInv.toFixed(2)}, taskFit=${best.factors.taskFit.toFixed(2)})`,
candidatesConsidered: ranked.length,
finalScore: best.score,
};
}
}
// ── CostStrategy: always picks cheapest healthy provider ─────────────────────
class CostStrategyImpl implements RouterStrategy {
readonly name = "cost";
readonly description = "Always selects cheapest available provider (by costPer1MTokens)";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const candidates = healthy.length > 0 ? healthy : pool;
const sorted = [...candidates].sort((a, b) => a.costPer1MTokens - b.costPer1MTokens);
const best = sorted[0];
if (!best) throw new Error("[CostStrategy] No candidates available");
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `CostStrategy: cheapest at $${best.costPer1MTokens.toFixed(3)}/1M tokens`,
candidatesConsidered: candidates.length,
finalScore: best.costPer1MTokens === 0 ? 1.0 : 1 / best.costPer1MTokens,
};
}
}
// ── LatencyStrategy: prioritize low latency + reliability ───────────────────
class LatencyStrategyImpl implements RouterStrategy {
readonly name = "latency";
readonly description = "Prioritizes lowest p95 latency with reliability weighting";
select(pool: ProviderCandidate[], context: RoutingContext): RoutingDecision {
const healthy = pool.filter((c) => c.circuitBreakerState !== "OPEN");
const candidates = healthy.length > 0 ? healthy : pool;
const sorted = [...candidates].sort((a, b) => {
const aPenalty = a.errorRate * 1000;
const bPenalty = b.errorRate * 1000;
return a.p95LatencyMs + aPenalty - (b.p95LatencyMs + bPenalty);
});
const best = sorted[0];
if (!best) throw new Error("[LatencyStrategy] No candidates available");
const latencyScore = best.p95LatencyMs > 0 ? Math.max(0.001, 10_000 / best.p95LatencyMs) : 1;
const reliability = Math.max(0, 1 - best.errorRate);
const finalScore = latencyScore * 0.7 + reliability * 0.3;
return {
provider: best.provider,
model: best.model,
strategy: this.name,
reason: `LatencyStrategy: p95=${best.p95LatencyMs}ms, errorRate=${(best.errorRate * 100).toFixed(2)}%`,
candidatesConsidered: candidates.length,
finalScore,
};
}
}
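`LatencyStrategy` ranks candidates by `p95LatencyMs + errorRate * 1000`, so each percentage point of errors costs the same as 10 ms of p95 latency. A small worked sketch of that sort key with a trimmed, hypothetical candidate type, showing a slightly slower but more reliable model beating a faster, flakier one:

```typescript
interface LatencyCandidate {
  model: string;
  p95LatencyMs: number;
  errorRate: number; // 0..1
}

// Sort key used by LatencyStrategy: p95 plus a 1000 ms penalty per 100% error rate.
function latencySortKey(c: LatencyCandidate): number {
  return c.p95LatencyMs + c.errorRate * 1000;
}

function pickFastest(pool: LatencyCandidate[]): LatencyCandidate | undefined {
  return [...pool].sort((a, b) => latencySortKey(a) - latencySortKey(b))[0];
}

const pool: LatencyCandidate[] = [
  { model: "fast-but-flaky", p95LatencyMs: 1100, errorRate: 0.15 }, // key = 1100 + 150 = 1250
  { model: "steady", p95LatencyMs: 1200, errorRate: 0.01 }, // key = 1200 + 10 = 1210
];
console.log(pickFastest(pool)?.model); // prints: steady
```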
// ── Registry ──────────────────────────────────────────────────────────────────
const strategyRegistry = new Map<string, RouterStrategy>();
const rulesStrategy = new RulesStrategyImpl();
const costStrategy = new CostStrategyImpl();
const latencyStrategy = new LatencyStrategyImpl();
strategyRegistry.set("rules", rulesStrategy);
strategyRegistry.set("cost", costStrategy);
strategyRegistry.set("eco", costStrategy); // alias
strategyRegistry.set("latency", latencyStrategy);
strategyRegistry.set("fast", latencyStrategy); // alias
export function getStrategy(name: string): RouterStrategy {
const strategy = strategyRegistry.get(name);
if (!strategy) {
console.warn(`[RouterStrategy] Strategy '${name}' not found, falling back to 'rules'`);
return rulesStrategy;
}
return strategy;
}
export function registerStrategy(name: string, strategy: RouterStrategy): void {
if (strategyRegistry.has(name)) {
console.warn(`[RouterStrategy] Overwriting strategy '${name}'`);
}
strategyRegistry.set(name, strategy);
}
export function listStrategies(): Array<{ name: string; description: string }> {
return [...strategyRegistry.entries()].map(([name, s]) => ({ name, description: s.description }));
}
export function selectWithStrategy(
pool: ProviderCandidate[],
context: RoutingContext,
strategyName = "rules"
): RoutingDecision {
return getStrategy(strategyName).select(pool, context);
}
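The registry is a plain `Map`: aliases such as `eco` and `fast` are extra keys pointing at the same instance, and unknown names fall back to `rules`. A self-contained sketch of the pattern with a trimmed candidate type; `Candidate`, `Strategy`, `resolveStrategy`, the placeholder `rules` selector, and the `quota` strategy are all hypothetical stand-ins (in the module itself a custom strategy would go through `registerStrategy`):

```typescript
interface Candidate {
  provider: string;
  model: string;
  costPer1MTokens: number;
  quotaRemaining: number;
}

interface Strategy {
  readonly name: string;
  select(pool: Candidate[]): Candidate;
}

const registry = new Map<string, Strategy>();

// Placeholder "rules": first candidate (the real one runs the weighted scoring engine).
const rules: Strategy = { name: "rules", select: (pool) => pool[0] };
const cost: Strategy = {
  name: "cost",
  select: (pool) => [...pool].sort((a, b) => a.costPer1MTokens - b.costPer1MTokens)[0],
};

registry.set("rules", rules);
registry.set("cost", cost);
registry.set("eco", cost); // alias: a second key pointing at the same instance

// Unknown names fall back to "rules", like getStrategy (minus the console warning).
function resolveStrategy(name: string): Strategy {
  return registry.get(name) ?? rules;
}

// Hypothetical custom strategy: prefer the candidate with the most remaining quota.
registry.set("quota", {
  name: "quota",
  select: (pool) => [...pool].sort((a, b) => b.quotaRemaining - a.quotaRemaining)[0],
});

const pool: Candidate[] = [
  { provider: "a", model: "m1", costPer1MTokens: 3, quotaRemaining: 90 },
  { provider: "b", model: "m2", costPer1MTokens: 0.5, quotaRemaining: 10 },
];
console.log(resolveStrategy("eco").select(pool).model); // prints: m2
console.log(resolveStrategy("quota").select(pool).model); // prints: m1
console.log(resolveStrategy("missing").name); // prints: rules
```

Because aliasing is per key, re-registering only `"cost"` would leave `"eco"` pointing at the previous instance; both keys need updating to keep them in sync.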
+2 -1
@@ -74,7 +74,8 @@ export function calculateScore(factors: ScoringFactors, weights: ScoringWeights)
weights.costInv * factors.costInv +
weights.latencyInv * factors.latencyInv +
weights.taskFit * factors.taskFit +
weights.stability * factors.stability
weights.stability * factors.stability +
weights.tierPriority * factors.tierPriority
);
}
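The hunk extends `calculateScore` with a `tierPriority` term. In JavaScript, `undefined * x` is `NaN`, so a stored weights object that predates `tierPriority` would turn the whole score into `NaN` unless the weight is defaulted somewhere. A sketch of the weighted sum that treats missing weights as 0; the `quota` and `health` factor names are assumed from the RulesStrategy reason string (those lines of `calculateScore` are not shown), the weight values are made up, and any clamping around the real sum is ignored:

```typescript
type FactorName =
  | "quota"
  | "health"
  | "costInv"
  | "latencyInv"
  | "taskFit"
  | "stability"
  | "tierPriority";

type Factors = Record<FactorName, number>;
type Weights = Partial<Record<FactorName, number>>;

const FACTOR_NAMES: FactorName[] = [
  "quota", "health", "costInv", "latencyInv", "taskFit", "stability", "tierPriority",
];

// Weighted sum over all factors; a missing weight counts as 0 instead of producing NaN.
function weightedScore(factors: Factors, weights: Weights): number {
  let total = 0;
  for (const name of FACTOR_NAMES) {
    total += (weights[name] ?? 0) * factors[name];
  }
  return total;
}

const factors: Factors = {
  quota: 1, health: 1, costInv: 0.5, latencyInv: 0.8, taskFit: 0.9, stability: 0.7, tierPriority: 1,
};
// Hypothetical weights saved before tierPriority existed.
const legacyWeights: Weights = {
  quota: 0.2, health: 0.2, costInv: 0.2, latencyInv: 0.15, taskFit: 0.15, stability: 0.1,
};

console.log(weightedScore(factors, legacyWeights).toFixed(3)); // prints: 0.825
console.log(weightedScore(factors, { ...legacyWeights, tierPriority: 0.1 }).toFixed(3)); // prints: 0.925
```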
@@ -24,10 +24,23 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"deepseek-coder": 0.9,
"deepseek-v3": 0.85,
"deepseek-r1": 0.88,
"deepseek-chat": 0.84, // DeepSeek V3.2 Chat — strong code performance
"deepseek-v3.2": 0.86, // Explicit V3.2 alias
qwen: 0.78,
llama: 0.72,
mistral: 0.75,
mixtral: 0.77,
// Grok-4 fast — good code, ultra-low latency (1143ms P50)
"grok-4-fast": 0.8,
"grok-4": 0.82,
"grok-3": 0.8,
// Kimi K2.5 — agentic with tool calling, good at code tasks
"kimi-k2": 0.82,
// GLM-5 — Z.AI model with 128k output
"glm-5": 0.78,
// MiniMax M2.5 — reasoning support helps complex code
"minimax-m2.5": 0.75,
"minimax-m2": 0.72,
},
review: {
"claude-sonnet": 0.92,
@@ -58,10 +71,15 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"claude-sonnet": 0.92,
"gemini-2.5-pro": 0.95,
"gemini-pro": 0.88,
"gemini-3.1-pro": 0.95, // Gemini 3.1 Pro — 1M context, ideal for long analysis
"gpt-4o": 0.85,
o1: 0.9,
o3: 0.93,
"deepseek-r1": 0.88,
"deepseek-chat": 0.8,
"kimi-k2": 0.82, // Kimi K2.5 agentic — good for analysis
"glm-5": 0.78, // GLM-5 with 128k output for long analysis
"minimax-m2.5": 0.76,
},
debugging: {
"claude-sonnet": 0.93,
@@ -87,8 +105,17 @@ const FITNESS_TABLE: Record<string, Record<string, number>> = {
"claude-opus": 0.85,
"gpt-4o": 0.85,
"gemini-pro": 0.8,
"gemini-3.1-pro": 0.85,
"deepseek-v3": 0.75,
"deepseek-chat": 0.74,
"gemini-flash": 0.72,
// New models from ClawRouter analysis (2026-03-17):
"grok-4-fast": 0.72, // ultra-fast, suitable for all tasks
"grok-4": 0.74,
"grok-3": 0.73,
"kimi-k2": 0.76, // agentic multi-step tasks
"glm-5": 0.7,
"minimax-m2.5": 0.7,
},
};
+371 -4
@@ -5,18 +5,37 @@
import { checkFallbackError, formatRetryAfter, getProviderProfile } from "./accountFallback.ts";
import { unavailableResponse } from "../utils/error.ts";
import { recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
import { recordComboIntent, recordComboRequest, getComboMetrics } from "./comboMetrics.ts";
import { resolveComboConfig, getDefaultComboConfig } from "./comboConfig.ts";
import * as semaphore from "./rateLimitSemaphore.ts";
import { getCircuitBreaker } from "../../src/shared/utils/circuitBreaker";
import { fisherYatesShuffle, getNextFromDeck } from "../../src/shared/utils/shuffleDeck";
import { parseModel } from "./model.ts";
import { applyComboAgentMiddleware, injectModelTag } from "./comboAgentMiddleware.ts";
import { classifyWithConfig, DEFAULT_INTENT_CONFIG } from "./intentClassifier.ts";
import { selectProvider as selectAutoProvider } from "./autoCombo/engine.ts";
import { selectWithStrategy } from "./autoCombo/routerStrategy.ts";
import { DEFAULT_WEIGHTS, scorePool } from "./autoCombo/scoring.ts";
import { supportsToolCalling } from "./modelCapabilities.ts";
// Status codes that should mark semaphore + record circuit breaker failures
const TRANSIENT_FOR_BREAKER = [429, 502, 503, 504];
const MAX_COMBO_DEPTH = 3;
// Bootstrap defaults from ClawRouter benchmark (used when no local latency history exists yet)
const DEFAULT_MODEL_P95_MS = {
"grok-4-fast-non-reasoning": 1143,
"grok-4-1-fast-non-reasoning": 1244,
"gemini-2.5-flash": 1238,
"kimi-k2.5": 1646,
"gpt-4o-mini": 2764,
"claude-sonnet-4.6": 4000,
"claude-opus-4.6": 6000,
"deepseek-chat": 2000,
};
const MIN_HISTORY_SAMPLES = 10;
// In-memory atomic counter per combo for round-robin distribution
// Resets on server restart (by design — no stale state)
const rrCounters = new Map();
@@ -201,6 +220,193 @@ function sortModelsByUsage(models, comboName) {
return withUsage.map((e) => e.modelStr);
}
function toTextContent(content) {
if (typeof content === "string") return content;
if (!Array.isArray(content)) return "";
return content
.map((part) => {
if (!part || typeof part !== "object") return "";
if (typeof part.text === "string") return part.text;
return "";
})
.join("\n");
}
function extractPromptForIntent(body) {
if (!body || typeof body !== "object") return "";
const fromMessages = Array.isArray(body.messages)
? [...body.messages].reverse().find((m) => m && typeof m === "object" && m.role === "user")
: null;
if (fromMessages) return toTextContent(fromMessages.content);
if (typeof body.input === "string") return body.input;
if (Array.isArray(body.input)) {
const text = body.input
.map((item) => {
if (!item || typeof item !== "object") return "";
if (typeof item.content === "string") return item.content;
if (typeof item.text === "string") return item.text;
return "";
})
.filter(Boolean)
.join("\n");
if (text) return text;
}
if (typeof body.prompt === "string") return body.prompt;
return "";
}
function mapIntentToTaskType(intent) {
switch (intent) {
case "code":
return "coding";
case "reasoning":
return "analysis";
case "simple":
return "default";
case "medium":
default:
return "default";
}
}
function toStringArray(input) {
if (Array.isArray(input)) {
return input.map((v) => (typeof v === "string" ? v.trim() : "")).filter(Boolean);
}
if (typeof input === "string") {
return input
.split(",")
.map((v) => v.trim())
.filter(Boolean);
}
return [];
}
function getIntentConfig(settings, combo) {
const comboIntentConfig =
combo?.autoConfig?.intentConfig ||
combo?.config?.auto?.intentConfig ||
combo?.config?.intentConfig ||
{};
return {
...DEFAULT_INTENT_CONFIG,
...comboIntentConfig,
...(typeof settings?.intentDetectionEnabled === "boolean"
? { enabled: settings.intentDetectionEnabled }
: {}),
...(Number.isFinite(Number(settings?.intentSimpleMaxWords))
? { simpleMaxWords: Number(settings.intentSimpleMaxWords) }
: {}),
...(toStringArray(settings?.intentExtraCodeKeywords).length > 0
? { extraCodeKeywords: toStringArray(settings.intentExtraCodeKeywords) }
: {}),
...(toStringArray(settings?.intentExtraReasoningKeywords).length > 0
? { extraReasoningKeywords: toStringArray(settings.intentExtraReasoningKeywords) }
: {}),
...(toStringArray(settings?.intentExtraSimpleKeywords).length > 0
? { extraSimpleKeywords: toStringArray(settings.intentExtraSimpleKeywords) }
: {}),
};
}
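`getIntentConfig` layers three sources: classifier defaults, then the combo's own `intentConfig`, then global settings, with the later spread winning. Keyword lists may arrive as arrays or comma-separated strings. A trimmed, self-contained sketch of that precedence; `IntentConfig`, `Settings`, `DEFAULTS`, and `mergeIntentConfig` are hypothetical stand-ins (the real defaults come from `DEFAULT_INTENT_CONFIG`):

```typescript
interface IntentConfig {
  enabled: boolean;
  simpleMaxWords: number;
  extraCodeKeywords: string[];
}

interface Settings {
  intentDetectionEnabled?: boolean;
  intentSimpleMaxWords?: unknown;
  intentExtraCodeKeywords?: unknown;
}

// Hypothetical defaults for illustration only.
const DEFAULTS: IntentConfig = { enabled: true, simpleMaxWords: 12, extraCodeKeywords: [] };

// Accepts ["a", "b"] or "a, b"; drops blanks.
function toStringArray(input: unknown): string[] {
  if (Array.isArray(input)) {
    return input.map((v) => (typeof v === "string" ? v.trim() : "")).filter(Boolean);
  }
  if (typeof input === "string") {
    return input.split(",").map((v) => v.trim()).filter(Boolean);
  }
  return [];
}

// Precedence: defaults < combo config < global settings.
function mergeIntentConfig(settings: Settings, comboConfig: Partial<IntentConfig>): IntentConfig {
  const codeKeywords = toStringArray(settings.intentExtraCodeKeywords);
  return {
    ...DEFAULTS,
    ...comboConfig,
    ...(typeof settings.intentDetectionEnabled === "boolean"
      ? { enabled: settings.intentDetectionEnabled }
      : {}),
    ...(Number.isFinite(Number(settings.intentSimpleMaxWords))
      ? { simpleMaxWords: Number(settings.intentSimpleMaxWords) }
      : {}),
    ...(codeKeywords.length > 0 ? { extraCodeKeywords: codeKeywords } : {}),
  };
}

const merged = mergeIntentConfig({ intentExtraCodeKeywords: " kotlin, ,swift " }, { simpleMaxWords: 20 });
console.log(merged.simpleMaxWords, merged.extraCodeKeywords.join("|")); // prints: 20 kotlin|swift
```

One consequence of the `Number(...)` check: `Number(null)` and `Number("")` are both `0`, which is finite, so a cleared settings field overrides the combo's `simpleMaxWords` with `0` instead of being skipped. That case may be worth guarding explicitly.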
function getBootstrapLatencyMs(modelId) {
const normalized = String(modelId || "").toLowerCase();
return DEFAULT_MODEL_P95_MS[normalized] ?? 1500;
}
async function buildAutoCandidates(modelStrings, comboName) {
const metrics = getComboMetrics(comboName);
const { getPricingForModel } = await import("../../src/lib/localDb");
let historicalLatencyStats = {};
try {
const { getModelLatencyStats } = await import("../../src/lib/usageDb");
historicalLatencyStats = await getModelLatencyStats({
windowHours: 24,
minSamples: 3,
maxRows: 10000,
});
} catch {
// keep empty stats — auto-combo will use runtime + bootstrap signals
}
const candidates = await Promise.all(
modelStrings.map(async (modelStr) => {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "unknown";
const model = parsed.model || modelStr;
const historicalKey = `${provider}/${model}`;
const historicalModelMetric = historicalLatencyStats[historicalKey] || null;
const historicalTotal = Number(historicalModelMetric?.totalRequests);
const hasHistoricalSignal =
Number.isFinite(historicalTotal) && historicalTotal >= MIN_HISTORY_SAMPLES;
let costPer1MTokens = 1;
try {
const pricing = await getPricingForModel(provider, model);
const inputPrice = Number(pricing?.input);
if (Number.isFinite(inputPrice) && inputPrice >= 0) {
costPer1MTokens = inputPrice;
}
} catch {
// keep default cost
}
const modelMetric = metrics?.byModel?.[modelStr] || null;
const avgLatency = Number(modelMetric?.avgLatencyMs);
const successRate = Number(modelMetric?.successRate);
const historicalP95Latency = Number(historicalModelMetric?.p95LatencyMs);
const historicalStdDev = Number(historicalModelMetric?.latencyStdDev);
const historicalSuccessRate = Number(historicalModelMetric?.successRate); // 0..1
const p95LatencyMs = hasHistoricalSignal
? Number.isFinite(historicalP95Latency) && historicalP95Latency > 0
? historicalP95Latency
: getBootstrapLatencyMs(model)
: Number.isFinite(avgLatency) && avgLatency > 0
? avgLatency
: getBootstrapLatencyMs(model);
const errorRate = hasHistoricalSignal
? Number.isFinite(historicalSuccessRate) &&
historicalSuccessRate >= 0 &&
historicalSuccessRate <= 1
? 1 - historicalSuccessRate
: 0.05
: Number.isFinite(successRate) && successRate >= 0 && successRate <= 100
? 1 - successRate / 100
: 0.05;
const latencyStdDev =
hasHistoricalSignal && Number.isFinite(historicalStdDev) && historicalStdDev > 0
? Math.max(10, historicalStdDev)
: Math.max(10, p95LatencyMs * 0.1);
const breakerStateRaw = getCircuitBreaker(`combo:${modelStr}`)?.getStatus?.()?.state;
const circuitBreakerState =
breakerStateRaw === "OPEN" || breakerStateRaw === "HALF_OPEN" ? breakerStateRaw : "CLOSED";
return {
provider,
model,
quotaRemaining: 100,
quotaTotal: 100,
circuitBreakerState,
costPer1MTokens,
p95LatencyMs,
latencyStdDev,
errorRate,
accountTier: "standard",
quotaResetIntervalSecs: 86400,
};
})
);
return candidates;
}
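For each candidate, `buildAutoCandidates` picks latency and error-rate signals from a fixed fallback chain: 24 h history when the model has at least `MIN_HISTORY_SAMPLES` requests, otherwise the combo's runtime metrics, otherwise the bootstrap benchmark table and a 5% error rate. A pure-function sketch of that chain; `SignalInputs` and `resolveSignals` are hypothetical names that flatten the inputs the real code reads from several sources:

```typescript
interface SignalInputs {
  historicalTotal?: number; // requests seen in the history window
  historicalP95Ms?: number;
  historicalSuccessRate?: number; // 0..1
  runtimeAvgLatencyMs?: number;
  runtimeSuccessRatePct?: number; // 0..100
  bootstrapP95Ms: number; // benchmark default (1500 ms for unknown models)
}

const MIN_HISTORY_SAMPLES = 10;
const DEFAULT_ERROR_RATE = 0.05;

const isPositive = (x: number | undefined): x is number =>
  x !== undefined && Number.isFinite(x) && x > 0;

function resolveSignals(s: SignalInputs): { p95LatencyMs: number; errorRate: number } {
  const useHistory =
    s.historicalTotal !== undefined &&
    Number.isFinite(s.historicalTotal) &&
    s.historicalTotal >= MIN_HISTORY_SAMPLES;
  if (useHistory) {
    const sr = s.historicalSuccessRate;
    return {
      p95LatencyMs: isPositive(s.historicalP95Ms) ? s.historicalP95Ms : s.bootstrapP95Ms,
      errorRate: sr !== undefined && sr >= 0 && sr <= 1 ? 1 - sr : DEFAULT_ERROR_RATE,
    };
  }
  const pct = s.runtimeSuccessRatePct;
  return {
    // The runtime branch uses the combo's average latency as a stand-in for p95.
    p95LatencyMs: isPositive(s.runtimeAvgLatencyMs) ? s.runtimeAvgLatencyMs : s.bootstrapP95Ms,
    errorRate: pct !== undefined && pct >= 0 && pct <= 100 ? 1 - pct / 100 : DEFAULT_ERROR_RATE,
  };
}

console.log(resolveSignals({ historicalTotal: 5, runtimeAvgLatencyMs: 400, bootstrapP95Ms: 1500 }).p95LatencyMs);
// prints: 400 (5 samples is below the history threshold)
```

As written, the latency query asks for `minSamples: 3`, but a model's history is only trusted at 10 or more requests, so models with 3 to 9 recorded requests fall through to runtime or bootstrap values.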
/**
* Handle combo chat with fallback
* Supports all 8 strategies: priority, weighted, round-robin, strict-random, random, least-used, cost-optimized, auto
@@ -225,12 +431,49 @@ export async function handleComboChat({
const strategy = combo.strategy || "priority";
const models = combo.models || [];
// ── Combo Agent Middleware (#399 + #401) ────────────────────────────────
// Apply system_message override, tool_filter_regex, and extract pinned model
// from context caching tag. These are all opt-in per combo config.
const { body: agentBody, pinnedModel } = applyComboAgentMiddleware(
body,
combo,
"" // provider/model not yet known — resolved per-model in loop
);
body = agentBody;
if (pinnedModel) {
log.info("COMBO", `[#401] Context caching: pinned model=${pinnedModel}`);
}
// Wrap handleSingleModel to inject context caching tag on response (#401)
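const handleSingleModelWrapped = combo.context_cache_protection
? async (b, modelStr) => {
const res = await handleSingleModel(b, modelStr);
// Inject tag only on success and only for non-streaming JSON responses
if (res.ok && !b.stream) {
try {
const json = await res.clone().json();
// OpenAI-style completions carry the assistant reply in choices[0].message;
// a top-level `messages` array is not part of the response body.
const firstChoice = Array.isArray(json?.choices) ? json.choices[0] : null;
if (firstChoice?.message && typeof firstChoice.message.content === "string") {
const [taggedMessage] = injectModelTag([firstChoice.message], modelStr);
const choices = [{ ...firstChoice, message: taggedMessage }, ...json.choices.slice(1)];
// The body length changes after tagging, so drop the stale Content-Length
const headers = new Headers(res.headers);
headers.delete("content-length");
return new Response(JSON.stringify({ ...json, choices }), {
status: res.status,
headers,
});
}
} catch {
/* non-JSON body: skip tagging */
}
}
return res;
}
: handleSingleModel;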
// ─────────────────────────────────────────────────────────────────────────
// Route to round-robin handler if strategy matches
if (strategy === "round-robin") {
return handleRoundRobinCombo({
body,
combo,
handleSingleModel,
handleSingleModel: handleSingleModelWrapped,
isModelAvailable,
log,
settings,
@@ -278,7 +521,131 @@ export async function handleComboChat({
}
// Apply strategy-specific ordering
if (strategy === "strict-random") {
if (strategy === "auto") {
const requestHasTools = Array.isArray(body?.tools) && body.tools.length > 0;
let eligibleModels = [...orderedModels];
if (requestHasTools) {
const filtered = eligibleModels.filter((m) => supportsToolCalling(m));
if (filtered.length > 0) {
eligibleModels = filtered;
} else {
log.warn(
"COMBO",
"Auto strategy: all candidates filtered by tool-calling policy, falling back to full pool"
);
}
}
const prompt = extractPromptForIntent(body);
const systemPrompt =
typeof combo?.system_message === "string" ? combo.system_message : undefined;
const intentConfig = getIntentConfig(settings, combo);
const intent = classifyWithConfig(prompt, intentConfig, systemPrompt);
recordComboIntent(combo.name, intent);
const taskType = mapIntentToTaskType(intent);
const autoConfigSource = combo?.autoConfig || combo?.config?.auto || combo?.config || {};
const routingStrategy =
typeof autoConfigSource.routingStrategy === "string"
? autoConfigSource.routingStrategy
: typeof autoConfigSource.strategyName === "string"
? autoConfigSource.strategyName
: "rules";
const candidatePool = Array.isArray(autoConfigSource.candidatePool)
? autoConfigSource.candidatePool
: [
...new Set(
eligibleModels.map((m) => {
const parsed = parseModel(m);
return parsed.provider || parsed.providerAlias || "unknown";
})
),
];
const weights =
autoConfigSource.weights && typeof autoConfigSource.weights === "object"
? autoConfigSource.weights
: DEFAULT_WEIGHTS;
const explorationRate = Number.isFinite(Number(autoConfigSource.explorationRate))
? Number(autoConfigSource.explorationRate)
: 0.05;
const budgetCap = Number.isFinite(Number(autoConfigSource.budgetCap))
? Number(autoConfigSource.budgetCap)
: undefined;
const modePack =
typeof autoConfigSource.modePack === "string" ? autoConfigSource.modePack : undefined;
const candidates = await buildAutoCandidates(eligibleModels, combo.name);
if (candidates.length > 0) {
let selectedProvider = null;
let selectedModel = null;
let selectionReason = "";
if (routingStrategy !== "rules") {
try {
const decision = selectWithStrategy(
candidates,
{ taskType, requestHasTools },
routingStrategy
);
selectedProvider = decision.provider;
selectedModel = decision.model;
selectionReason = decision.reason;
} catch (err) {
log.warn(
"COMBO",
`Auto strategy '${routingStrategy}' failed (${err?.message || "unknown"}), falling back to rules`
);
}
}
if (!selectedProvider || !selectedModel) {
const selection = selectAutoProvider(
{
id: combo.id || combo.name,
name: combo.name,
type: "auto",
candidatePool,
weights,
modePack,
budgetCap,
explorationRate,
},
candidates,
taskType
);
selectedProvider = selection.provider;
selectedModel = selection.model;
selectionReason = `score=${selection.score.toFixed(3)}${selection.isExploration ? " (exploration)" : ""}`;
}
const modelLookup = new Map();
for (const modelStr of eligibleModels) {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "unknown";
const modelId = parsed.model || modelStr;
modelLookup.set(`${provider}/${modelId}`, modelStr);
}
const ranked = scorePool(candidates, taskType, weights)
.map((r) => modelLookup.get(`${r.provider}/${r.model}`) || `${r.provider}/${r.model}`)
.filter(Boolean);
const selectedModelStr =
modelLookup.get(`${selectedProvider}/${selectedModel}`) ||
`${selectedProvider}/${selectedModel}`;
orderedModels = [...new Set([selectedModelStr, ...ranked, ...eligibleModels])];
log.info(
"COMBO",
`Auto selection: ${selectedModelStr} | intent=${intent} task=${taskType} | strategy=${routingStrategy} | ${selectionReason}`
);
} else {
log.warn("COMBO", "Auto strategy has no candidates, keeping default ordering");
}
} else if (strategy === "strict-random") {
const selectedId = await getNextFromDeck(`combo:${combo.name}`, orderedModels);
// Put selected model first so the fallback loop tries it first
const rest = orderedModels.filter((m) => m !== selectedId);
@@ -348,7 +715,7 @@ export async function handleComboChat({
`Trying model ${i + 1}/${orderedModels.length}: ${modelStr}${retry > 0 ? ` (retry ${retry})` : ""}`
);
const result = await handleSingleModel(body, modelStr);
const result = await handleSingleModelWrapped(body, modelStr);
// Success — return response
if (result.ok) {
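In the `auto` branch, the final fallback order is the selected model first, then the score-ranked pool, then any remaining eligible models, deduplicated with a `Set` (which keeps the first occurrence of each entry). The fallback loop then tries models in that order. A minimal sketch of the ordering step; `buildFallbackOrder` is a hypothetical name and the model strings are made up:

```typescript
// Selected model first, then ranked, then everything else eligible.
// Set preserves insertion order and drops later duplicates.
function buildFallbackOrder(selected: string, ranked: string[], eligible: string[]): string[] {
  return [...new Set([selected, ...ranked, ...eligible])];
}

const order = buildFallbackOrder(
  "b/m2",
  ["a/m1", "b/m2", "c/m3"],
  ["a/m1", "b/m2", "c/m3", "d/m4"]
);
console.log(order.join(" > ")); // prints: b/m2 > a/m1 > c/m3 > d/m4
```

Appending `eligible` at the end guarantees that models missing from the ranked list (for example, ones whose `provider/model` key did not round-trip through the lookup map) still remain reachable as last-resort fallbacks.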
+188
@@ -0,0 +1,188 @@
/**
* comboAgentMiddleware.ts: Combo Agent Features
*
* Implements the "combo as agent" features from issues #399 and #401:
*
* 1. **System Message Override** (#399): If the combo defines a `system_message`,
* it is injected as the first system message, replacing any existing system message.
*
* 2. **Tool Filter Regex** (#399): If the combo defines a `tool_filter_regex`,
* only tools whose name matches the pattern are forwarded to the provider.
*
* 3. **Context Caching Protection** (#401): If the combo enables
* `context_cache_protection`, the proxy:
* a. On response: injects `<omniModel>provider/model</omniModel>` tag into
* the last assistant message whose content is a string.
* b. On request: scans the message history for the tag, and if found,
* overrides the requested model with the pinned one.
*
* All features are opt-in per combo and backward compatible with existing setups.
*/
interface ComboConfig {
system_message?: string | null;
tool_filter_regex?: string | null;
context_cache_protection?: number | boolean;
[key: string]: unknown;
}
interface Message {
role?: string;
content?: unknown;
[key: string]: unknown;
}
// ── Context Caching Tag ─────────────────────────────────────────────────────
const CACHE_TAG_PATTERN = /<omniModel>([^<]+)<\/omniModel>/;
/**
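* Inject the model tag into the last assistant message, after stripping any
* existing tags. Only string content is modified; array content is left untouched
* to avoid breaking Claude/Gemini multi-part message formats. If there is no
* assistant message, or the last one has non-string content, no tag is added.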
*/
export function injectModelTag(messages: Message[], providerModel: string): Message[] {
// Remove any existing tags first to avoid duplication on context compaction.
// replace() needs a global copy of the pattern to remove every tag, not just the first.
const cleaned = messages.map((msg) => {
if (msg.role === "assistant" && typeof msg.content === "string") {
return { ...msg, content: msg.content.replace(new RegExp(CACHE_TAG_PATTERN.source, "g"), "").trimEnd() };
}
return msg;
});
// Find the last assistant message; it is only tagged if its content is a string
const lastAssistantIdx = cleaned.map((m) => m.role).lastIndexOf("assistant");
if (lastAssistantIdx === -1) return cleaned;
const msg = cleaned[lastAssistantIdx];
if (typeof msg.content !== "string") return cleaned;
const tagged = [...cleaned];
tagged[lastAssistantIdx] = {
...msg,
content: `${msg.content}\n<omniModel>${providerModel}</omniModel>`,
};
return tagged;
}
/**
* Scan message history for the model tag injected by a previous response.
* Returns the pinned "provider/model" string, or null if not found.
*/
export function extractPinnedModel(messages: Message[]): string | null {
// Scan from newest to oldest for efficiency
for (let i = messages.length - 1; i >= 0; i--) {
const msg = messages[i];
if (msg.role === "assistant" && typeof msg.content === "string") {
const match = CACHE_TAG_PATTERN.exec(msg.content);
if (match) return match[1];
}
}
return null;
}
// ── System Message Override ──────────────────────────────────────────────────
/**
* Replace or inject a system message at the beginning of the messages array.
* Existing system messages are removed if a combo override is set.
*/
export function applySystemMessageOverride(messages: Message[], systemMessage: string): Message[] {
// Remove all existing system messages
const filtered = messages.filter((m) => m.role !== "system");
// Inject combo system message at start
return [{ role: "system", content: systemMessage }, ...filtered];
}
// ── Tool Filter Regex ────────────────────────────────────────────────────────
/**
* Filter the tools array, keeping only tools whose name matches the regex.
* Returns the original array unchanged if pattern is null/empty.
*/
export function applyToolFilter(
tools: unknown[] | undefined,
pattern: string | null | undefined
): unknown[] | undefined {
if (!tools || !pattern) return tools;
let regex: RegExp;
try {
regex = new RegExp(pattern);
} catch {
// Invalid regex — return tools unchanged rather than crashing
console.warn(`[ComboAgent] Invalid tool_filter_regex: "${pattern}"`);
return tools;
}
return tools.filter((tool) => {
const t = tool as Record<string, unknown>;
// Support both OpenAI format ({ function: { name } }) and Anthropic ({ name })
const name = (t.function as Record<string, unknown> | undefined)?.name ?? t.name ?? "";
return regex.test(String(name));
});
}
/**
* Strip all <omniModel> tags from message content before forwarding to the provider.
* The tag is an internal OmniRoute marker; providers must never see it or their
* cache will treat every tagged request as a new session (#454).
*/
export function stripModelTags(messages: Message[]): Message[] {
return messages.map((msg) => {
if (typeof msg.content === "string" && CACHE_TAG_PATTERN.test(msg.content)) {
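// Use a global copy of the pattern so every tag is removed, not just the first.
return { ...msg, content: msg.content.replace(new RegExp(CACHE_TAG_PATTERN.source, "g"), "").trimEnd() };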
}
return msg;
});
}
// ── Main Middleware ──────────────────────────────────────────────────────────
/**
* Apply all combo agent features to the request body.
* Safe to call with a null/undefined comboConfig: the body is returned unchanged.
*/
export function applyComboAgentMiddleware(
body: Record<string, unknown>,
comboConfig: ComboConfig | null | undefined,
providerModel: string // "provider/model" string for context caching
): { body: Record<string, unknown>; pinnedModel: string | null } {
if (!comboConfig) return { body, pinnedModel: null };
let messages: Message[] = Array.isArray(body.messages) ? [...body.messages] : [];
let pinnedModel: string | null = null;
// 1. Context caching: look for a model pinned earlier in the history.
//    When one is found, the caller should override model selection with it.
if (comboConfig.context_cache_protection) {
pinnedModel = extractPinnedModel(messages);
}
// 2. System message override
if (comboConfig.system_message && comboConfig.system_message.trim()) {
messages = applySystemMessageOverride(messages, comboConfig.system_message);
}
// 3. Tool filter
const filteredTools = applyToolFilter(
body.tools as unknown[] | undefined,
comboConfig.tool_filter_regex
);
// 4. Strip internal <omniModel> tags before forwarding to provider (#454)
// These tags are OmniRoute-internal markers and must never reach the provider
// since providers would treat each tagged request as a new cache session.
messages = stripModelTags(messages);
return {
body: {
...body,
messages,
...(filteredTools !== body.tools && { tools: filteredTools }),
},
pinnedModel,
};
}
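The context-caching round trip (tag a response, recover the pin from history, strip tags before forwarding) can be exercised in isolation. The sketch below is self-contained and does not import the module above; `tagLastAssistant`, `pinnedModel`, and `stripTags` are hypothetical stand-ins. Note that `String.prototype.replace` with a non-global pattern removes only the first match, so the sketch uses a global copy of the tag pattern.

```typescript
type Msg = { role: string; content: unknown };

const TAG = /<omniModel>([^<]+)<\/omniModel>/;
// Global copy for replace(): a non-global pattern would remove only the first match.
const TAG_ALL = new RegExp(TAG.source, "g");

// Strip old tags, then append a fresh tag to the last assistant message (string content only).
function tagLastAssistant(messages: Msg[], providerModel: string): Msg[] {
  const out = messages.map((m) =>
    m.role === "assistant" && typeof m.content === "string"
      ? { ...m, content: m.content.replace(TAG_ALL, "").trimEnd() }
      : m
  );
  const idx = out.map((m) => m.role).lastIndexOf("assistant");
  if (idx === -1) return out;
  const last = out[idx];
  if (typeof last.content !== "string") return out;
  out[idx] = { ...last, content: `${last.content}\n<omniModel>${providerModel}</omniModel>` };
  return out;
}

// Newest-to-oldest scan for a pinned "provider/model".
function pinnedModel(messages: Msg[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const { role, content } = messages[i];
    if (role === "assistant" && typeof content === "string") {
      const match = TAG.exec(content);
      if (match) return match[1];
    }
  }
  return null;
}

// Remove every tag before the request leaves the proxy.
function stripTags(messages: Msg[]): Msg[] {
  return messages.map((m) =>
    typeof m.content === "string" ? { ...m, content: m.content.replace(TAG_ALL, "").trimEnd() } : m
  );
}
```

The pin has to be read before stripping, which matches the order in `applyComboAgentMiddleware` (extract in step 1, strip in step 4).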
@@ -21,6 +21,7 @@ interface ComboMetricsEntry {
totalLatencyMs: number;
strategy: string;
lastUsedAt: string | null;
intentCounts: Record<string, number>;
byModel: Record<string, ModelMetrics>;
}
@@ -69,6 +70,7 @@ export function recordComboRequest(
totalLatencyMs: 0,
strategy,
lastUsedAt: null,
intentCounts: {},
byModel: {},
});
}
@@ -131,6 +133,7 @@ export function getComboMetrics(comboName: string): ComboMetricsView | null {
combo.totalRequests > 0 ? Math.round((combo.totalSuccesses / combo.totalRequests) * 100) : 0,
fallbackRate:
combo.totalRequests > 0 ? Math.round((combo.totalFallbacks / combo.totalRequests) * 100) : 0,
intentCounts: { ...combo.intentCounts },
byModel: Object.fromEntries(
Object.entries(combo.byModel).map(([model, m]) => [
model,
@@ -156,6 +159,30 @@ export function getAllComboMetrics(): Record<string, ComboMetricsView | null> {
return result;
}
/**
* Record detected prompt intent for a combo (used by multilingual routing analytics).
*/
export function recordComboIntent(comboName: string, intent: string): void {
if (!metrics.has(comboName)) {
metrics.set(comboName, {
totalRequests: 0,
totalSuccesses: 0,
totalFailures: 0,
totalFallbacks: 0,
totalLatencyMs: 0,
strategy: "priority",
lastUsedAt: null,
intentCounts: {},
byModel: {},
});
}
const combo = metrics.get(comboName);
if (!combo) return;
const key = String(intent || "unknown");
combo.intentCounts[key] = (combo.intentCounts[key] || 0) + 1;
}
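`intentCounts` holds raw tallies. A dashboard would more likely show shares. A minimal sketch follows; `intentShares` is a hypothetical helper, not part of this diff, and it rounds with `Math.round` the same way `getComboMetrics` rounds its rates.

```typescript
// Hypothetical helper: turn raw intent tallies into percentage shares,
// rounded with Math.round like the rates in getComboMetrics().
function intentShares(intentCounts: Record<string, number>): Record<string, number> {
  const intents = Object.keys(intentCounts);
  const total = intents.reduce((sum, intent) => sum + intentCounts[intent], 0);
  const shares: Record<string, number> = {};
  if (total === 0) return shares;
  for (const intent of intents) {
    shares[intent] = Math.round((intentCounts[intent] / total) * 100);
  }
  return shares;
}
```

Because each share is rounded independently, the values may not sum to exactly 100 (three equal intents give 33 each).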
/**
* Reset metrics for a specific combo
*/
@@ -0,0 +1,103 @@
/**
* Emergency Fallback: Budget Exhaustion Redirect
*
* When a request fails due to budget exhaustion (HTTP 402 or budget keywords
* in the error body), optionally redirect to a free-tier model
* (default provider/model: nvidia + openai/gpt-oss-120b at $0.00/M tokens).
*
* Inspired by ClawRouter: "gpt-oss-120b costs nothing and serves as
* automatic fallback when wallet is empty."
*/
export interface EmergencyFallbackConfig {
enabled: boolean;
provider: string;
model: string;
triggerOn402: boolean;
triggerOnBudgetKeywords: boolean;
budgetKeywords: string[];
/** Skip fallback for tool requests (gpt-oss-120b may not support structured tool calling) */
skipForToolRequests: boolean;
maxOutputTokens: number;
}
export const EMERGENCY_FALLBACK_CONFIG: EmergencyFallbackConfig = {
enabled: true,
provider: "nvidia",
model: "openai/gpt-oss-120b",
triggerOn402: true,
triggerOnBudgetKeywords: true,
budgetKeywords: [
"insufficient funds",
"insufficient_funds",
"budget exceeded",
"budget_exceeded",
"quota exceeded",
"quota_exceeded",
"billing",
"payment required",
"out of credits",
"no credits",
"credit limit",
"spending limit",
"saldo insuficiente",
"limite de gastos",
"cota excedida",
],
skipForToolRequests: true,
maxOutputTokens: 4096,
};
export interface FallbackDecision {
shouldFallback: true;
reason: string;
provider: string;
model: string;
maxOutputTokens: number;
}
export interface NoFallbackDecision {
shouldFallback: false;
reason: string;
}
export type FallbackResult = FallbackDecision | NoFallbackDecision;
export function shouldUseFallback(
status: number,
errorBody: string,
requestHasTools: boolean,
config: EmergencyFallbackConfig = EMERGENCY_FALLBACK_CONFIG
): FallbackResult {
if (!config.enabled) return { shouldFallback: false, reason: "emergency fallback disabled" };
if (config.skipForToolRequests && requestHasTools) {
return { shouldFallback: false, reason: "skipped: request has tools" };
}
if (config.triggerOn402 && status === 402) {
return {
shouldFallback: true,
reason: `HTTP 402 → emergency fallback to ${config.provider}/${config.model}`,
provider: config.provider,
model: config.model,
maxOutputTokens: config.maxOutputTokens,
};
}
if (config.triggerOnBudgetKeywords && errorBody) {
const lowerBody = errorBody.toLowerCase();
const matched = config.budgetKeywords.find((kw) => lowerBody.includes(kw.toLowerCase()));
if (matched) {
return {
shouldFallback: true,
reason: `Budget error detected ('${matched}') → emergency fallback to ${config.provider}/${config.model}`,
provider: config.provider,
model: config.model,
maxOutputTokens: config.maxOutputTokens,
};
}
}
return { shouldFallback: false, reason: "no budget error detected" };
}
export function isFallbackDecision(result: FallbackResult): result is FallbackDecision {
return result.shouldFallback === true;
}
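`shouldUseFallback` only returns a decision; the caller still has to rewrite the outgoing request. Here is a sketch of that step. It assumes an OpenAI-style body with `model` and `max_tokens`, and that the redirected model is addressed as `provider/model` (as the combo code does with `${selectedProvider}/${selectedModel}`). `applyFallback` is hypothetical.

```typescript
// Copy of the decision shape above so this sketch is self-contained.
interface FallbackDecision {
  shouldFallback: true;
  reason: string;
  provider: string;
  model: string;
  maxOutputTokens: number;
}

// Hypothetical: point the body at the fallback model and cap max_tokens,
// keeping a smaller client-requested limit if one was sent.
function applyFallback(
  body: Record<string, unknown>,
  decision: FallbackDecision
): Record<string, unknown> {
  const requested = body.max_tokens;
  const cap = typeof requested === "number" ? requested : Infinity;
  return {
    ...body,
    model: `${decision.provider}/${decision.model}`,
    max_tokens: Math.min(cap, decision.maxOutputTokens),
  };
}
```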
@@ -0,0 +1,375 @@
/**
* Multilingual Intent Detection for AutoCombo
*
* Classifies prompts as: code | reasoning | simple | medium
* using keywords in 9 languages (EN, PT-BR, ES, ZH, JA, RU, DE, KO, AR).
*
* Inspired by ClawRouter (BlockRunAI) multilingual routing system.
* Execution: purely synchronous, <1ms, no I/O.
*/
export type IntentType = "code" | "reasoning" | "simple" | "medium";
export const CODE_KEYWORDS: readonly string[] = [
// English
"function",
"class",
"import",
"def",
"SELECT",
"async",
"await",
"const",
"let",
"var",
"return",
"```",
"algorithm",
"compile",
"debug",
"refactor",
"typescript",
"python",
"javascript",
"code",
"implement",
"write a",
"create a component",
"endpoint",
"repository",
"deploy",
"install",
"script",
"api",
"database",
"query",
"schema",
"interface",
"generic",
"enum",
"module",
"package",
"dependency",
// Português (PT-BR)
"função",
"classe",
"importar",
"definir",
"consulta",
"assíncrono",
"aguardar",
"constante",
"variável",
"retornar",
"algoritmo",
"compilar",
"depurar",
"refatorar",
"código",
"implementar",
"criar um",
"componente",
"como fazer",
"repositório",
"configurar",
"instalar",
"banco de dados",
"escrever uma função",
"criar uma classe",
// Español
"función",
"clase",
"importar",
"definir",
"consulta",
"asíncrono",
"esperar",
"constante",
"variable",
"retornar",
"algoritmo",
"compilar",
"depurar",
"refactorizar",
"código",
"implementar",
// 中文
"函数",
"类",
"导入",
"定义",
"查询",
"异步",
"等待",
"常量",
"变量",
"返回",
"算法",
"编译",
"调试",
"代码",
// 日本語
"関数",
"クラス",
"インポート",
"非同期",
"定数",
"変数",
"コード",
"アルゴリズム",
// Русский
"функция",
"класс",
"импорт",
"запрос",
"асинхронный",
"константа",
"переменная",
"алгоритм",
"код",
// Deutsch
"funktion",
"klasse",
"importieren",
"abfrage",
"asynchron",
"konstante",
"variable",
"algorithmus",
"code",
// 한국어
"함수",
"클래스",
"가져오기",
"정의",
"쿼리",
"비동기",
"대기",
"상수",
"변수",
"반환",
"코드",
// العربية
"دالة",
"فئة",
"استيراد",
"استعلام",
"غير متزامن",
"ثابت",
"متغير",
"كود",
"خوارزمية",
];
export const REASONING_KEYWORDS: readonly string[] = [
// English
"prove",
"theorem",
"derive",
"step by step",
"chain of thought",
"formally",
"mathematical",
"proof",
"logically",
"analyze",
"reasoning",
"deduce",
"infer",
"hypothesis",
"convergence",
// Português (PT-BR)
"provar",
"teorema",
"derivar",
"passo a passo",
"cadeia de pensamento",
"formalmente",
"matemático",
"prova",
"logicamente",
"analisar",
"raciocínio",
"deduzir",
"inferir",
"hipótese",
"demonstrar",
"cálculo",
"equação diferencial",
"integral",
"otimização",
// Español
"demostrar",
"teorema",
"derivar",
"paso a paso",
"formalmente",
"matemático",
"lógicamente",
// 中文
"证明",
"定理",
"推导",
"逐步",
"思维链",
"数学",
"逻辑",
"分析",
// 日本語
"証明",
"定理",
"導出",
"論理的",
"分析",
// Русский
"доказать",
"теорема",
"шаг за шагом",
"математически",
"логически",
// Deutsch
"beweisen",
"theorem",
"schritt für schritt",
"mathematisch",
"logisch",
// 한국어
"증명",
"정리",
"단계별",
"수학적",
"논리적",
// العربية
"إثبات",
"نظرية",
"خطوة بخطوة",
"رياضي",
"منطقياً",
];
export const SIMPLE_KEYWORDS: readonly string[] = [
// English
"what is",
"define",
"translate",
"hello",
"yes or no",
"summarize",
"list",
"tell me",
"who is",
// Português (PT-BR)
"o que é",
"definir",
"traduzir",
"olá",
"oi",
"sim ou não",
"resumir",
"listar",
"me diga",
"quem é",
"quando foi",
"onde fica",
"explique brevemente",
"de forma simples",
// Español
"qué es",
"definir",
"traducir",
"hola",
"resumir",
"listar",
// 中文
"什么是",
"定义",
"翻译",
"你好",
"总结",
"列出",
// Русский
"что такое",
"определить",
"перевести",
"привет",
"резюмировать",
// Deutsch
"was ist",
"definieren",
"übersetzen",
"hallo",
"zusammenfassen",
// 한국어
"이란",
"정의",
"번역",
"안녕",
"요약",
// العربية
"ما هو",
"تعريف",
"ترجمة",
"مرحبا",
"ملخص",
];
/**
* Classify a prompt's intent using multilingual keyword matching.
* Priority: code > reasoning > simple > medium (default)
*/
export function classifyPromptIntent(prompt: string, systemPrompt?: string): IntentType {
const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
const wordCount = prompt.trim().split(/\s+/).length;
for (const kw of CODE_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "code";
}
for (const kw of REASONING_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "reasoning";
}
if (wordCount < 60) {
for (const kw of SIMPLE_KEYWORDS) {
if (fullText.includes(kw.toLowerCase())) return "simple";
}
}
return "medium";
}
export interface IntentClassifierConfig {
enabled: boolean;
extraCodeKeywords?: string[];
extraReasoningKeywords?: string[];
extraSimpleKeywords?: string[];
simpleMaxWords?: number;
}
export const DEFAULT_INTENT_CONFIG: IntentClassifierConfig = {
enabled: true,
simpleMaxWords: 60,
};
export function classifyWithConfig(
prompt: string,
config: IntentClassifierConfig,
systemPrompt?: string
): IntentType {
if (!config.enabled) return "medium";
const fullText = `${systemPrompt ?? ""} ${prompt}`.toLowerCase();
const wordCount = prompt.trim().split(/\s+/).length;
const maxSimpleWords = config.simpleMaxWords ?? 60;
const codeKws = [...CODE_KEYWORDS, ...(config.extraCodeKeywords ?? [])];
const reasoningKws = [...REASONING_KEYWORDS, ...(config.extraReasoningKeywords ?? [])];
const simpleKws = [...SIMPLE_KEYWORDS, ...(config.extraSimpleKeywords ?? [])];
for (const kw of codeKws) {
if (fullText.includes(kw.toLowerCase())) return "code";
}
for (const kw of reasoningKws) {
if (fullText.includes(kw.toLowerCase())) return "reasoning";
}
if (wordCount < maxSimpleWords) {
for (const kw of simpleKws) {
if (fullText.includes(kw.toLowerCase())) return "simple";
}
}
return "medium";
}
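Plain `includes()` matching lets short keywords fire inside unrelated words. `"def"` matches "define", `"import"` matches "important", and `"api"` matches "rapid". As a result, a prompt containing "define" is classified as `code` before the `simple` list is ever consulted. Below is a sketch of one alternative (not the author's implementation, and `buildMatcher` is hypothetical). Keywords starting or ending with a letter or digit must not touch another letter or digit on that side. Han, Kana, and Hangul keywords keep substring matching, because Chinese and Japanese do not separate words with spaces and Korean attaches particles directly to nouns.

```typescript
// Keywords in these scripts are matched as plain substrings.
const CJK = new RegExp("[\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}]", "u");
const STARTS_WORD = new RegExp("^[\\p{L}\\p{N}]", "u");
const ENDS_WORD = new RegExp("[\\p{L}\\p{N}]$", "u");

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Add a Unicode-aware boundary only on sides where the keyword itself has a
// letter/digit, so symbols like "```" still match before a language tag.
function wrap(keyword: string): string {
  const start = STARTS_WORD.test(keyword) ? "(?<![\\p{L}\\p{N}])" : "";
  const end = ENDS_WORD.test(keyword) ? "(?![\\p{L}\\p{N}])" : "";
  return start + escapeRegExp(keyword) + end;
}

// Build the matcher once per keyword list, not once per prompt.
function buildMatcher(keywords: readonly string[]): (text: string) => boolean {
  const lowered = keywords.map((k) => k.toLowerCase());
  const cjk = lowered.filter((k) => CJK.test(k));
  const spaced = lowered.filter((k) => !CJK.test(k)).map(wrap);
  const pattern = spaced.length > 0 ? new RegExp(spaced.join("|"), "u") : null;
  return (text) => {
    const lower = text.toLowerCase();
    return cjk.some((k) => lower.includes(k)) || (pattern !== null && pattern.test(lower));
  };
}
```

Trade-off: inflected forms such as "functions" or Arabic words with an attached article no longer match unless they are listed explicitly.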
@@ -23,6 +23,18 @@ const PROVIDER_MODEL_ALIASES = {
"gemini-3-flash": "gemini-3-flash-preview",
"raptor-mini": "oswe-vscode-prime",
},
gemini: {
"gemini-3.1-pro-preview": "gemini-3.1-pro",
"gemini-3-1-pro": "gemini-3.1-pro",
},
"gemini-cli": {
"gemini-3.1-pro-preview": "gemini-3.1-pro",
"gemini-3-1-pro": "gemini-3.1-pro",
},
nvidia: {
"gpt-oss-120b": "openai/gpt-oss-120b",
"nvidia/gpt-oss-120b": "openai/gpt-oss-120b",
},
antigravity: {},
};
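A table shaped like `PROVIDER_MODEL_ALIASES` is resolved with a two-level lookup that falls back to the requested id. The sketch below is minimal. `resolveAlias` and the inline table excerpt are illustrative, and the project's own resolver is not shown in this diff.

```typescript
// Excerpt of a provider -> (alias -> canonical model id) table, shaped like the one above.
const ALIASES: Record<string, Record<string, string>> = {
  gemini: { "gemini-3.1-pro-preview": "gemini-3.1-pro", "gemini-3-1-pro": "gemini-3.1-pro" },
  nvidia: { "gpt-oss-120b": "openai/gpt-oss-120b", "nvidia/gpt-oss-120b": "openai/gpt-oss-120b" },
  antigravity: {},
};

// Canonical id for (provider, model), or the model unchanged when no alias exists.
function resolveAlias(provider: string, model: string): string {
  return ALIASES[provider]?.[model] ?? model;
}
```

Per the #472 commit message, the resolved id has to be used for everything downstream (request body and executor), not only for format detection.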
@@ -0,0 +1,50 @@
import { PROVIDER_ID_TO_ALIAS, PROVIDER_MODELS } from "../config/providerModels.ts";
import { parseModel } from "./model.ts";
// Conservative denylist fallback used when registry metadata is absent.
// Keep small and explicit to avoid false negatives.
const TOOL_CALLING_UNSUPPORTED_PATTERNS = [
"gpt-oss-120b",
"deepseek-reasoner",
"glm-4.7",
"glm4.7",
];
function getRegistryToolCallingFlag(providerIdOrAlias: string, modelId: string): boolean | null {
const providerAlias = PROVIDER_ID_TO_ALIAS[providerIdOrAlias] || providerIdOrAlias;
const models = PROVIDER_MODELS[providerAlias];
if (!Array.isArray(models)) return null;
const found = models.find((m) => m?.id === modelId);
if (!found) return null;
return typeof found.toolCalling === "boolean" ? found.toolCalling : null;
}
/**
* Returns whether a model should be considered safe for structured function/tool calling.
*
* Decision order:
* 1) Provider registry metadata (toolCalling flag) when available.
* 2) Conservative denylist fallback for known problematic model families.
* 3) Default true.
*/
export function supportsToolCalling(modelStr: string): boolean {
const parsed = parseModel(modelStr);
const provider = parsed.provider || parsed.providerAlias || "";
const model = parsed.model || modelStr;
if (provider) {
const fromRegistry = getRegistryToolCallingFlag(provider, model);
if (fromRegistry !== null) return fromRegistry;
}
const normalized = String(modelStr || "").toLowerCase();
if (!normalized) return false;
// A substring match covers exact ids, "provider/model" ids, and suffixed variants.
const blocked = TOOL_CALLING_UNSUPPORTED_PATTERNS.some((pattern) => normalized.includes(pattern));
return !blocked;
}
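The decision order in `supportsToolCalling` (registry flag, then denylist, then allow) can be exercised without the project's registry by passing the registry in as data. This is a simplified sketch. `splitModel` stands in for `parseModel` and only splits on the first `/`, which is an assumption about the `provider/model` format. `Registry` and `supportsTools` are hypothetical.

```typescript
type Registry = Record<string, { id: string; toolCalling?: boolean }[]>;

const DENYLIST = ["gpt-oss-120b", "deepseek-reasoner", "glm-4.7", "glm4.7"];

// Simplified stand-in for parseModel(): "provider/rest" -> { provider, model }.
function splitModel(modelStr: string): { provider: string; model: string } {
  const slash = modelStr.indexOf("/");
  return slash === -1
    ? { provider: "", model: modelStr }
    : { provider: modelStr.slice(0, slash), model: modelStr.slice(slash + 1) };
}

function supportsTools(modelStr: string, registry: Registry): boolean {
  const { provider, model } = splitModel(modelStr);
  // 1) An explicit registry flag wins.
  const flag = registry[provider]?.find((m) => m.id === model)?.toolCalling;
  if (typeof flag === "boolean") return flag;
  // 2) Conservative substring denylist.
  const normalized = modelStr.toLowerCase();
  if (!normalized) return false;
  // 3) Default: allowed.
  return !DENYLIST.some((p) => normalized.includes(p));
}
```

The first check in the usage below shows why the order matters: a registry entry can re-enable a model that the denylist would otherwise block.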
@@ -0,0 +1,120 @@
/**
* Request Deduplication Service
*
* Deduplicates **concurrent** identical requests to the same upstream.
* Inspired by ClawRouter's dedup.ts (BlockRunAI / github.com/BlockRunAI/ClawRouter).
*
* IMPORTANT: In-memory only. Does NOT persist across restarts and does NOT
* work across multiple process instances (no cross-instance dedup).
*/
import { createHash } from "node:crypto";
export interface DedupConfig {
enabled: boolean;
maxTemperatureForDedup: number;
timeoutMs: number;
}
export const DEFAULT_DEDUP_CONFIG: DedupConfig = {
enabled: true,
maxTemperatureForDedup: 0.1,
timeoutMs: 60_000,
};
export interface DedupResult<T> {
result: T;
wasDeduplicated: boolean;
hash: string;
}
const inflight = new Map<string, Promise<unknown>>();
/**
* Compute a deterministic hash for a request body.
* Includes: model, messages, temperature (1.0 when absent), tools, tool_choice,
* max_tokens, response_format, top_p, frequency_penalty, presence_penalty
* Excludes: stream, user, metadata (these don't affect LLM output)
*/
export function computeRequestHash(requestBody: unknown): string {
const body = requestBody as Record<string, unknown>;
const canonical = {
model: body.model ?? null,
messages: body.messages ?? null,
temperature: typeof body.temperature === "number" ? body.temperature : 1.0,
tools: body.tools ?? null,
tool_choice: body.tool_choice ?? null,
max_tokens: body.max_tokens ?? null,
response_format: body.response_format ?? null,
top_p: body.top_p ?? null,
frequency_penalty: body.frequency_penalty ?? null,
presence_penalty: body.presence_penalty ?? null,
};
return createHash("sha256").update(JSON.stringify(canonical)).digest("hex").slice(0, 16);
}
/** Determine whether a request should be deduplicated */
export function shouldDeduplicate(
requestBody: unknown,
config: DedupConfig = DEFAULT_DEDUP_CONFIG
): boolean {
if (!config.enabled) return false;
const body = requestBody as Record<string, unknown>;
if (body.stream === true) return false;
const temperature = typeof body.temperature === "number" ? body.temperature : 1.0;
if (temperature > config.maxTemperatureForDedup) return false;
return true;
}
/**
* Execute a request with deduplication.
* Concurrent identical requests share one upstream call.
*/
export async function deduplicate<T>(
hash: string,
fn: () => Promise<T>,
config: DedupConfig = DEFAULT_DEDUP_CONFIG
): Promise<DedupResult<T>> {
if (!config.enabled) {
return { result: await fn(), wasDeduplicated: false, hash };
}
const existing = inflight.get(hash);
if (existing) {
const result = (await existing) as T;
return { result, wasDeduplicated: true, hash };
}
let resolve!: (value: T) => void;
let reject!: (reason: unknown) => void;
const sharedPromise = new Promise<T>((res, rej) => {
resolve = res;
reject = rej;
});
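// No-op handler so that a rejection with no concurrent waiter attached does not
// surface as an unhandled promise rejection; waiters still receive the error.
sharedPromise.catch(() => {});
inflight.set(hash, sharedPromise as Promise<unknown>);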
const timer = setTimeout(() => {
if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
}, config.timeoutMs);
try {
const result = await fn();
resolve(result);
return { result, wasDeduplicated: false, hash };
} catch (err) {
reject(err);
throw err;
} finally {
clearTimeout(timer);
if (inflight.get(hash) === sharedPromise) inflight.delete(hash);
}
}
export function getInflightCount(): number {
return inflight.size;
}
export function getInflightHashes(): string[] {
return [...inflight.keys()];
}
export function clearInflight(): void {
inflight.clear();
}
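`computeRequestHash` serializes with `JSON.stringify`, which follows property insertion order. Two semantically identical bodies whose nested objects list keys in a different order therefore hash differently. That costs a missed dedup, never a wrong one. If it matters, a key-sorting serializer is a small change. Below is a sketch (`stableStringify` is hypothetical and intended for plain JSON-like values); its output would be fed to the same SHA-256 call.

```typescript
// Serialize JSON-like values with object keys sorted at every depth, so key
// order does not change the output. Array order is preserved (it is meaningful).
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify returns undefined for undefined; emit "null" like arrays do.
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  // Drop undefined-valued keys, matching JSON.stringify's object behavior.
  const keys = Object.keys(obj)
    .filter((k) => obj[k] !== undefined)
    .sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}
```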
@@ -0,0 +1,142 @@
/**
* Search Cache: in-memory TTL cache with request coalescing
*
* Bounded at MAX_CACHE_ENTRIES to prevent OOM.
* Request coalescing deduplicates concurrent identical queries
* to prevent cache stampede (critical for agentic tools).
*/
import { createHash } from "crypto";
const MAX_CACHE_ENTRIES = 5000;
const DEFAULT_TTL_MS = parseInt(process.env.SEARCH_CACHE_TTL_MS || String(5 * 60 * 1000), 10);
interface CacheEntry<T> {
data: T;
expiresAt: number;
}
const cache = new Map<string, CacheEntry<unknown>>();
const inflight = new Map<string, Promise<unknown>>();
let hits = 0;
let misses = 0;
/**
* Normalize a query for cache key computation.
* NFKC normalization, lowercase, trim, collapse whitespace.
*/
function normalizeQuery(query: string): string {
return query.normalize("NFKC").toLowerCase().trim().replace(/\s+/g, " ");
}
/**
* Compute a deterministic cache key from search parameters.
*/
export function computeCacheKey(
query: string,
provider: string,
searchType: string,
maxResults: number,
country?: string,
language?: string,
filters?: unknown
): string {
const normalized = normalizeQuery(query);
const payload = JSON.stringify({
q: normalized,
p: provider,
t: searchType,
n: maxResults,
c: country || null,
l: language || null,
f: filters || null,
});
return createHash("sha256").update(payload).digest("hex");
}
/**
* Evict expired entries, then enforce the size bound by dropping the oldest
* inserted entries (FIFO). Called on each cache write; the expiry sweep visits
* every entry, so each call is O(n).
*/
function evictIfNeeded(): void {
const now = Date.now();
// Remove expired entries first
for (const [key, entry] of cache) {
if (entry.expiresAt <= now) {
cache.delete(key);
}
}
// FIFO eviction if still over limit
while (cache.size >= MAX_CACHE_ENTRIES) {
const firstKey = cache.keys().next().value;
if (firstKey !== undefined) {
cache.delete(firstKey);
} else {
break;
}
}
}
/**
* Get or coalesce: return cached data, join an inflight request,
* or execute the fetch function and cache the result.
*
* @param key - Cache key from computeCacheKey()
* @param ttlMs - TTL in milliseconds (0 = do not store the result; an unexpired cached entry is still returned)
* @param fetchFn - Function to execute on cache miss
* @returns The cached or freshly fetched data
*/
export async function getOrCoalesce<T>(
key: string,
ttlMs: number,
fetchFn: () => Promise<T>
): Promise<{ data: T; cached: boolean }> {
// 1. Check cache
const cached = cache.get(key) as CacheEntry<T> | undefined;
if (cached && cached.expiresAt > Date.now()) {
hits++;
return { data: cached.data, cached: true };
}
// 2. Join inflight request if one exists (request coalescing)
const existing = inflight.get(key) as Promise<T> | undefined;
if (existing) {
hits++;
const data = await existing;
return { data, cached: true };
}
// 3. Cache miss — execute fetch
misses++;
const promise = fetchFn();
inflight.set(key, promise);
try {
const data = await promise;
// Store in cache
if (ttlMs > 0) {
evictIfNeeded();
cache.set(key, { data, expiresAt: Date.now() + ttlMs });
}
return { data, cached: false };
} finally {
inflight.delete(key);
}
}
/**
* Get cache statistics for monitoring.
*/
export function getCacheStats(): { size: number; hits: number; misses: number } {
return { size: cache.size, hits, misses };
}
/**
* Default TTL for search cache entries.
*/
export const SEARCH_CACHE_DEFAULT_TTL_MS = DEFAULT_TTL_MS;
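`evictIfNeeded` evicts in insertion (FIFO) order. A JavaScript `Map` iterates in insertion order, so deleting and re-inserting an entry on every hit turns the same structure into a least-recently-used cache. Below is a sketch of that alternative. `LruTtlCache` is hypothetical, and its clock is injectable so expiry can be tested without waiting.

```typescript
// LRU + TTL cache built on Map insertion order: the first key in iteration
// order is always the least recently used one.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(maxEntries: number, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    // Move the entry to the most-recently-used position.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.delete(key);
    // Evict least-recently-used entries until there is room.
    while (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

For agentic tools that repeat a few hot queries, LRU keeps those entries resident where FIFO would eventually evict them regardless of use.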
@@ -75,6 +75,30 @@ function getFieldValue(source: unknown, snakeKey: string, camelKey: string): unk
return obj[snakeKey] ?? obj[camelKey] ?? null;
}
function clampPercentage(value: number): number {
return Math.max(0, Math.min(100, value));
}
function toDisplayLabel(value: string): string {
return value
.replace(/^copilot[_\s-]*/i, "")
.split(/[\s_-]+/)
.filter(Boolean)
.map((part) => {
if (/^pro\+$/i.test(part)) return "Pro+";
if (/^[a-z]{2,}$/.test(part)) return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
return part;
})
.join(" ")
.trim();
}
function shouldDisplayGitHubQuota(quota: UsageQuota | null): quota is UsageQuota {
if (!quota) return false;
if (quota.unlimited && quota.total <= 0) return false;
return quota.total > 0 || quota.remainingPercentage !== undefined;
}
/**
* Get usage data for a provider connection
* @param {Object} connection - Provider connection with accessToken
@@ -170,48 +194,65 @@ async function getGitHubUsage(accessToken, providerSpecificData) {
}
const data = await response.json();
const dataRecord = toRecord(data);
// Handle different response formats (paid vs free)
if (data.quota_snapshots) {
if (dataRecord.quota_snapshots) {
// Paid plan format
const snapshots = data.quota_snapshots;
const resetAt = parseResetTime(data.quota_reset_date);
const snapshots = toRecord(dataRecord.quota_snapshots);
const resetAt = parseResetTime(getFieldValue(dataRecord, "quota_reset_date", "quotaResetDate"));
const premiumQuota = formatGitHubQuotaSnapshot(snapshots.premium_interactions, resetAt);
const chatQuota = formatGitHubQuotaSnapshot(snapshots.chat, resetAt);
const completionsQuota = formatGitHubQuotaSnapshot(snapshots.completions, resetAt);
const quotas: Record<string, UsageQuota> = {};
if (shouldDisplayGitHubQuota(premiumQuota)) {
quotas.premium_interactions = premiumQuota;
}
if (shouldDisplayGitHubQuota(chatQuota)) {
quotas.chat = chatQuota;
}
if (shouldDisplayGitHubQuota(completionsQuota)) {
quotas.completions = completionsQuota;
}
return {
plan: data.copilot_plan,
resetDate: data.quota_reset_date,
quotas: {
chat: { ...formatGitHubQuotaSnapshot(snapshots.chat), resetAt },
completions: { ...formatGitHubQuotaSnapshot(snapshots.completions), resetAt },
premium_interactions: {
...formatGitHubQuotaSnapshot(snapshots.premium_interactions),
resetAt,
},
},
plan: inferGitHubPlanName(dataRecord, premiumQuota),
resetDate: getFieldValue(dataRecord, "quota_reset_date", "quotaResetDate"),
quotas,
};
} else if (data.monthly_quotas || data.limited_user_quotas) {
} else if (dataRecord.monthly_quotas || dataRecord.limited_user_quotas) {
// Free/limited plan format
const monthlyQuotas = data.monthly_quotas || {};
const usedQuotas = data.limited_user_quotas || {};
const resetAt = parseResetTime(data.limited_user_reset_date);
const monthlyQuotas = toRecord(dataRecord.monthly_quotas);
const usedQuotas = toRecord(dataRecord.limited_user_quotas);
const resetDate = getFieldValue(dataRecord, "limited_user_reset_date", "limitedUserResetDate");
const resetAt = parseResetTime(resetDate);
const quotas: Record<string, UsageQuota> = {};
const addLimitedQuota = (name: string) => {
const total = toNumber(getFieldValue(monthlyQuotas, name, name), 0);
const used = Math.max(0, toNumber(getFieldValue(usedQuotas, name, name), 0));
if (total <= 0) return null;
const clampedUsed = Math.min(used, total);
quotas[name] = {
used: clampedUsed,
total,
remaining: Math.max(total - clampedUsed, 0),
remainingPercentage: clampPercentage(((total - clampedUsed) / total) * 100),
unlimited: false,
resetAt,
};
return quotas[name];
};
const premiumQuota = addLimitedQuota("premium_interactions");
addLimitedQuota("chat");
addLimitedQuota("completions");
return {
plan: data.copilot_plan || data.access_type_sku,
resetDate: data.limited_user_reset_date,
quotas: {
chat: {
used: usedQuotas.chat || 0,
total: monthlyQuotas.chat || 0,
unlimited: false,
resetAt,
},
completions: {
used: usedQuotas.completions || 0,
total: monthlyQuotas.completions || 0,
unlimited: false,
resetAt,
},
},
plan: inferGitHubPlanName(dataRecord, premiumQuota),
resetDate,
quotas,
};
}
@@ -221,17 +262,103 @@ async function getGitHubUsage(accessToken, providerSpecificData) {
}
}
function formatGitHubQuotaSnapshot(quota) {
if (!quota) return { used: 0, total: 0, unlimited: true };
function formatGitHubQuotaSnapshot(quota, resetAt: string | null = null): UsageQuota | null {
const source = toRecord(quota);
if (Object.keys(source).length === 0) return null;
const unlimited = source.unlimited === true;
const entitlement = toNumber(source.entitlement, Number.NaN);
const totalValue = toNumber(source.total, Number.NaN);
const remainingValue = toNumber(source.remaining, Number.NaN);
const usedValue = toNumber(source.used, Number.NaN);
const percentRemainingValue = toNumber(
getFieldValue(source, "percent_remaining", "percentRemaining"),
Number.NaN
);
let total = Number.isFinite(totalValue)
? Math.max(0, totalValue)
: Number.isFinite(entitlement)
? Math.max(0, entitlement)
: 0;
let remaining = Number.isFinite(remainingValue) ? Math.max(0, remainingValue) : undefined;
let used = Number.isFinite(usedValue) ? Math.max(0, usedValue) : undefined;
let remainingPercentage = Number.isFinite(percentRemainingValue)
? clampPercentage(percentRemainingValue)
: undefined;
if (used === undefined && total > 0 && remaining !== undefined) {
used = Math.max(total - remaining, 0);
}
if (remaining === undefined && total > 0 && used !== undefined) {
remaining = Math.max(total - used, 0);
}
if (remainingPercentage === undefined && total > 0 && remaining !== undefined) {
remainingPercentage = clampPercentage((remaining / total) * 100);
}
if (total <= 0 && remainingPercentage !== undefined) {
total = 100;
used = 100 - remainingPercentage;
remaining = remainingPercentage;
}
return {
used: quota.entitlement - quota.remaining,
total: quota.entitlement,
remaining: quota.remaining,
unlimited: quota.unlimited || false,
used: Math.max(0, used ?? 0),
total,
remaining,
remainingPercentage,
resetAt,
unlimited,
};
}
function inferGitHubPlanName(data: JsonRecord, premiumQuota: UsageQuota | null): string {
const rawPlan = getFieldValue(data, "copilot_plan", "copilotPlan");
const rawSku = getFieldValue(data, "access_type_sku", "accessTypeSku");
const planText = typeof rawPlan === "string" ? rawPlan.trim() : "";
const skuText = typeof rawSku === "string" ? rawSku.trim() : "";
const combined = `${skuText} ${planText}`.trim().toUpperCase();
const monthlyQuotas = toRecord(getFieldValue(data, "monthly_quotas", "monthlyQuotas"));
const premiumTotal =
premiumQuota?.total ||
toNumber(getFieldValue(monthlyQuotas, "premium_interactions", "premiumInteractions"), 0);
const chatTotal = toNumber(getFieldValue(monthlyQuotas, "chat", "chat"), 0);
if (
combined.includes("PRO+") ||
combined.includes("PRO_PLUS") ||
combined.includes("PROPLUS")
) {
return "Copilot Pro+";
}
if (combined.includes("ENTERPRISE")) return "Copilot Enterprise";
if (combined.includes("BUSINESS")) return "Copilot Business";
if (combined.includes("STUDENT")) return "Copilot Student";
if (combined.includes("FREE")) return "Copilot Free";
if (combined.includes("PRO")) return "Copilot Pro";
if (premiumTotal >= 1400) return "Copilot Pro+";
if (premiumTotal >= 900) return "Copilot Enterprise";
if (premiumTotal >= 250) {
if (combined.includes("INDIVIDUAL")) return "Copilot Pro";
return "Copilot Business";
}
if (premiumTotal > 0 || chatTotal === 50) return "Copilot Free";
if (skuText) {
const label = toDisplayLabel(skuText);
return label ? `Copilot ${label}` : "GitHub Copilot";
}
if (planText) {
const label = toDisplayLabel(planText);
return label ? `Copilot ${label}` : "GitHub Copilot";
}
return "GitHub Copilot";
}
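`formatGitHubQuotaSnapshot` fills in whichever of `used`, `remaining`, and `remainingPercentage` the API omits from the ones it does send. When only a percentage is available, it falls back to a 0-100 scale. Below is a condensed sketch of just that arithmetic. `deriveQuota` is hypothetical, its inputs are assumed to be already parsed to numbers or `undefined`, `total` is assumed to be already resolved from `total`/`entitlement`, and `unlimited`/`resetAt` are omitted.

```typescript
interface Derived {
  used: number;
  total: number;
  remaining?: number;
  remainingPercentage?: number;
}

const clampPct = (v: number): number => Math.max(0, Math.min(100, v));

// Hypothetical condensed version of the derivation in formatGitHubQuotaSnapshot.
function deriveQuota(input: { total?: number; remaining?: number; used?: number; percentRemaining?: number }): Derived {
  let total = Math.max(0, input.total ?? 0);
  let remaining = input.remaining === undefined ? undefined : Math.max(0, input.remaining);
  let used = input.used === undefined ? undefined : Math.max(0, input.used);
  let pct = input.percentRemaining === undefined ? undefined : clampPct(input.percentRemaining);

  // Derive missing values from the ones present (only meaningful with a total).
  if (used === undefined && total > 0 && remaining !== undefined) used = Math.max(total - remaining, 0);
  if (remaining === undefined && total > 0 && used !== undefined) remaining = Math.max(total - used, 0);
  if (pct === undefined && total > 0 && remaining !== undefined) pct = clampPct((remaining / total) * 100);

  // Percentage-only snapshots are expressed on a 0-100 scale.
  if (total <= 0 && pct !== undefined) {
    total = 100;
    used = 100 - pct;
    remaining = pct;
  }
  return { used: Math.max(0, used ?? 0), total, remaining, remainingPercentage: pct };
}
```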
/**
* Gemini CLI Usage (Google Cloud)
*/
@@ -91,6 +91,10 @@ export function filterToOpenAIFormat(body) {
delete body.tools;
}
// Strip Claude-specific fields that OpenAI-compatible providers reject
delete body.metadata;
delete body.anthropic_version;
// Normalize tools to OpenAI format (from Claude, Gemini, etc.)
if (body.tools && Array.isArray(body.tools) && body.tools.length > 0) {
body.tools = body.tools
+58 -15
@@ -1,26 +1,69 @@
// Tool call helper functions for translator
// Generate unique tool call ID
const ALPHANUM9 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
// Generate unique tool call ID (default long form)
export function generateToolCallId() {
return `call_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 9)}`;
}
// Ensure all tool_calls have id field and arguments is string (some providers require it)
export function ensureToolCallIds(body) {
// Generate 9-char [a-zA-Z0-9] id for providers that require it (e.g. Mistral)
function generateToolCallId9(): string {
let s = "";
for (let i = 0; i < 9; i++) {
s += ALPHANUM9[Math.floor(Math.random() * ALPHANUM9.length)];
}
return s;
}
/** @param options.use9CharId - When true, normalize ids to 9-char [a-zA-Z0-9] (e.g. Mistral); when false, only fix type/arguments, leave ids as-is */
export function ensureToolCallIds(body, options?: { use9CharId?: boolean }) {
if (!body.messages || !Array.isArray(body.messages)) return body;
for (const msg of body.messages) {
if (msg.role === "assistant" && msg.tool_calls && Array.isArray(msg.tool_calls)) {
for (const tc of msg.tool_calls) {
if (!tc.id) {
tc.id = generateToolCallId();
}
if (!tc.type) {
tc.type = "function";
}
// Ensure arguments is JSON string, not object
if (tc.function?.arguments && typeof tc.function.arguments !== "string") {
tc.function.arguments = JSON.stringify(tc.function.arguments);
const use9CharId = options?.use9CharId === true;
for (let i = 0; i < body.messages.length; i++) {
const msg = body.messages[i];
if (msg.role !== "assistant" || !msg.tool_calls || !Array.isArray(msg.tool_calls)) continue;
const used9 = new Set<string>();
const newIdsInOrder: string[] = [];
for (const tc of msg.tool_calls) {
if (!tc.type) {
tc.type = "function";
}
if (tc.function?.arguments && typeof tc.function.arguments !== "string") {
tc.function.arguments = JSON.stringify(tc.function.arguments);
}
if (use9CharId) {
let newId: string;
do {
newId = generateToolCallId9();
} while (used9.has(newId));
used9.add(newId);
newIdsInOrder.push(newId);
tc.id = newId;
} else {
// Leave id as-is, only ensure it exists for later tool message matching
const id =
tc.id != null && String(tc.id).trim() !== "" ? String(tc.id) : generateToolCallId();
tc.id = id;
newIdsInOrder.push(id);
}
}
// Tool responses (role "tool") follow in same order as tool_calls; set tool_call_id by index.
// Stop when we hit another assistant so we only link tool messages that immediately follow this one.
if (newIdsInOrder.length > 0) {
let idx = 0;
for (let j = i + 1; j < body.messages.length; j++) {
const later = body.messages[j];
if (later.role === "assistant") break;
if (later.role !== "tool") continue;
if (idx < newIdsInOrder.length) {
later.tool_call_id = newIdsInOrder[idx];
idx++;
}
}
}
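The 9-char path of `ensureToolCallIds` can be sketched in isolation as below. It follows the diff: each assistant turn gets fresh, unique `[a-zA-Z0-9]{9}` ids, then the `tool` messages that follow are re-linked by position until the next assistant message. The simplified `Message`/`ToolCall` shapes and the name `normalizeToolCallIds9` are hypothetical, and, like the original, the sketch assumes tool results arrive in the same order as the calls.

```typescript
// Minimal message shapes (assumed; the real translator works on untyped JSON).
type ToolCall = { id?: string; type?: string; function?: { name: string; arguments: unknown } };
type Message = { role: string; tool_calls?: ToolCall[]; tool_call_id?: string; content?: unknown };

const ALPHANUM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

function randomId9(): string {
  let s = "";
  for (let i = 0; i < 9; i++) s += ALPHANUM[Math.floor(Math.random() * ALPHANUM.length)];
  return s;
}

// Sketch: give every assistant tool_call a fresh 9-char id and re-link
// the tool messages that follow it, matching by position.
export function normalizeToolCallIds9(messages: Message[]): Message[] {
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    if (msg.role !== "assistant" || !Array.isArray(msg.tool_calls)) continue;

    const used = new Set<string>();
    const ids: string[] = [];
    for (const tc of msg.tool_calls) {
      if (!tc.type) tc.type = "function";
      // Some providers require arguments to be a JSON string, not an object.
      if (tc.function?.arguments && typeof tc.function.arguments !== "string") {
        tc.function.arguments = JSON.stringify(tc.function.arguments);
      }
      let id = randomId9();
      while (used.has(id)) id = randomId9(); // unique within this assistant turn
      used.add(id);
      ids.push(id);
      tc.id = id;
    }

    // Stop at the next assistant turn so results are never linked to the wrong call.
    let idx = 0;
    for (let j = i + 1; j < messages.length && idx < ids.length; j++) {
      if (messages[j].role === "assistant") break;
      if (messages[j].role === "tool") messages[j].tool_call_id = ids[idx++];
    }
  }
  return messages;
}
```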
+11 -4
@@ -66,6 +66,7 @@ function normalizeOpenAIResponsesRequest(body) {
return normalized;
}
/** @param options.normalizeToolCallId - When true, use 9-char tool call ids (e.g. Mistral); when false, leave ids as-is */
// Translate request: source -> openai -> target
export function translateRequest(
sourceFormat,
@@ -75,9 +76,11 @@ export function translateRequest(
stream = true,
credentials = null,
provider = null,
reqLogger = null
reqLogger = null,
options?: { normalizeToolCallId?: boolean }
) {
let result = body;
const use9CharId = options?.normalizeToolCallId === true;
// Phase 2: Apply thinking budget control before normalization
result = applyThinkingBudget(result);
@@ -85,8 +88,8 @@ export function translateRequest(
// Normalize thinking config: remove if lastMessage is not user
normalizeThinkingConfig(result);
// Always ensure tool_calls have id (some providers require it)
ensureToolCallIds(result);
// Ensure tool_calls have id; optionally normalize to 9-char for providers like Mistral
ensureToolCallIds(result, { use9CharId });
// Fix missing tool responses (insert empty tool_result if needed)
fixMissingToolResponses(result);
@@ -131,7 +134,7 @@ export function translateRequest(
}
// Final step: prepare request for Claude format endpoints
if (targetFormat === FORMATS.CLAUDE) {
if (targetFormat === FORMATS.CLAUDE && sourceFormat !== FORMATS.CLAUDE) {
result = prepareClaudeRequest(result, provider);
}
@@ -140,6 +143,10 @@ export function translateRequest(
result = normalizeOpenAIResponsesRequest(result);
}
// Ensure unique tool_call ids on final payload (translators may have introduced duplicates)
ensureToolCallIds(result, { use9CharId });
fixMissingToolResponses(result);
return result;
}
@@ -6,6 +6,7 @@
*/
import { register } from "../registry.ts";
import { FORMATS } from "../formats.ts";
import { generateToolCallId } from "../helpers/toolCallHelper.ts";
type JsonRecord = Record<string, unknown>;
@@ -120,6 +121,12 @@ export function openaiResponsesToOpenAIRequest(
}
if (itemType === "function_call") {
// Skip tool calls with empty names to avoid infinite placeholder_tool loops
const fnName = toString(item.name).trim();
if (!fnName) {
continue;
}
// Start or append assistant message with tool_calls
if (!currentAssistantMsg) {
currentAssistantMsg = {
@@ -136,7 +143,7 @@ export function openaiResponsesToOpenAIRequest(
id: toString(item.call_id),
type: "function",
function: {
name: toString(item.name),
name: fnName,
arguments: item.arguments,
},
});
@@ -201,6 +208,24 @@ export function openaiResponsesToOpenAIRequest(
});
}
// Filter orphaned tool results (no matching tool_call in assistant messages)
const allToolCallIds = new Set<string>();
for (const m of messages) {
const rec = toRecord(m);
if (Array.isArray(rec.tool_calls)) {
for (const tc of rec.tool_calls as { id?: string }[]) {
if (tc.id) allToolCallIds.add(String(tc.id));
}
}
}
result.messages = messages.filter((m) => {
const rec = toRecord(m);
if (rec.role === "tool" && rec.tool_call_id) {
return allToolCallIds.has(String(rec.tool_call_id));
}
return true;
});
// Cleanup Responses API specific fields
delete result.input;
delete result.instructions;
@@ -319,10 +344,15 @@ export function openaiToOpenAIResponsesRequest(
for (const toolCallValue of msg.tool_calls) {
const toolCall = toRecord(toolCallValue);
const fn = toRecord(toolCall.function);
// Skip tool calls with empty names to avoid infinite placeholder_tool loops
const fnName = toString(fn.name).trim();
if (!fnName) {
continue;
}
input.push({
type: "function_call",
call_id: toString(toolCall.id),
name: toString(fn.name),
call_id: toString(toolCall.id).trim() || generateToolCallId(),
name: fnName,
arguments: toString(fn.arguments, "{}"),
});
}
@@ -339,6 +369,22 @@ export function openaiToOpenAIResponsesRequest(
}
}
// Filter orphaned function_call_output items (no matching function_call)
// This happens when Claude Code compaction removes messages but leaves tool results
const knownCallIds = new Set(
input
.filter(
(item: { type?: string; call_id?: string }) => item.type === "function_call" && item.call_id
)
.map((item: { type?: string; call_id?: string }) => item.call_id)
);
result.input = input.filter((item: { type?: string; call_id?: string }) => {
if (item.type === "function_call_output" && item.call_id) {
return knownCallIds.has(item.call_id);
}
return true;
});
// If no system message, keep empty instructions
if (!hasSystemMessage) {
result.instructions = "";
@@ -123,6 +123,43 @@ export function openaiToClaudeRequest(model, body, stream) {
flushCurrentMessage();
// Remove assistant messages with empty content (can happen when all tool_use blocks were skipped)
result.messages = result.messages.filter((msg) => {
if (msg.role === "assistant" && Array.isArray(msg.content) && msg.content.length === 0) {
return false;
}
return true;
});
// Filter orphaned tool_result blocks whose tool_use_id has no matching tool_use
const allToolUseIds = new Set<string>();
for (const msg of result.messages) {
if (msg.role === "assistant" && Array.isArray(msg.content)) {
for (const block of msg.content) {
if (block.type === "tool_use" && block.id) {
allToolUseIds.add(String(block.id));
}
}
}
}
for (const msg of result.messages) {
if (msg.role === "user" && Array.isArray(msg.content)) {
msg.content = msg.content.filter((block) => {
if (block.type === "tool_result" && block.tool_use_id) {
return allToolUseIds.has(String(block.tool_use_id));
}
return true;
});
}
}
// Remove user messages that became empty after orphan filtering
result.messages = result.messages.filter((msg) => {
if (msg.role === "user" && Array.isArray(msg.content) && msg.content.length === 0) {
return false;
}
return true;
});
// Add cache_control to last assistant message
for (let i = result.messages.length - 1; i >= 0; i--) {
const message = result.messages[i];
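The orphan filter added to `openaiToClaudeRequest` works in two passes: collect every `tool_use` id, then drop `tool_result` blocks that reference an unknown id and remove user messages left empty. A standalone sketch, using hypothetical minimal block types rather than the project's own and returning new arrays instead of mutating in place (the earlier removal of empty assistant messages is omitted):

```typescript
// Minimal Anthropic Messages shapes (assumed for illustration).
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content?: unknown };
type ClaudeMessage = { role: "user" | "assistant"; content: Block[] };

export function dropOrphanToolResults(messages: ClaudeMessage[]): ClaudeMessage[] {
  // Pass 1: collect every tool_use id emitted by the assistant.
  const toolUseIds = new Set<string>();
  for (const m of messages) {
    if (m.role !== "assistant") continue;
    for (const b of m.content) if (b.type === "tool_use") toolUseIds.add(b.id);
  }

  // Pass 2: drop tool_result blocks whose tool_use_id is unknown,
  // then drop user messages that ended up with no content.
  return messages
    .map((m) =>
      m.role === "user"
        ? {
            ...m,
            content: m.content.filter(
              (b) => b.type !== "tool_result" || toolUseIds.has(b.tool_use_id)
            ),
          }
        : m
    )
    .filter((m) => !(m.role === "user" && m.content.length === 0));
}
```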
+179 -15
@@ -30,6 +30,8 @@ type StreamLogger = {
type StreamCompletePayload = {
status: number;
usage: unknown;
/** Minimal response body for call log (streaming: usage + note; non-streaming not used) */
responseBody?: unknown;
};
type StreamOptions = {
@@ -51,6 +53,8 @@ type TranslateState = ReturnType<typeof initState> & {
toolNameMap?: unknown;
usage?: unknown;
finishReason?: unknown;
/** Accumulated message content for call log response body */
accumulatedContent?: string;
};
function getOpenAIIntermediateChunks(value: unknown): unknown[] {
@@ -106,14 +110,21 @@ export function createSSEStream(options: StreamOptions = {}) {
let buffer = "";
let usage = null;
// State for translate mode
// State for translate mode (accumulatedContent for call log response body)
const state: TranslateState | null =
mode === STREAM_MODE.TRANSLATE
? { ...(initState(sourceFormat) as TranslateState), provider, toolNameMap }
? {
...(initState(sourceFormat) as TranslateState),
provider,
toolNameMap,
accumulatedContent: "",
}
: null;
// Track content length for usage estimation (both modes)
let totalContentLength = 0;
// Passthrough: accumulate content for call log response body
let passthroughAccumulatedContent = "";
// Guard against duplicate [DONE] events — ensures exactly one per stream
let doneSent = false;
@@ -184,15 +195,52 @@ export function createSSEStream(options: StreamOptions = {}) {
typeof parsed.type === "string" &&
parsed.type.startsWith("response.");
// Detect Claude SSE payloads. Includes "ping" and "error" to ensure
// they bypass the Chat Completions sanitization path which would
// incorrectly process or drop them.
const isClaudeSSE =
parsed.type &&
typeof parsed.type === "string" &&
(parsed.type.startsWith("message") ||
parsed.type.startsWith("content_block") ||
parsed.type === "ping" ||
parsed.type === "error");
if (isResponsesSSE) {
// Responses SSE: only extract usage, forward payload as-is
const extracted = extractUsage(parsed);
if (extracted) {
usage = extracted;
}
// Track content length from Responses format
// Track content length and accumulate for call log
if (parsed.delta && typeof parsed.delta === "string") {
totalContentLength += parsed.delta.length;
passthroughAccumulatedContent += parsed.delta;
}
} else if (isClaudeSSE) {
// Claude SSE: extract usage, track content, forward as-is
const extracted = extractUsage(parsed);
if (extracted) {
// Non-destructive merge: never overwrite a positive value with 0
// message_start carries input_tokens, message_delta carries output_tokens
if (!usage) usage = {};
if (extracted.prompt_tokens > 0) usage.prompt_tokens = extracted.prompt_tokens;
if (extracted.completion_tokens > 0)
usage.completion_tokens = extracted.completion_tokens;
if (extracted.total_tokens > 0) usage.total_tokens = extracted.total_tokens;
if (extracted.cache_read_input_tokens)
usage.cache_read_input_tokens = extracted.cache_read_input_tokens;
if (extracted.cache_creation_input_tokens)
usage.cache_creation_input_tokens = extracted.cache_creation_input_tokens;
}
// Track content length and accumulate from Claude format
if (parsed.delta?.text) {
totalContentLength += parsed.delta.text.length;
passthroughAccumulatedContent += parsed.delta.text;
}
if (parsed.delta?.thinking) {
totalContentLength += parsed.delta.thinking.length;
passthroughAccumulatedContent += parsed.delta.thinking;
}
} else {
// Chat Completions: full sanitization pipeline
@@ -219,6 +267,10 @@ export function createSSEStream(options: StreamOptions = {}) {
if (content && typeof content === "string") {
totalContentLength += content.length;
}
if (typeof delta?.content === "string")
passthroughAccumulatedContent += delta.content;
if (typeof delta?.reasoning_content === "string")
passthroughAccumulatedContent += delta.reasoning_content;
const extracted = extractUsage(parsed);
if (extracted) {
@@ -274,23 +326,45 @@ export function createSSEStream(options: StreamOptions = {}) {
continue;
}
// Track content length for estimation (from various formats)
// Include both regular content and reasoning/thinking content
// Track content length and accumulate for call log (from raw provider chunk, so content is never missed)
// Do this before translation so we capture content regardless of translator output shape
// Claude format
if (parsed.delta?.text) {
totalContentLength += parsed.delta.text.length;
const t = parsed.delta.text;
totalContentLength += t.length;
if (state?.accumulatedContent !== undefined && typeof t === "string")
state.accumulatedContent += t;
}
if (parsed.delta?.thinking) {
totalContentLength += parsed.delta.thinking.length;
const t = parsed.delta.thinking;
totalContentLength += t.length;
if (state?.accumulatedContent !== undefined && typeof t === "string")
state.accumulatedContent += t;
}
// OpenAI format
if (parsed.choices?.[0]?.delta?.content) {
totalContentLength += parsed.choices[0].delta.content.length;
const c = parsed.choices[0].delta.content;
if (typeof c === "string") {
totalContentLength += c.length;
if (state?.accumulatedContent !== undefined) state.accumulatedContent += c;
} else if (Array.isArray(c)) {
for (const part of c) {
if (part?.text && typeof part.text === "string") {
totalContentLength += part.text.length;
if (state?.accumulatedContent !== undefined)
state.accumulatedContent += part.text;
}
}
}
}
if (parsed.choices?.[0]?.delta?.reasoning_content) {
totalContentLength += parsed.choices[0].delta.reasoning_content.length;
const r = parsed.choices[0].delta.reasoning_content;
if (typeof r === "string") {
totalContentLength += r.length;
if (state?.accumulatedContent !== undefined) state.accumulatedContent += r;
}
}
// Gemini format - may have multiple parts
@@ -298,10 +372,30 @@ export function createSSEStream(options: StreamOptions = {}) {
for (const part of parsed.candidates[0].content.parts) {
if (part.text && typeof part.text === "string") {
totalContentLength += part.text.length;
if (state?.accumulatedContent !== undefined) state.accumulatedContent += part.text;
}
}
}
// Generic fallback: delta string, top-level content/text (e.g. some SSE payloads)
if (state?.accumulatedContent !== undefined) {
if (typeof (parsed as JsonRecord).delta === "string") {
const d = (parsed as JsonRecord).delta as string;
state.accumulatedContent += d;
totalContentLength += d.length;
}
if (typeof (parsed as JsonRecord).content === "string") {
const c = (parsed as JsonRecord).content as string;
state.accumulatedContent += c;
totalContentLength += c.length;
}
if (typeof (parsed as JsonRecord).text === "string") {
const t = (parsed as JsonRecord).text as string;
state.accumulatedContent += t;
totalContentLength += t.length;
}
}
// Extract usage
const extracted = extractUsage(parsed);
if (extracted) state.usage = extracted; // Keep original usage for logging
@@ -317,6 +411,9 @@ export function createSSEStream(options: StreamOptions = {}) {
if (translated?.length > 0) {
for (const item of translated) {
// Content for call log is accumulated only from parsed (above) to avoid double-counting;
// do not add again from item here.
// Filter empty chunks
if (!hasValuableContent(item, sourceFormat)) {
continue; // Skip this empty chunk
@@ -372,9 +469,9 @@ export function createSSEStream(options: StreamOptions = {}) {
controller.enqueue(encoder.encode(output));
}
// Estimate usage if provider didn't return valid usage (PASSTHROUGH is always OpenAI format)
// Estimate usage if provider didn't return valid usage
if (!hasValidUsage(usage) && totalContentLength > 0) {
usage = estimateUsage(body, totalContentLength, FORMATS.OPENAI);
usage = estimateUsage(body, totalContentLength, sourceFormat || FORMATS.OPENAI);
}
if (hasValidUsage(usage)) {
@@ -388,10 +485,30 @@ export function createSSEStream(options: StreamOptions = {}) {
status: "200 OK",
}).catch(() => {});
}
// Notify caller for call log persistence
// Notify caller for call log persistence (include full response body with accumulated content)
if (onComplete) {
try {
onComplete({ status: 200, usage });
const u = usage as Record<string, unknown> | null;
const prompt = Number(u?.prompt_tokens ?? u?.input_tokens ?? 0);
const completion = Number(u?.completion_tokens ?? u?.output_tokens ?? 0);
const content = passthroughAccumulatedContent.trim() || "";
const responseBody = {
choices: [
{
message: {
role: "assistant",
content,
},
},
],
usage: {
prompt_tokens: prompt,
completion_tokens: completion,
total_tokens: prompt + completion,
},
_streamed: true,
};
onComplete({ status: 200, usage, responseBody });
} catch {}
}
return;
@@ -401,6 +518,33 @@ export function createSSEStream(options: StreamOptions = {}) {
if (buffer.trim()) {
const parsed = parseSSELine(buffer.trim());
if (parsed && !parsed.done) {
// Extract usage from remaining buffer — if the usage-bearing event
// (e.g. response.completed) is the last SSE line, it ends up here
// in the flush handler where extractUsage was not called.
// Non-destructive merge: some providers send usage across multiple
// events (e.g. prompt_tokens in message_start, completion_tokens
// in message_delta). Direct assignment would lose earlier data.
const extracted = extractUsage(parsed);
if (extracted) {
if (!state.usage) {
state.usage = extracted;
} else {
if (extracted.prompt_tokens > 0)
state.usage.prompt_tokens = extracted.prompt_tokens;
if (extracted.completion_tokens > 0)
state.usage.completion_tokens = extracted.completion_tokens;
if (extracted.total_tokens > 0) state.usage.total_tokens = extracted.total_tokens;
if (extracted.cache_read_input_tokens > 0)
state.usage.cache_read_input_tokens = extracted.cache_read_input_tokens;
if (extracted.cache_creation_input_tokens > 0)
state.usage.cache_creation_input_tokens = extracted.cache_creation_input_tokens;
if (extracted.cached_tokens > 0)
state.usage.cached_tokens = extracted.cached_tokens;
if (extracted.reasoning_tokens > 0)
state.usage.reasoning_tokens = extracted.reasoning_tokens;
}
}
const translated = translateResponse(targetFormat, sourceFormat, parsed, state);
// Log OpenAI intermediate chunks
@@ -470,10 +614,30 @@ export function createSSEStream(options: StreamOptions = {}) {
status: "200 OK",
}).catch(() => {});
}
// Notify caller for call log persistence
// Notify caller for call log persistence (include full response body with accumulated content)
if (onComplete) {
try {
onComplete({ status: 200, usage: state?.usage });
const u = state?.usage as Record<string, unknown> | null | undefined;
const prompt = Number(u?.prompt_tokens ?? u?.input_tokens ?? 0);
const completion = Number(u?.completion_tokens ?? u?.output_tokens ?? 0);
const content = (state?.accumulatedContent ?? "").trim() || "";
const responseBody = {
choices: [
{
message: {
role: "assistant",
content,
},
},
],
usage: {
prompt_tokens: prompt,
completion_tokens: completion,
total_tokens: prompt + completion,
},
_streamed: true,
};
onComplete({ status: 200, usage: state?.usage, responseBody });
} catch {}
}
} catch (error) {
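Both the Claude passthrough branch and the flush handler merge usage non-destructively, because providers can split counts across events (per the comments above, `message_start` carries input tokens and `message_delta` carries output tokens). The rule reduces to "copy only positive counts". A sketch with a hypothetical `mergeUsage` helper, assuming missing counts are reported as 0 or absent; unlike the passthrough branch, it applies the same `> 0` test to the cache fields:

```typescript
// Assumed usage shape after extraction (field names as used in the stream handler above).
type Usage = {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

const USAGE_KEYS: (keyof Usage)[] = [
  "prompt_tokens",
  "completion_tokens",
  "total_tokens",
  "cache_read_input_tokens",
  "cache_creation_input_tokens",
];

// Copy only positive counts from `incoming`, so a later event that reports 0
// for a field cannot erase a value recorded by an earlier event.
export function mergeUsage(current: Usage | null, incoming: Usage): Usage {
  const merged: Usage = { ...(current ?? {}) };
  for (const key of USAGE_KEYS) {
    const value = incoming[key];
    if (typeof value === "number" && value > 0) merged[key] = value;
  }
  return merged;
}
```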
+3 -1
@@ -400,8 +400,10 @@ export function logUsage(provider, usage, model = null, connectionId = null, api
console.log(msg);
// Save to usage DB
// input = total input tokens (non-cached + cache_read + cache_creation)
// This ensures analytics show correct totals for heavily-cached requests
const tokens = {
input: inTokens,
input: inTokens + (cacheRead || 0) + (cacheCreation || 0),
output: outTokens,
cacheRead: cacheRead || 0,
cacheCreation: cacheCreation || 0,
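The `logUsage` change records input as the sum of non-cached, cache-read, and cache-creation tokens. As a one-line hypothetical helper mirroring the new `tokens.input` expression:

```typescript
// Hypothetical helper mirroring the new `tokens.input` expression in logUsage.
export function totalInputTokens(nonCached: number, cacheRead?: number, cacheCreation?: number): number {
  return nonCached + (cacheRead || 0) + (cacheCreation || 0);
}
```

For a heavily cached request with 12 non-cached, 40,000 cache-read, and 1,500 cache-creation tokens, `totalInputTokens(12, 40_000, 1_500)` gives 41,512, whereas the previous code stored only 12 as input.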
+658 -455
File diff suppressed because it is too large.
+1 -1
@@ -1,6 +1,6 @@
{
"name": "omniroute",
"version": "2.6.6",
"version": "2.8.2",
"description": "Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.",
"type": "module",
"bin": {
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
+1
@@ -0,0 +1 @@
<svg width="56" height="64" viewBox="0 0 56 64" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M53.292 15.321l1.5-3.676s-1.909-2.043-4.227-4.358c-2.317-2.315-7.225-.953-7.225-.953L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953C2.988 9.602 1.08 11.645 1.08 11.645l1.5 3.676-1.91 5.447s5.614 21.236 6.272 23.83c1.295 5.106 2.181 7.08 5.862 9.668 3.68 2.587 10.36 7.08 11.45 7.762 1.091.68 2.455 1.84 3.682 1.84 1.227 0 2.59-1.16 3.68-1.84 1.091-.681 7.77-5.175 11.452-7.762 3.68-2.587 4.567-4.562 5.862-9.668.657-2.594 6.27-23.83 6.27-23.83l-1.908-5.447z" fill="url(#paint0_linear)"/><path fill-rule="evenodd" clip-rule="evenodd" d="M34.888 11.508c.818 0 6.885-1.157 6.885-1.157s7.189 8.68 7.189 10.536c0 1.534-.619 2.134-1.347 2.842-.152.148-.31.3-.467.468l-5.39 5.717a9.42 9.42 0 01-.176.18c-.538.54-1.33 1.336-.772 2.658l.115.269c.613 1.432 1.37 3.2.407 4.99-1.025 1.906-2.78 3.178-3.905 2.967-1.124-.21-3.766-1.589-4.737-2.218-.971-.63-4.05-3.166-4.05-4.137 0-.809 2.214-2.155 3.29-2.81.214-.13.383-.232.48-.298.111-.075.297-.19.526-.332.981-.61 2.754-1.71 2.799-2.197.055-.602.034-.778-.758-2.264-.168-.316-.365-.654-.568-1.004-.754-1.295-1.598-2.745-1.41-3.784.21-1.173 2.05-1.845 3.608-2.415.194-.07.385-.14.567-.209l1.623-.609c1.556-.582 3.284-1.229 3.57-1.36.394-.181.292-.355-.903-.468a54.655 54.655 0 01-.58-.06c-1.48-.157-4.209-.446-5.535-.077-.261.073-.553.152-.86.235-1.49.403-3.317.897-3.493 1.182-.03.05-.06.093-.089.133-.168.238-.277.394-.091 1.406.055.302.169.895.31 1.629.41 2.148 1.053 5.498 1.134 6.25.011.106.024.207.036.305.103.84.171 1.399-.805 1.622l-.255.058c-1.102.252-2.717.623-3.3.623-.584 0-2.2-.37-3.302-.623l-.254-.058c-.976-.223-.907-.782-.804-1.622.012-.098.024-.2.035-.305.081-.753.725-4.112 1.137-6.259.14-.73.253-1.32.308-1.62.185-1.012.076-1.168-.092-1.406a3.743 3.743 0 
01-.09-.133c-.174-.285-2-.779-3.491-1.182-.307-.083-.6-.162-.86-.235-1.327-.37-4.055-.08-5.535.077-.226.024-.422.045-.58.06-1.196.113-1.297.287-.903.468.285.131 2.013.778 3.568 1.36.597.223 1.17.437 1.624.609.183.069.373.138.568.21 1.558.57 3.398 1.241 3.608 2.414.187 1.039-.657 2.489-1.41 3.784-.204.35-.4.688-.569 1.004-.791 1.486-.812 1.662-.757 2.264.044.488 1.816 1.587 2.798 2.197.229.142.415.257.526.332.098.066.266.168.48.298 1.076.654 3.29 2 3.29 2.81 0 .97-3.078 3.507-4.05 4.137-.97.63-3.612 2.008-4.737 2.218-1.124.21-2.88-1.061-3.904-2.966-.963-1.791-.207-3.559.406-4.99l.115-.27c.559-1.322-.233-2.118-.772-2.658a9.377 9.377 0 01-.175-.18l-5.39-5.717c-.158-.167-.316-.32-.468-.468-.728-.707-1.346-1.308-1.346-2.842 0-1.855 7.189-10.536 7.189-10.536s6.066 1.157 6.884 1.157c.653 0 1.913-.433 3.227-.885.333-.114.669-.23 1-.34 1.635-.545 2.726-.549 2.726-.549s1.09.004 2.726.549c.33.11.667.226 1 .34 1.313.452 2.574.885 3.226.885zm-1.041 30.706c1.282.66 2.192 1.128 2.536 1.343.445.278.174.803-.232 1.09-.405.285-5.853 4.499-6.381 4.965l-.215.191c-.509.459-1.159 1.044-1.62 1.044-.46 0-1.11-.586-1.62-1.044l-.213-.191c-.53-.466-5.977-4.68-6.382-4.966-.405-.286-.677-.81-.232-1.09.344-.214 1.255-.683 2.539-1.344l1.22-.629c1.92-.992 4.315-1.837 4.689-1.837.373 0 2.767.844 4.689 1.837.436.226.845.437 1.222.63z" fill="#fff"/><path fill-rule="evenodd" clip-rule="evenodd" d="M43.34 6.334L37.751 0H18.12l-5.589 6.334s-4.908-1.362-7.225.953c0 0 6.544-.59 8.793 3.064 0 0 6.066 1.157 6.884 1.157.818 0 2.59-.68 4.226-1.225 1.636-.545 2.727-.549 2.727-.549s1.09.004 2.726.549 3.408 1.225 4.226 1.225c.818 0 6.885-1.157 6.885-1.157 2.249-3.654 8.792-3.064 8.792-3.064-2.317-2.315-7.225-.953-7.225-.953z" fill="url(#paint1_linear)"/><defs><linearGradient id="paint0_linear" x1=".671" y1="64.319" x2="55.2" y2="64.319" gradientUnits="userSpaceOnUse"><stop stop-color="#F50"/><stop offset=".41" stop-color="#F50"/><stop offset=".582" stop-color="#FF2000"/><stop offset="1" 
stop-color="#FF2000"/></linearGradient><linearGradient id="paint1_linear" x1="6.278" y1="11.466" x2="50.565" y2="11.466" gradientUnits="userSpaceOnUse"><stop stop-color="#FF452A"/><stop offset="1" stop-color="#FF2000"/></linearGradient></defs></svg>

Binary file not shown.
+4
@@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
<rect width="48" height="48" rx="8" fill="#1E40AF"/>
<text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
</svg>

+4
@@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="48" height="48" viewBox="0 0 48 48">
<rect width="48" height="48" rx="8" fill="#1E40AF"/>
<text x="24" y="32" text-anchor="middle" font-family="system-ui,-apple-system,sans-serif" font-size="22" font-weight="700" fill="white">exa</text>
</svg>

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
+63 -7
@@ -14,6 +14,7 @@
*
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/129
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/321
* Fixes: https://github.com/diegosouzapw/OmniRoute/issues/426
*/
import { existsSync, copyFileSync, mkdirSync } from "node:fs";
@@ -80,8 +81,54 @@ if (existsSync(rootBinary)) {
}
}
// Strategy 1.5: Use node-pre-gyp to download the correct prebuilt binary
// This works on Windows without requiring node-gyp, Python, or MSVC.
// better-sqlite3 ships prebuilts for win32-x64, win32-arm64, darwin-x64/arm64.
console.log(" 📥 Attempting to download prebuilt binary via node-pre-gyp...");
try {
const { execSync } = await import("node:child_process");
// better-sqlite3 bundles @mapbox/node-pre-gyp — use it directly
const preGypBin = join(
ROOT,
"app",
"node_modules",
".bin",
process.platform === "win32" ? "node-pre-gyp.cmd" : "node-pre-gyp"
);
const preGypFallback = join(
ROOT,
"app",
"node_modules",
"@mapbox",
"node-pre-gyp",
"bin",
"node-pre-gyp"
);
const preGypCmd = existsSync(preGypBin) ? preGypBin : preGypFallback;
if (existsSync(preGypCmd)) {
execSync(`"${process.execPath}" "${preGypCmd}" install --fallback-to-build=false`, {
cwd: join(ROOT, "app", "node_modules", "better-sqlite3"),
stdio: "inherit",
timeout: 60_000,
});
mkdirSync(dirname(appBinary), { recursive: true });
try {
process.dlopen({ exports: {} }, appBinary);
console.log(" ✅ Prebuilt binary downloaded and loaded successfully!\n");
process.exit(0);
} catch (loadErr) {
console.warn(` ⚠️ Downloaded binary failed to load: ${loadErr.message}`);
}
} else {
console.warn(" ⚠️ node-pre-gyp not found, skipping prebuilt download.");
}
} catch (err) {
console.warn(` ⚠️ node-pre-gyp download failed: ${err.message.split("\n")[0]}`);
}
// Strategy 2: Fall back to npm rebuild (may work if build tools are available)
console.log(" ⚠️ Root binary not available or incompatible, attempting npm rebuild...");
console.log(" ⚠️ Attempting npm rebuild (requires build tools)...");
try {
const { execSync } = await import("node:child_process");
@@ -103,14 +150,23 @@ try {
}
}
// If nothing worked, warn but don't fail the install — let the package stay
// installed so users can fix manually or use the pre-flight check in the CLI
console.warn(" ⚠️ Could not fix better-sqlite3 native module automatically.");
// If nothing worked, warn but don't fail the install
console.warn("\n ⚠️ Could not fix better-sqlite3 native module automatically.");
console.warn(" The server may not start correctly.");
console.warn(" Try manually:");
console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
if (process.platform === "darwin") {
console.warn(" Manual fix options:");
if (process.platform === "win32") {
console.warn(" Option A (easiest — no build tools needed):");
console.warn(` cd "${join(ROOT, "app", "node_modules", "better-sqlite3")}"`);
console.warn(" npx @mapbox/node-pre-gyp install --fallback-to-build=false");
console.warn(" Option B (requires Build Tools for Visual Studio):");
console.warn(` cd "${join(ROOT, "app")}" && npm rebuild better-sqlite3`);
console.warn(" Install from: https://visualstudio.microsoft.com/visual-cpp-build-tools/");
console.warn(" Also ensure Python is installed: https://python.org");
} else if (process.platform === "darwin") {
console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
console.warn(" If build tools are missing: xcode-select --install");
} else {
console.warn(` cd ${join(ROOT, "app")} && npm rebuild better-sqlite3`);
}
console.warn("");
+13
@@ -278,6 +278,19 @@ if (existsSync(swcHelpersSrc) && !existsSync(swcHelpersDst)) {
console.log(" ✅ @swc/helpers included in standalone build.");
}
// ── Step 10.6: Remove large binaries from standalone build ──
// These directories contain platform-native binaries (.node, .asar) that
// trigger Z_DATA_ERROR during npm pack. They are not needed in the npm package.
const binaryDirsToRemove = ["vscode-extension", "electron"];
for (const dir of binaryDirsToRemove) {
const targetDir = join(APP_DIR, dir);
if (existsSync(targetDir)) {
console.log(` 🧹 Removing app/${dir}/ (not needed in npm package)...`);
rmSync(targetDir, { recursive: true, force: true });
console.log(` ✅ app/${dir}/ removed.`);
}
}
// ── Done ───────────────────────────────────────────────────
const appPkg = join(APP_DIR, "package.json");
if (existsSync(appPkg)) {
@@ -1181,6 +1181,12 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
const [config, setConfig] = useState(combo?.config || {});
const [showStrategyNudge, setShowStrategyNudge] = useState(false);
const strategyChangeMountedRef = useRef(false);
// Agent features (#399 / #401 / #454)
const [agentSystemMessage, setAgentSystemMessage] = useState<string>(combo?.system_message || "");
const [agentToolFilter, setAgentToolFilter] = useState<string>(combo?.tool_filter_regex || "");
const [agentContextCache, setAgentContextCache] = useState<boolean>(
!!combo?.context_cache_protection
);
// DnD state
const hasPricingForModel = useCallback(
@@ -1532,6 +1538,14 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
saveData.config = configToSave;
}
// Agent features (#399 / #401 / #454)
if (agentSystemMessage.trim()) saveData.system_message = agentSystemMessage.trim();
else delete saveData.system_message;
if (agentToolFilter.trim()) saveData.tool_filter_regex = agentToolFilter.trim();
else delete saveData.tool_filter_regex;
if (agentContextCache) saveData.context_cache_protection = true;
else delete saveData.context_cache_protection;
await onSave(saveData);
setSaving(false);
};
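The form only stores `tool_filter_regex`; the code that applies it is not part of this diff. Going by the help text ("Only tools whose name matches this regex are forwarded to the provider"), the filter step might look like the sketch below. `applyToolFilter`, the OpenAI-style tool shape, and the choice to forward all tools when the pattern fails to compile are all assumptions.

```typescript
// Assumed OpenAI-style tool definition.
type OpenAITool = {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
};

// Hypothetical filter: keep only tools whose name matches the combo's regex.
export function applyToolFilter(tools: OpenAITool[], pattern: string | undefined): OpenAITool[] {
  const source = pattern?.trim();
  if (!source) return tools; // empty filter: forward every tool
  let re: RegExp;
  try {
    re = new RegExp(source); // no "g" flag, so test() keeps no lastIndex state between calls
  } catch {
    return tools; // assumption: an invalid pattern disables filtering instead of dropping all tools
  }
  return tools.filter((t) => re.test(t.function.name));
}
```

Building the regex without the `g` flag matters here: with `g`, `RegExp.prototype.test` advances `lastIndex` after a match, so reusing one instance across several tool names can skip matches.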
@@ -2052,6 +2066,72 @@ function ComboFormModal({ isOpen, combo, onClose, onSave, activeProviders }) {
</div>
)}
{/* Agent Features (#399 / #401 / #454) */}
<div className="flex flex-col gap-2 p-3 bg-black/[0.02] dark:bg-white/[0.02] rounded-lg border border-black/5 dark:border-white/5">
<div className="flex items-center gap-1.5 mb-1">
<span className="material-symbols-outlined text-[14px] text-primary">smart_toy</span>
<p className="text-xs font-medium">Agent Features</p>
<span className="text-[10px] text-text-muted">
optional, for agent/tool workflows
</span>
</div>
{/* System Message Override */}
<div>
<label className="text-[11px] font-medium text-text-muted block mb-0.5">
System Message Override
</label>
<textarea
rows={2}
value={agentSystemMessage}
onChange={(e) => setAgentSystemMessage(e.target.value)}
placeholder="Override the system prompt for all requests routed through this combo…"
className="w-full text-xs py-1.5 px-2 rounded border border-black/10 dark:border-white/10 bg-transparent focus:border-primary focus:outline-none resize-none"
/>
<p className="text-[10px] text-text-muted mt-0.5">
Replaces any system message sent by the client. Leave empty to pass through client
system messages.
</p>
</div>
{/* Tool Filter Regex */}
<div>
<label className="text-[11px] font-medium text-text-muted block mb-0.5">
Tool Filter Regex
</label>
<input
type="text"
value={agentToolFilter}
onChange={(e) => setAgentToolFilter(e.target.value)}
placeholder="e.g. ^(bash|computer)$"
className="w-full text-xs py-1.5 px-2 rounded border border-black/10 dark:border-white/10 bg-transparent focus:border-primary focus:outline-none font-mono"
/>
<p className="text-[10px] text-text-muted mt-0.5">
Only tools whose name matches this regex are forwarded to the provider. Leave empty
to forward all tools.
</p>
</div>
{/* Context Cache Protection */}
<div className="flex items-center justify-between gap-2">
<div>
<label className="text-[11px] font-medium text-text-muted block">
Context Cache Protection
</label>
<p className="text-[10px] text-text-muted">
Pins the provider/model across turns to preserve cache sessions. Internal tags are
stripped before forwarding to the provider.
</p>
</div>
<input
type="checkbox"
checked={agentContextCache}
onChange={(e) => setAgentContextCache(e.target.checked)}
className="accent-primary shrink-0"
/>
</div>
</div>
{/* Actions */}
<div className="flex gap-2 pt-1">
<Button onClick={onClose} variant="ghost" fullWidth size="sm">
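The modal only saves the Agent Features fields onto the combo (`system_message`, `tool_filter_regex`, `context_cache_protection`). The request-time code that applies them is not in this view. The help texts above say the override replaces client system messages and that only tools whose names match the regex are forwarded. Based on that, a sketch of the first two behaviors might look like this. The request shapes and the names `compileToolFilter` and `applyAgentSettings` are assumptions. When the pattern is invalid, this sketch forwards all tools, which the real handler may not do.

```typescript
// Hypothetical request shapes, loosely modeled on OpenAI-style chat requests.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}
interface ToolDef {
  type: "function";
  function: { name: string };
}
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  tools?: ToolDef[];
}
// Mirrors two of the fields the modal saves.
interface ComboAgentSettings {
  system_message?: string;
  tool_filter_regex?: string;
}

// Returns null for an invalid pattern instead of throwing at request time.
function compileToolFilter(pattern: string): RegExp | null {
  try {
    return new RegExp(pattern); // no "g" flag, so .test() stays stateless
  } catch {
    return null;
  }
}

function applyAgentSettings(req: ChatRequest, combo: ComboAgentSettings): ChatRequest {
  let messages = req.messages;
  if (combo.system_message) {
    // Drop client-supplied system messages and prepend the combo override.
    messages = [
      { role: "system", content: combo.system_message },
      ...messages.filter((m) => m.role !== "system"),
    ];
  }
  let tools = req.tools;
  const filter = combo.tool_filter_regex ? compileToolFilter(combo.tool_filter_regex) : null;
  if (filter && tools) {
    // Forward only tools whose name matches the regex.
    tools = tools.filter((t) => filter.test(t.function.name));
  }
  return { ...req, messages, tools };
}
```

The modal does not validate the regex before saving. A helper like `compileToolFilter` could therefore also drive an inline validation message next to the input.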
@@ -33,11 +33,29 @@ export default function APIPageClient({ machineId }) {
const [viewTab, setViewTab] = useState("api");
const [mcpStatus, setMcpStatus] = useState<any>(null);
const [a2aStatus, setA2aStatus] = useState<any>(null);
const [searchProviders, setSearchProviders] = useState<any[]>([]);
const { copied, copy } = useCopyToClipboard();
const fetchSearchProviders = async () => {
try {
const res = await fetch("/api/search/providers");
if (res.ok) {
const data = await res.json();
setSearchProviders(data.providers || []);
}
} catch {
// Search endpoint may not be available
}
};
useEffect(() => {
Promise.allSettled([
loadCloudSettings(),
fetchModels(),
fetchProtocolStatus(),
fetchSearchProviders(),
]).finally(() => {
setLoading(false);
});
}, []);
@@ -575,6 +593,47 @@ export default function APIPageClient({ machineId }) {
</div>
</div>
{/* Search & Discovery */}
{searchProviders.length > 0 && (
<div className="mb-6">
<div className="flex items-center gap-2 mb-3">
<span className="material-symbols-outlined text-sm text-cyan-400">
travel_explore
</span>
<h3 className="text-xs font-semibold text-text-muted uppercase tracking-wider">
{t("categorySearch") || "Search & Discovery"}
</h3>
<div className="flex-1 h-px bg-border/50" />
</div>
<div className="flex flex-col gap-3">
<EndpointSection
icon="search"
iconColor="text-cyan-500"
iconBg="bg-cyan-500/10"
title={t("webSearch") || "Web Search"}
path="/v1/search"
description={
t("webSearchDesc") ||
"Unified web search across multiple providers with automatic failover and caching"
}
models={searchProviders.map((p) => ({
id: p.id,
name: p.name,
owned_by: p.id,
type: "search",
}))}
expanded={expandedEndpoint === "search"}
onToggle={() =>
setExpandedEndpoint(expandedEndpoint === "search" ? null : "search")
}
copy={copy}
copied={copied}
baseUrl={currentEndpoint}
/>
</div>
</div>
)}
{/* Utility & Management */}
<div>
<div className="flex items-center gap-2 mb-3">
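On mount, `APIPageClient` now runs four loaders through `Promise.allSettled` and clears `loading` in `.finally`. A single failed request, such as a missing `/api/search/providers` endpoint, therefore cannot leave the page stuck in its loading state. Below is a standalone sketch of that pattern. `runLoaders` is a hypothetical helper and is not part of the component.

```typescript
// Run independent loaders concurrently and report which ones failed,
// without letting a single rejection short-circuit the rest.
async function runLoaders(
  loaders: Record<string, () => Promise<unknown>>
): Promise<{ fulfilled: string[]; rejected: string[] }> {
  const names = Object.keys(loaders);
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const results = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => loaders[name]()))
  );
  const fulfilled: string[] = [];
  const rejected: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") fulfilled.push(names[i]);
    else rejected.push(names[i]);
  });
  return { fulfilled, rejected };
}
```

In the component, this would be called as `runLoaders({...}).finally(() => setLoading(false))`. The `rejected` list could also feed a warning banner instead of being silently ignored.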
@@ -1,27 +1,135 @@
"use client";
import { useState, useRef, useEffect } from "react";
import { RequestLoggerV2, ProxyLogger, SegmentedControl } from "@/shared/components";
import ConsoleLogViewer from "@/shared/components/ConsoleLogViewer";
import AuditLogTab from "./AuditLogTab";
import { useTranslations } from "next-intl";
const TIME_RANGES = [
{ label: "1h", hours: 1 },
{ label: "6h", hours: 6 },
{ label: "12h", hours: 12 },
{ label: "24h", hours: 24 },
];
const TAB_TO_LOG_TYPE: Record<string, string> = {
"request-logs": "request-logs",
"proxy-logs": "proxy-logs",
"audit-logs": "call-logs",
console: "call-logs",
};
export default function LogsPage() {
const [activeTab, setActiveTab] = useState("request-logs");
const [showExport, setShowExport] = useState(false);
const [exporting, setExporting] = useState(false);
const dropdownRef = useRef<HTMLDivElement>(null);
const t = useTranslations("logs");
useEffect(() => {
function handleClickOutside(e: MouseEvent) {
if (dropdownRef.current && !dropdownRef.current.contains(e.target as Node)) {
setShowExport(false);
}
}
document.addEventListener("mousedown", handleClickOutside);
return () => document.removeEventListener("mousedown", handleClickOutside);
}, []);
async function handleExport(hours: number) {
setExporting(true);
setShowExport(false);
try {
const logType = TAB_TO_LOG_TYPE[activeTab] || "call-logs";
const res = await fetch(`/api/logs/export?hours=${hours}&type=${logType}`);
if (!res.ok) throw new Error(`Export failed (HTTP ${res.status})`);
const blob = await res.blob();
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = `omniroute-${logType}-${hours}h-${new Date().toISOString().slice(0, 10)}.json`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
} catch (err) {
console.error("Export failed:", err);
} finally {
setExporting(false);
}
}
return (
<div className="flex flex-col gap-6">
<div className="flex items-center justify-between gap-4 flex-wrap">
<SegmentedControl
options={[
{ value: "request-logs", label: t("requestLogs") },
{ value: "proxy-logs", label: t("proxyLogs") },
{ value: "audit-logs", label: t("auditLog") },
{ value: "console", label: t("console") },
]}
value={activeTab}
onChange={setActiveTab}
/>
<div className="relative" ref={dropdownRef}>
<button
id="export-logs-btn"
onClick={() => setShowExport(!showExport)}
disabled={exporting}
className="flex items-center gap-2 px-4 py-2 text-sm font-medium rounded-lg
bg-[var(--card-bg,#1e1e2e)] border border-[var(--border,#333)]
text-[var(--text-secondary,#aaa)] hover:text-[var(--text-primary,#fff)]
hover:border-[var(--accent,#7c3aed)] transition-all duration-200
disabled:opacity-50 disabled:cursor-not-allowed"
>
<svg
width="16"
height="16"
viewBox="0 0 16 16"
fill="none"
stroke="currentColor"
strokeWidth="1.5"
>
<path
d="M8 2v8m0 0l-3-3m3 3l3-3M3 12h10"
strokeLinecap="round"
strokeLinejoin="round"
/>
</svg>
{exporting ? "Exporting..." : "Export"}
</button>
{showExport && (
<div
className="absolute right-0 top-full mt-1 z-50 min-w-[140px] rounded-lg
bg-[var(--card-bg,#1e1e2e)] border border-[var(--border,#333)]
shadow-xl overflow-hidden animate-in fade-in"
>
<div className="px-3 py-2 text-xs text-[var(--text-muted,#666)] border-b border-[var(--border,#333)] font-medium">
Time Range
</div>
{TIME_RANGES.map((range) => (
<button
key={range.hours}
id={`export-${range.hours}h-btn`}
onClick={() => handleExport(range.hours)}
className="w-full px-3 py-2 text-sm text-left hover:bg-[var(--hover-bg,#2a2a3e)]
text-[var(--text-secondary,#aaa)] hover:text-[var(--text-primary,#fff)]
transition-colors flex items-center justify-between"
>
<span>Last {range.label}</span>
<span className="text-xs text-[var(--text-muted,#666)]">
{range.hours === 24 ? "default" : ""}
</span>
</button>
))}
</div>
)}
</div>
</div>
{/* Content */}
{activeTab === "request-logs" && <RequestLoggerV2 />}
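The Export button calls `/api/logs/export?hours=N&type=…`, but that route handler is not part of this view. Below is a sketch of how such a handler might validate `hours` against the dropdown's values and select the entries inside the window. It assumes each log entry carries an ISO-8601 `timestamp` field. That field, `parseHours`, and `filterLastHours` are all hypothetical.

```typescript
// Values offered by the TIME_RANGES dropdown; anything else falls back to 24h.
const ALLOWED_HOURS = new Set([1, 6, 12, 24]);
const DEFAULT_HOURS = 24;

interface LogEntry {
  timestamp: string; // assumed ISO-8601, e.g. "2026-03-19T14:07:29.000Z"
  [key: string]: unknown;
}

// Parse the `hours` query parameter; missing or unsupported values use the default.
function parseHours(raw: string | null): number {
  const n = Number(raw);
  return ALLOWED_HOURS.has(n) ? n : DEFAULT_HOURS;
}

// Keep entries whose timestamp is no older than `hours` before `now`.
// Entries with an unparseable timestamp are dropped.
function filterLastHours(entries: LogEntry[], hours: number, now: number = Date.now()): LogEntry[] {
  const cutoff = now - hours * 60 * 60 * 1000;
  return entries.filter((e) => {
    const t = Date.parse(e.timestamp);
    return !Number.isNaN(t) && t >= cutoff;
  });
}
```

In a Next.js route, `parseHours` would receive `searchParams.get("hours")`. `get` returns `null` when the parameter is absent, so a missing value falls back to 24h.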
@@ -0,0 +1,406 @@
"use client";
import { useState, useEffect, useRef } from "react";
import dynamic from "next/dynamic";
import { useTranslations } from "next-intl";
import { Card, Button, Select, Badge } from "@/shared/components";
const Editor = dynamic(() => import("@monaco-editor/react"), { ssr: false });
interface SearchProvider {
id: string;
name: string;
status: "active" | "no_credentials";
cost_per_query: number;
}
interface SearchResult {
title: string;
url: string;
snippet: string;
score?: number;
date?: string;
}
interface SearchResponse {
id: string;
provider: string;
results: SearchResult[];
query: string;
answer: string | null;
cached: boolean;
usage: {
queries_used: number;
search_cost_usd: number;
};
metrics: {
response_time_ms: number;
upstream_latency_ms: number;
total_results_available: number | null;
};
}
function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`;
return `${(bytes / 1024).toFixed(1)} KB`;
}
export default function SearchPlayground() {
const t = useTranslations("search");
const [providers, setProviders] = useState<SearchProvider[]>([]);
const [selectedProvider, setSelectedProvider] = useState("");
const [requestBody, setRequestBody] = useState(
JSON.stringify(
{
query: "latest AI developments",
max_results: 5,
search_type: "web",
},
null,
2
)
);
const [response, setResponse] = useState<SearchResponse | null>(null);
const [rawResponse, setRawResponse] = useState("");
const [loading, setLoading] = useState(false);
const [error, setError] = useState("");
const [duration, setDuration] = useState(0);
const [statusCode, setStatusCode] = useState(0);
const [showJson, setShowJson] = useState(false);
const abortRef = useRef<AbortController | null>(null);
useEffect(() => {
fetch("/api/search/providers")
.then((res) => res.json())
.then((data) => {
const allProviders = data.providers || [];
setProviders(allProviders);
const firstActive = allProviders.find((p: SearchProvider) => p.status === "active");
if (firstActive) setSelectedProvider(firstActive.id);
})
.catch(() => {});
}, []);
const handleSend = async () => {
setLoading(true);
setError("");
setResponse(null);
setRawResponse("");
setStatusCode(0);
const controller = new AbortController();
abortRef.current = controller;
const timeout = setTimeout(() => controller.abort(), 15_000);
const start = Date.now();
try {
let body: any;
try {
body = JSON.parse(requestBody);
} catch {
setError("Invalid JSON in request body");
return; // the finally block resets loading and clears the timeout
}
if (selectedProvider) body.provider = selectedProvider;
const res = await fetch("/api/v1/search", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
signal: controller.signal,
});
setDuration(Date.now() - start);
setStatusCode(res.status);
const data = await res.json();
setRawResponse(JSON.stringify(data, null, 2));
if (res.ok) {
setResponse(data);
} else {
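// error may be a plain string or an { message } object; never pass an object to setError
setError(typeof data.error === "string" ? data.error : data.error?.message || `Error ${res.status}`);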
}
} catch (err: any) {
setDuration(Date.now() - start);
if (err.name === "AbortError") {
setError("Request aborted (cancelled or timed out after 15s)");
} else {
setError(err.message || "Network error");
}
} finally {
setLoading(false);
clearTimeout(timeout);
}
};
const handleCancel = () => {
abortRef.current?.abort();
};
const getScoreColor = (score: number) => {
if (score >= 0.9) return "text-success";
if (score >= 0.7) return "text-warning";
return "text-error";
};
const getScoreBg = (score: number) => {
if (score >= 0.9) return "bg-green-500/10";
if (score >= 0.7) return "bg-yellow-500/10";
return "bg-red-500/10";
};
const noProviders = providers.filter((p) => p.status === "active").length === 0;
const editorTheme =
typeof document !== "undefined" && document.documentElement.classList.contains("dark")
? "vs-dark"
: "light";
return (
<div className="grid grid-cols-1 lg:grid-cols-2 gap-4">
{/* Request panel */}
<Card>
<div className="p-4 space-y-3">
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<span className="material-symbols-outlined text-[18px] text-text-muted">upload</span>
<h3 className="text-sm font-semibold text-text-main">Request</h3>
<Badge variant="info" size="sm">
POST /v1/search
</Badge>
</div>
<div className="flex items-center gap-1">
<button
onClick={() => navigator.clipboard.writeText(requestBody)}
className="p-1.5 rounded hover:bg-black/5 dark:hover:bg-white/5 text-text-muted hover:text-text-main transition-colors"
title="Copy"
>
<span className="material-symbols-outlined text-[16px]">content_copy</span>
</button>
<button
onClick={() =>
setRequestBody(
JSON.stringify(
{
query: "latest AI developments",
max_results: 5,
search_type: "web",
},
null,
2
)
)
}
className="p-1.5 rounded hover:bg-black/5 dark:hover:bg-white/5 text-text-muted hover:text-text-main transition-colors"
title="Reset to default"
>
<span className="material-symbols-outlined text-[16px]">restart_alt</span>
</button>
</div>
</div>
<div className="border border-border rounded-lg overflow-hidden">
<Editor
height="400px"
defaultLanguage="json"
value={requestBody}
onChange={(value: string | undefined) => setRequestBody(value || "")}
theme={editorTheme}
options={{
minimap: { enabled: false },
fontSize: 12,
lineNumbers: "on",
scrollBeyondLastLine: false,
wordWrap: "on",
automaticLayout: true,
formatOnPaste: true,
}}
/>
</div>
<div className="flex items-center gap-3">
<div className="flex-1">
<Select
value={selectedProvider}
onChange={(e: any) => setSelectedProvider(e.target.value)}
options={providers.map((p) => ({
value: p.id,
label: `${p.name}${p.status === "no_credentials" ? " (no key)" : ""}`,
}))}
className="w-full"
/>
</div>
{loading ? (
<Button icon="stop" variant="secondary" onClick={handleCancel}>
Cancel
</Button>
) : (
<Button
icon="search"
onClick={handleSend}
disabled={noProviders || !requestBody.trim()}
>
{t("webSearch")}
</Button>
)}
</div>
{noProviders && <p className="text-xs text-text-muted">{t("noSearchProviders")}</p>}
</div>
</Card>
{/* Response panel */}
<Card>
<div className="p-4 space-y-3">
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<span className="material-symbols-outlined text-[18px] text-text-muted">
download
</span>
<h3 className="text-sm font-semibold text-text-main">Response</h3>
{statusCode > 0 && (
<>
<Badge variant={statusCode < 400 ? "success" : "error"} size="sm">
{statusCode}
</Badge>
<span className="text-xs text-text-muted">{duration}ms</span>
</>
)}
{loading && (
<span className="material-symbols-outlined text-[14px] text-primary animate-spin">
progress_activity
</span>
)}
</div>
{response && (
<div className="flex gap-1">
<button
className={`text-xs px-3 py-1 rounded-md ${
!showJson
? "bg-primary/15 text-primary font-medium"
: "bg-black/5 dark:bg-white/5 text-text-muted"
}`}
onClick={() => setShowJson(false)}
>
{t("formatted")}
</button>
<button
className={`text-xs px-3 py-1 rounded-md ${
showJson
? "bg-primary/15 text-primary font-medium"
: "bg-black/5 dark:bg-white/5 text-text-muted"
}`}
onClick={() => setShowJson(true)}
>
{t("rawJson")}
</button>
</div>
)}
</div>
<div className="border border-border rounded-lg overflow-hidden min-h-[400px]">
{loading && (
<div className="flex items-center justify-center h-[400px]">
<span className="material-symbols-outlined text-[24px] text-primary animate-spin">
progress_activity
</span>
</div>
)}
{error && !loading && (
<div className="p-4">
<div className="text-error text-sm">{error}</div>
</div>
)}
{response && !showJson && !loading && (
<div className="p-4 space-y-3">
{/* Meta bar */}
<div className="flex justify-between items-center p-2 bg-bg-alt rounded-lg">
<div className="flex items-center gap-3 text-xs text-text-muted">
<span>
{response.results.length} {t("searchResults").toLowerCase()}
</span>
<span className="flex items-center gap-1">
<span className="w-1.5 h-1.5 rounded-full bg-primary" />
{response.provider}
</span>
<span>${(response.usage?.search_cost_usd ?? 0).toFixed(4)}</span>
<span>{formatBytes(new TextEncoder().encode(rawResponse).length)}</span>
</div>
<span
className={`text-xs flex items-center gap-1 ${
response.cached ? "text-success" : "text-warning"
}`}
>
<span
className={`w-1.5 h-1.5 rounded-full ${
response.cached ? "bg-success" : "bg-warning"
}`}
/>
{response.cached ? t("cacheHit") : t("cacheMiss")}
</span>
</div>
{/* Results */}
{response.results.map((r, i) => (
<div
key={i}
className="border-l-[3px] border-l-primary p-3 bg-surface rounded-r-lg border border-border"
>
<div className="flex justify-between items-start">
<span className="text-sm font-medium text-text-main">
{i + 1}. {r.title}
</span>
{r.score != null && (
<span
className={`text-[10px] px-2 py-0.5 rounded-md ml-2 whitespace-nowrap ${getScoreBg(r.score)} ${getScoreColor(r.score)}`}
>
{r.score.toFixed(2)}
</span>
)}
</div>
<a
href={r.url}
target="_blank"
rel="noopener noreferrer"
className="text-accent text-[11px] block mt-0.5"
>
{r.url}
</a>
<p className="text-xs text-text-muted mt-1 leading-relaxed">{r.snippet}</p>
</div>
))}
</div>
)}
{response && showJson && !loading && (
<Editor
height="400px"
defaultLanguage="json"
value={rawResponse}
theme={editorTheme}
options={{
readOnly: true,
minimap: { enabled: false },
fontSize: 12,
lineNumbers: "on",
scrollBeyondLastLine: false,
wordWrap: "on",
automaticLayout: true,
}}
/>
)}
{!loading && !error && !response && (
<div className="flex items-center justify-center h-[400px] text-text-muted text-sm">
{t("emptyState")}
</div>
)}
</div>
</div>
</Card>
</div>
);
}
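The playground posts to `/api/v1/search` with a 15-second `AbortController` timeout and reads the error message from either `{ error: { message } }` or `{ error: "..." }`. Below is a sketch of the same call as a reusable function, with the fetch implementation injected so it can be tested without a server. `searchOnce`, `FetchLike`, and the trimmed response types are illustrative and are not part of the codebase.

```typescript
// Trimmed versions of the SearchResult / SearchResponse interfaces above.
interface SearchResultLite {
  title: string;
  url: string;
  snippet: string;
  score?: number;
}
interface SearchResponseLite {
  provider: string;
  results: SearchResultLite[];
  cached: boolean;
}

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function searchOnce(
  body: { query: string; max_results?: number; provider?: string },
  fetchImpl: FetchLike = fetch,
  timeoutMs = 15_000
): Promise<SearchResponseLite> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl("/api/v1/search", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    const data = await res.json();
    if (!res.ok) {
      // Accept both error shapes the playground handles.
      const msg = typeof data?.error === "string" ? data.error : data?.error?.message;
      throw new Error(msg || `Error ${res.status}`);
    }
    return data as SearchResponseLite;
  } finally {
    clearTimeout(timer); // runs on success, error, and abort alike
  }
}
```

Injecting `fetchImpl` keeps the function usable in the browser, where the default global `fetch` resolves the relative URL. In tests, a stub returning a `Response` replaces it.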

Some files were not shown because too many files have changed in this diff.