Files
OmniRoute/open-sse/utils/cacheControlPolicy.ts
T
Diego Rodrigues de Sa e Souza 70a4d38d04
Build Electron Desktop App / Validate version (push) Failing after 34s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
Build Electron Desktop App / Publish to npm (push) Has been skipped
Release v3.4.0 (Integration) (#861)
* test(settings): add unit tests for debugMode and hiddenSidebarItems

Tests cover:
- PATCH debugMode=true/false
- PATCH hiddenSidebarItems with array values
- Combined updates with both fields

* test(e2e): add Playwright tests for settings toggles

Tests cover:
- Debug mode toggle on/off
- Sidebar visibility toggle
- Settings persistence after page reload

* fix(tests): address code review issues

- Unit tests: fix async/await for getSettings, use direct db functions
- E2E tests: remove conditional logic, use Playwright auto-waiting assertions

* feat(logging): unify request log retention and artifacts

* docs: add dashboard settings toggles to CONTRIBUTING

Add section documenting:
- Debug Mode toggle (Settings → Advanced)
- Sidebar Visibility toggle (Settings → General)

* fix(cache): only inject prompt_cache_key for supported providers

Only inject prompt_cache_key for providers that support prompt caching
(Claude, Anthropic, ZAI, Qwen, DeepSeek). This fixes issue #848 where
NVIDIA API rejected the parameter.

* fix(model-sync): log only channel-level model changes

* feat(providers): add 4 free models to opencode-zen

* feat(providers): add explicit contextLength for opencode-zen free models

* feat(providers): add contextLength for all opencode-zen models

* feat: Improve the Chinese translation

* fix: preserve client cache_control for all Claude-protocol providers

Previously, the cache control preservation logic only recognized a
hardcoded list of providers (claude, anthropic, zai, qwen, deepseek).
This caused OmniRoute to inject its own cache_control markers for
Claude-protocol providers not in that list (bailian-coding-plan, glm,
minimax, minimax-cn, etc.), overwriting the client's cache markers.

The fix checks both:
1. Known caching providers list (existing behavior)
2. Whether targetFormat === 'claude' (all Claude-protocol providers)

This ensures all Claude-compatible providers properly preserve client
cache_control headers when appropriate (Claude Code client, deterministic
routing, etc.).

Also removes unused CacheStatsCard from settings/components (duplicate
of the one in cache/ page).

Fixes cache token calculation for GLM, Minimax, and other Claude-compatible providers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pure passthrough for Claude→Claude when cache_control preserved

The Claude passthrough path round-trips through OpenAI format
(claude→openai→claude) for structural normalization. This strips
cache_control markers from every content block since OpenAI format
has no equivalent, causing ~42k cache creation tokens per request
with zero cache reads.

When preserveCacheControl is true (Claude Code client, "always"
setting, or deterministic combo), skip the round-trip entirely and
forward the body as-is. Claude Code sends well-formed Messages API
payloads — the normalization was only needed for non-Code clients.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: restore CacheStatsCard — was not a duplicate

The first commit incorrectly deleted CacheStatsCard from
settings/components/ as a "duplicate". It's the only copy — both
settings/page.tsx and cache/page.tsx import from this location.

Restored the i18n-ized version from main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(429): parse long quota reset times from error body

- Parse XhYmZs format from antigravity error messages (e.g., 27h41m36s)
- Dynamic retry-after threshold (60s default) instead of hardcoded 10s
- Add parseRetryFromErrorText() in accountFallback.ts for body parsing
- Fix 403 'verify your account' to trigger permanent deactivation
- Add keyword matching for 'quota will reset', 'exhausted capacity'
- Add unit tests for retry parsing and keyword matching

Fixes #858 (Antigravity 429 handling)
Fixes #832 (Qwen quota 429 - same underlying bug)

* chore: bump version to v3.4.0-dev

* fix(migrations): rename 013 to 014 to avoid collision with v3.3.11

* chore(docs): update CHANGELOG for v3.4.0 integrations

* fix: Claude token refresh, Antigravity quota, and 429 rate-limit handling

- Fix Claude OAuth token refresh to use form-urlencoded format (standard OAuth2)
- Add anthropic-beta header required by Claude OAuth API
- Switch Antigravity quota to use retrieveUserQuota API (same as Gemini CLI)
- Parse quota reset time for all providers (not just Antigravity)
- Add quota reset keywords to error classifier
- Cap maximum retry time at 24 hours to prevent infinite wait

Closes #836, #857, #858, #832

* fix(dashboard): resolve /dashboard/limits hanging UI with 70+ accounts via chunk parallelization (#784)

---------

Co-authored-by: oyi77 <oyi77@users.noreply.github.com>
Co-authored-by: R.D. <rogerproself@gmail.com>
Co-authored-by: kang-heewon <heewon.dev@gmail.com>
Co-authored-by: gmw <rorschach1167@qq.com>
Co-authored-by: tombii <github@tombii.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>
2026-03-31 10:22:52 -03:00

312 lines
8.2 KiB
TypeScript

/**
* Cache Control Policy
*
* Determines when to preserve client-side prompt caching headers (cache_control)
* vs. applying OmniRoute's own caching strategy.
*
* Client-side caching (e.g., Claude Code) should be preserved when:
* 1. Client is Claude Code or similar caching-aware client
* 2. Request will hit a deterministic target (single model or deterministic combo strategy)
* 3. Provider supports prompt caching (Anthropic, Alibaba Qwen, etc.)
*/
import type { RoutingStrategyValue } from "../../src/shared/constants/routingStrategies";
/**
* Cache control preservation modes
*/
export type CacheControlMode = "auto" | "always" | "never";
/**
* Cache control settings from the database
*/
export interface CacheControlSettings {
alwaysPreserveClientCache?: CacheControlMode;
}
/**
* Cache metrics for tracking effectiveness
*/
export interface CacheControlMetrics {
// Totals
totalRequests: number;
requestsWithCacheControl: number;
// Token counts
totalInputTokens: number;
totalCachedTokens: number;
totalCacheCreationTokens: number;
// Savings
tokensSaved: number;
estimatedCostSaved: number;
// Breakdowns
byProvider: Record<
string,
{
requests: number;
inputTokens: number;
cachedTokens: number;
cacheCreationTokens: number;
}
>;
byStrategy: Record<
string,
{
requests: number;
inputTokens: number;
cachedTokens: number;
cacheCreationTokens: number;
}
>;
lastUpdated: string;
}
/**
* Routing strategies that are deterministic (same request → same provider)
*/
const DETERMINISTIC_STRATEGIES: Set<RoutingStrategyValue> = new Set(["priority", "cost-optimized"]);
/**
* Providers that support prompt caching
*/
const CACHING_PROVIDERS = new Set(["claude", "anthropic", "zai", "qwen", "deepseek"]);
/**
* Detect if the client is Claude Code or another caching-aware client
*/
export function isClaudeCodeClient(userAgent: string | null | undefined): boolean {
if (!userAgent) return false;
const ua = userAgent.toLowerCase();
// Claude Code user agents
if (ua.includes("claude-code") || ua.includes("claude_code")) return true;
if (ua.includes("anthropic") && ua.includes("cli")) return true;
return false;
}
/**
* Check if a provider supports prompt caching
* Supports caching if:
* 1. Provider is in the known caching providers list, OR
* 2. Provider uses Claude protocol (detected via targetFormat)
*/
export function providerSupportsCaching(
provider: string | null | undefined,
targetFormat?: string | null
): boolean {
if (!provider) return false;
if (CACHING_PROVIDERS.has(provider.toLowerCase())) return true;
// All Claude-protocol providers support prompt caching
if (targetFormat === "claude") return true;
return false;
}
/**
* Check if a routing strategy is deterministic
*/
export function isDeterministicStrategy(
strategy: RoutingStrategyValue | null | undefined
): boolean {
if (!strategy) return false;
return DETERMINISTIC_STRATEGIES.has(strategy);
}
/**
* Determine if client-side cache_control headers should be preserved
*
* @param userAgent - User-Agent header from the request
* @param isCombo - Whether this is a combo model
* @param comboStrategy - The combo's routing strategy (if applicable)
* @param targetProvider - The target provider for the request
* @param settings - Cache control settings from database (optional)
* @returns true if cache_control should be preserved, false if OmniRoute should manage it
*/
export function shouldPreserveCacheControl({
userAgent,
isCombo,
comboStrategy,
targetProvider,
targetFormat,
settings,
}: {
userAgent: string | null | undefined;
isCombo: boolean;
comboStrategy?: RoutingStrategyValue | null;
targetProvider: string | null | undefined;
targetFormat?: string | null;
settings?: CacheControlSettings;
}): boolean {
// User override takes precedence
if (settings?.alwaysPreserveClientCache === "always") {
return true;
}
if (settings?.alwaysPreserveClientCache === "never") {
return false;
}
// Auto mode: use automatic detection (existing logic)
// Must be a caching-aware client
if (!isClaudeCodeClient(userAgent)) {
return false;
}
// Target provider must support caching
if (!providerSupportsCaching(targetProvider, targetFormat)) {
return false;
}
// Single model: always preserve (deterministic)
if (!isCombo) {
return true;
}
// Combo: only preserve if strategy is deterministic
return isDeterministicStrategy(comboStrategy);
}
/**
* Track cache control metrics for a request
*/
export function trackCacheMetrics({
preserved,
provider,
strategy,
metrics,
inputTokens,
cachedTokens,
cacheCreationTokens,
}: {
preserved: boolean;
provider: string;
strategy: string | null | undefined;
metrics: CacheControlMetrics;
inputTokens?: number;
cachedTokens?: number;
cacheCreationTokens?: number;
}): CacheControlMetrics {
const now = new Date().toISOString();
// Initialize metrics if empty
if (!metrics) {
metrics = {
totalRequests: 0,
requestsWithCacheControl: 0,
totalInputTokens: 0,
totalCachedTokens: 0,
totalCacheCreationTokens: 0,
tokensSaved: 0,
estimatedCostSaved: 0,
byProvider: {},
byStrategy: {},
lastUpdated: now,
};
}
// Increment total requests
metrics.totalRequests++;
// Track token counts
const input = inputTokens || 0;
const cached = cachedTokens || 0;
const creation = cacheCreationTokens || 0;
metrics.totalInputTokens += input;
metrics.totalCachedTokens += cached;
metrics.totalCacheCreationTokens += creation;
// Calculate tokens saved (cached tokens are reused, not charged)
if (cached > 0) {
metrics.tokensSaved += cached;
}
// Only track requests where cache_control was preserved
if (preserved) {
metrics.requestsWithCacheControl++;
// Initialize provider tracking
if (!metrics.byProvider[provider]) {
metrics.byProvider[provider] = {
requests: 0,
inputTokens: 0,
cachedTokens: 0,
cacheCreationTokens: 0,
};
}
metrics.byProvider[provider].requests++;
metrics.byProvider[provider].inputTokens += input;
metrics.byProvider[provider].cachedTokens += cached;
metrics.byProvider[provider].cacheCreationTokens += creation;
// Initialize strategy tracking
if (strategy && !metrics.byStrategy[strategy]) {
metrics.byStrategy[strategy] = {
requests: 0,
inputTokens: 0,
cachedTokens: 0,
cacheCreationTokens: 0,
};
}
if (strategy) {
metrics.byStrategy[strategy].requests++;
metrics.byStrategy[strategy].inputTokens += input;
metrics.byStrategy[strategy].cachedTokens += cached;
metrics.byStrategy[strategy].cacheCreationTokens += creation;
}
}
metrics.lastUpdated = now;
return metrics;
}
/**
* Record cache token usage and update metrics
*/
export function updateCacheTokenMetrics({
metrics,
provider,
strategy,
inputTokens,
cachedTokens,
cacheCreationTokens,
costSaved,
}: {
metrics: CacheControlMetrics;
provider: string;
strategy: string | null | undefined;
inputTokens: number;
cachedTokens: number;
cacheCreationTokens: number;
costSaved?: number;
}): CacheControlMetrics {
metrics.totalCachedTokens += cachedTokens;
metrics.totalCacheCreationTokens += cacheCreationTokens;
metrics.totalInputTokens += inputTokens;
// Cached tokens are reused (saved), creation tokens are new cache writes
metrics.tokensSaved += cachedTokens;
if (costSaved !== undefined) {
metrics.estimatedCostSaved += costSaved;
}
// Update provider tracking
if (metrics.byProvider[provider]) {
metrics.byProvider[provider].cachedTokens += cachedTokens;
metrics.byProvider[provider].cacheCreationTokens += cacheCreationTokens;
metrics.byProvider[provider].inputTokens += inputTokens;
}
// Update strategy tracking
if (strategy && metrics.byStrategy[strategy]) {
metrics.byStrategy[strategy].cachedTokens += cachedTokens;
metrics.byStrategy[strategy].cacheCreationTokens += cacheCreationTokens;
metrics.byStrategy[strategy].inputTokens += inputTokens;
}
metrics.lastUpdated = new Date().toISOString();
return metrics;
}