70a4d38d04
* test(settings): add unit tests for debugMode and hiddenSidebarItems
  Tests cover:
  - PATCH debugMode=true/false
  - PATCH hiddenSidebarItems with array values
  - Combined updates with both fields
* test(e2e): add Playwright tests for settings toggles
  Tests cover:
  - Debug mode toggle on/off
  - Sidebar visibility toggle
  - Settings persistence after page reload
* fix(tests): address code review issues
  - Unit tests: fix async/await for getSettings, use direct db functions
  - E2E tests: remove conditional logic, use Playwright auto-waiting assertions
* feat(logging): unify request log retention and artifacts
* docs: add dashboard settings toggles to CONTRIBUTING
  Add section documenting:
  - Debug Mode toggle (Settings → Advanced)
  - Sidebar Visibility toggle (Settings → General)
* fix(cache): only inject prompt_cache_key for supported providers
  Only inject prompt_cache_key for providers that support prompt caching (Claude, Anthropic, ZAI, Qwen, DeepSeek). This fixes issue #848 where NVIDIA API rejected the parameter.
* fix(model-sync): log only channel-level model changes
* feat(providers): add 4 free models to opencode-zen
* feat(providers): add explicit contextLength for opencode-zen free models
* feat(providers): add contextLength for all opencode-zen models
* feat: Improve the Chinese translation
* fix: preserve client cache_control for all Claude-protocol providers
  Previously, the cache control preservation logic only recognized a hardcoded list of providers (claude, anthropic, zai, qwen, deepseek). This caused OmniRoute to inject its own cache_control markers for Claude-protocol providers not in that list (bailian-coding-plan, glm, minimax, minimax-cn, etc.), overwriting the client's cache markers.
  The fix checks both:
  1. Known caching providers list (existing behavior)
  2. Whether targetFormat === 'claude' (all Claude-protocol providers)
  This ensures all Claude-compatible providers properly preserve client cache_control headers when appropriate (Claude Code client, deterministic routing, etc.).
  Also removes unused CacheStatsCard from settings/components (duplicate of the one in cache/ page).
  Fixes cache token calculation for GLM, Minimax, and other Claude-compatible providers.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: pure passthrough for Claude→Claude when cache_control preserved
  The Claude passthrough path round-trips through OpenAI format (claude→openai→claude) for structural normalization. This strips cache_control markers from every content block since OpenAI format has no equivalent, causing ~42k cache creation tokens per request with zero cache reads.
  When preserveCacheControl is true (Claude Code client, "always" setting, or deterministic combo), skip the round-trip entirely and forward the body as-is. Claude Code sends well-formed Messages API payloads; the normalization was only needed for non-Code clients.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: restore CacheStatsCard - was not a duplicate
  The first commit incorrectly deleted CacheStatsCard from settings/components/ as a "duplicate". It's the only copy: both settings/page.tsx and cache/page.tsx import from this location. Restored the i18n-ized version from main.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(429): parse long quota reset times from error body
  - Parse XhYmZs format from antigravity error messages (e.g., 27h41m36s)
  - Dynamic retry-after threshold (60s default) instead of hardcoded 10s
  - Add parseRetryFromErrorText() in accountFallback.ts for body parsing
  - Fix 403 'verify your account' to trigger permanent deactivation
  - Add keyword matching for 'quota will reset', 'exhausted capacity'
  - Add unit tests for retry parsing and keyword matching
  Fixes #858 (Antigravity 429 handling)
  Fixes #832 (Qwen quota 429 - same underlying bug)
* chore: bump version to v3.4.0-dev
* fix(migrations): rename 013 to 014 to avoid collision with v3.3.11
* chore(docs): update CHANGELOG for v3.4.0 integrations
* fix: Claude token refresh, Antigravity quota, and 429 rate-limit handling
  - Fix Claude OAuth token refresh to use form-urlencoded format (standard OAuth2)
  - Add anthropic-beta header required by Claude OAuth API
  - Switch Antigravity quota to use retrieveUserQuota API (same as Gemini CLI)
  - Parse quota reset time for all providers (not just Antigravity)
  - Add quota reset keywords to error classifier
  - Cap maximum retry time at 24 hours to prevent infinite wait
  Closes #836, #857, #858, #832
* fix(dashboard): resolve /dashboard/limits hanging UI with 70+ accounts via chunk parallelization (#784)

---------

Co-authored-by: oyi77 <oyi77@users.noreply.github.com>
Co-authored-by: R.D. <rogerproself@gmail.com>
Co-authored-by: kang-heewon <heewon.dev@gmail.com>
Co-authored-by: gmw <rorschach1167@qq.com>
Co-authored-by: tombii <github@tombii.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>
312 lines
8.2 KiB
TypeScript
/**
 * Cache Control Policy
 *
 * Determines when to preserve client-supplied prompt caching markers (cache_control)
 * vs. applying OmniRoute's own caching strategy.
 *
 * Client-side caching (e.g., Claude Code) should be preserved when:
 * 1. The client is Claude Code or a similar caching-aware client
 * 2. The request will hit a deterministic target (single model or deterministic combo strategy)
 * 3. The provider supports prompt caching (Anthropic, Alibaba Qwen, etc.)
 */

import type { RoutingStrategyValue } from "../../src/shared/constants/routingStrategies";

/**
 * Cache control preservation modes
 */
export type CacheControlMode = "auto" | "always" | "never";

/**
 * Cache control settings from the database
 */
export interface CacheControlSettings {
  alwaysPreserveClientCache?: CacheControlMode;
}

/**
 * Cache metrics for tracking effectiveness
 */
export interface CacheControlMetrics {
  // Totals
  totalRequests: number;
  requestsWithCacheControl: number;

  // Token counts
  totalInputTokens: number;
  totalCachedTokens: number;
  totalCacheCreationTokens: number;

  // Savings
  tokensSaved: number;
  estimatedCostSaved: number;

  // Breakdowns
  byProvider: Record<
    string,
    {
      requests: number;
      inputTokens: number;
      cachedTokens: number;
      cacheCreationTokens: number;
    }
  >;
  byStrategy: Record<
    string,
    {
      requests: number;
      inputTokens: number;
      cachedTokens: number;
      cacheCreationTokens: number;
    }
  >;

  lastUpdated: string;
}

/**
 * Routing strategies that are deterministic (same request → same provider)
 */
const DETERMINISTIC_STRATEGIES: Set<RoutingStrategyValue> = new Set(["priority", "cost-optimized"]);

/**
 * Providers that support prompt caching
 */
const CACHING_PROVIDERS = new Set(["claude", "anthropic", "zai", "qwen", "deepseek"]);

/**
 * Detect if the client is Claude Code or another caching-aware client
 */
export function isClaudeCodeClient(userAgent: string | null | undefined): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();

  // Claude Code user agents
  if (ua.includes("claude-code") || ua.includes("claude_code")) return true;
  if (ua.includes("anthropic") && ua.includes("cli")) return true;

  return false;
}

/**
 * Check if a provider supports prompt caching
 * Supports caching if:
 * 1. Provider is in the known caching providers list, OR
 * 2. Provider uses Claude protocol (detected via targetFormat)
 */
export function providerSupportsCaching(
  provider: string | null | undefined,
  targetFormat?: string | null
): boolean {
  if (!provider) return false;
  if (CACHING_PROVIDERS.has(provider.toLowerCase())) return true;
  // All Claude-protocol providers support prompt caching
  if (targetFormat === "claude") return true;
  return false;
}

/**
 * Check if a routing strategy is deterministic
 */
export function isDeterministicStrategy(
  strategy: RoutingStrategyValue | null | undefined
): boolean {
  if (!strategy) return false;
  return DETERMINISTIC_STRATEGIES.has(strategy);
}

/**
 * Determine whether client-supplied cache_control markers should be preserved
 *
 * @param userAgent - User-Agent header from the request
 * @param isCombo - Whether this is a combo model
 * @param comboStrategy - The combo's routing strategy (if applicable)
 * @param targetProvider - The target provider for the request
 * @param targetFormat - The target request format ("claude" for Claude-protocol providers)
 * @param settings - Cache control settings from the database (optional)
 * @returns true if cache_control should be preserved, false if OmniRoute should manage it
 */
export function shouldPreserveCacheControl({
  userAgent,
  isCombo,
  comboStrategy,
  targetProvider,
  targetFormat,
  settings,
}: {
  userAgent: string | null | undefined;
  isCombo: boolean;
  comboStrategy?: RoutingStrategyValue | null;
  targetProvider: string | null | undefined;
  targetFormat?: string | null;
  settings?: CacheControlSettings;
}): boolean {
  // An explicit user setting takes precedence over automatic detection
  if (settings?.alwaysPreserveClientCache === "always") {
    return true;
  }
  if (settings?.alwaysPreserveClientCache === "never") {
    return false;
  }

  // Auto mode (or no setting): fall back to automatic detection.
  // The client must be caching-aware
  if (!isClaudeCodeClient(userAgent)) {
    return false;
  }

  // The target provider must support caching
  if (!providerSupportsCaching(targetProvider, targetFormat)) {
    return false;
  }

  // Single model: always preserve (the target is deterministic)
  if (!isCombo) {
    return true;
  }

  // Combo: only preserve if the strategy is deterministic
  return isDeterministicStrategy(comboStrategy);
}

/**
 * Track cache control metrics for a request
 *
 * Totals are updated for every request; the per-provider and per-strategy
 * breakdowns only cover requests where cache_control was preserved.
 * If no metrics object is passed in, a new one is initialized.
 */
export function trackCacheMetrics({
  preserved,
  provider,
  strategy,
  metrics,
  inputTokens,
  cachedTokens,
  cacheCreationTokens,
}: {
  preserved: boolean;
  provider: string;
  strategy: string | null | undefined;
  metrics: CacheControlMetrics | null | undefined;
  inputTokens?: number;
  cachedTokens?: number;
  cacheCreationTokens?: number;
}): CacheControlMetrics {
  const now = new Date().toISOString();

  // Initialize metrics if none were passed in
  if (!metrics) {
    metrics = {
      totalRequests: 0,
      requestsWithCacheControl: 0,
      totalInputTokens: 0,
      totalCachedTokens: 0,
      totalCacheCreationTokens: 0,
      tokensSaved: 0,
      estimatedCostSaved: 0,
      byProvider: {},
      byStrategy: {},
      lastUpdated: now,
    };
  }

  // Increment total requests
  metrics.totalRequests++;

  // Track token counts
  const input = inputTokens || 0;
  const cached = cachedTokens || 0;
  const creation = cacheCreationTokens || 0;

  metrics.totalInputTokens += input;
  metrics.totalCachedTokens += cached;
  metrics.totalCacheCreationTokens += creation;

  // Count cache reads as saved tokens (providers typically bill them below the full input rate)
  if (cached > 0) {
    metrics.tokensSaved += cached;
  }

  // Breakdowns only cover requests where cache_control was preserved
  if (preserved) {
    metrics.requestsWithCacheControl++;

    // Initialize and update provider tracking
    if (!metrics.byProvider[provider]) {
      metrics.byProvider[provider] = {
        requests: 0,
        inputTokens: 0,
        cachedTokens: 0,
        cacheCreationTokens: 0,
      };
    }
    metrics.byProvider[provider].requests++;
    metrics.byProvider[provider].inputTokens += input;
    metrics.byProvider[provider].cachedTokens += cached;
    metrics.byProvider[provider].cacheCreationTokens += creation;

    // Initialize and update strategy tracking when a strategy is known
    if (strategy) {
      if (!metrics.byStrategy[strategy]) {
        metrics.byStrategy[strategy] = {
          requests: 0,
          inputTokens: 0,
          cachedTokens: 0,
          cacheCreationTokens: 0,
        };
      }
      metrics.byStrategy[strategy].requests++;
      metrics.byStrategy[strategy].inputTokens += input;
      metrics.byStrategy[strategy].cachedTokens += cached;
      metrics.byStrategy[strategy].cacheCreationTokens += creation;
    }
  }

  metrics.lastUpdated = now;
  return metrics;
}

/**
 * Record cache token usage and update metrics
 *
 * Unlike trackCacheMetrics, this does not increment request counters, and it
 * only updates provider/strategy breakdowns that already exist.
 */
export function updateCacheTokenMetrics({
  metrics,
  provider,
  strategy,
  inputTokens,
  cachedTokens,
  cacheCreationTokens,
  costSaved,
}: {
  metrics: CacheControlMetrics;
  provider: string;
  strategy: string | null | undefined;
  inputTokens: number;
  cachedTokens: number;
  cacheCreationTokens: number;
  costSaved?: number;
}): CacheControlMetrics {
  metrics.totalCachedTokens += cachedTokens;
  metrics.totalCacheCreationTokens += cacheCreationTokens;
  metrics.totalInputTokens += inputTokens;

  // Cached tokens are reused (saved); creation tokens are new cache writes
  metrics.tokensSaved += cachedTokens;
  if (costSaved !== undefined) {
    metrics.estimatedCostSaved += costSaved;
  }

  // Update provider tracking
  if (metrics.byProvider[provider]) {
    metrics.byProvider[provider].cachedTokens += cachedTokens;
    metrics.byProvider[provider].cacheCreationTokens += cacheCreationTokens;
    metrics.byProvider[provider].inputTokens += inputTokens;
  }

  // Update strategy tracking
  if (strategy && metrics.byStrategy[strategy]) {
    metrics.byStrategy[strategy].cachedTokens += cachedTokens;
    metrics.byStrategy[strategy].cacheCreationTokens += cacheCreationTokens;
    metrics.byStrategy[strategy].inputTokens += inputTokens;
  }

  metrics.lastUpdated = new Date().toISOString();
  return metrics;
}
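
The decision order in `shouldPreserveCacheControl` (explicit setting, then client detection, then provider support, then combo determinism) can be exercised with a small table-driven check. The `preserve` and `isCachingClient` functions below are a condensed, self-contained mirror written for illustration, not the module's exports; the `RoutingStrategy` union (including `"round-robin"` as a non-deterministic example) and the sample user-agent and provider strings are assumptions chosen for this sketch.

```typescript
// Condensed mirror of shouldPreserveCacheControl, for illustration only.
// ASSUMPTION: the strategy union and sample inputs are hypothetical, not OmniRoute's real values.
type Mode = "auto" | "always" | "never";
type RoutingStrategy = "priority" | "cost-optimized" | "round-robin";

interface Input {
  userAgent?: string;
  isCombo: boolean;
  comboStrategy?: RoutingStrategy;
  targetProvider?: string;
  targetFormat?: string;
  mode?: Mode;
}

const DETERMINISTIC = new Set<RoutingStrategy>(["priority", "cost-optimized"]);
const CACHING = new Set(["claude", "anthropic", "zai", "qwen", "deepseek"]);

function isCachingClient(ua?: string): boolean {
  if (!ua) return false;
  const s = ua.toLowerCase();
  return (
    s.includes("claude-code") ||
    s.includes("claude_code") ||
    (s.includes("anthropic") && s.includes("cli"))
  );
}

function preserve(i: Input): boolean {
  if (i.mode === "always") return true; // explicit setting wins
  if (i.mode === "never") return false;
  if (!isCachingClient(i.userAgent)) return false; // client must be caching-aware
  const p = i.targetProvider?.toLowerCase();
  if (!p) return false; // no provider, no caching
  if (!CACHING.has(p) && i.targetFormat !== "claude") return false; // provider must cache
  if (!i.isCombo) return true; // a single model is deterministic
  return i.comboStrategy !== undefined && DETERMINISTIC.has(i.comboStrategy);
}

// [label, input, expected outcome]
const cases: Array<[string, Input, boolean]> = [
  ["setting 'always' overrides detection", { isCombo: true, comboStrategy: "round-robin", mode: "always" }, true],
  ["non-caching client", { userAgent: "curl/8.0", isCombo: false, targetProvider: "claude" }, false],
  ["Claude-protocol provider outside the list", { userAgent: "claude-code/1.0", isCombo: false, targetProvider: "glm", targetFormat: "claude" }, true],
  ["provider without caching", { userAgent: "claude-code/1.0", isCombo: false, targetProvider: "nvidia", targetFormat: "openai" }, false],
  ["combo with deterministic strategy", { userAgent: "claude-code/1.0", isCombo: true, comboStrategy: "priority", targetProvider: "anthropic" }, true],
  ["combo with round-robin strategy", { userAgent: "claude-code/1.0", isCombo: true, comboStrategy: "round-robin", targetProvider: "anthropic" }, false],
];

for (const [label, input, expected] of cases) {
  const got = preserve(input);
  console.log(`${got === expected ? "ok  " : "FAIL"} ${label}: ${got}`);
}
```

The `glm` case shows the Claude-protocol fallback described in the commit message: a provider missing from `CACHING_PROVIDERS` still qualifies because its `targetFormat` is `"claude"`.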
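
`updateCacheTokenMetrics` accepts an optional `costSaved` value but leaves its computation to the caller. A caller might estimate it as sketched below, assuming Anthropic-style pricing in which cache reads are billed at 0.1× the base input price and 5-minute cache writes at 1.25×. Other providers use different multipliers, and `estimateCostSaved`, `CachePricing`, and the $3 per million tokens base price are hypothetical placeholders, not values from OmniRoute.

```typescript
// Hypothetical helper: estimate net savings from prompt caching for one request.
// ASSUMPTION: Anthropic-style multipliers (reads 0.1x, 5-minute writes 1.25x the
// base input price); adjust per provider. Not part of OmniRoute's API.
interface CachePricing {
  baseInputPerMTok: number; // USD per million uncached input tokens
  readMultiplier: number; // cache-read price relative to the base input price
  writeMultiplier: number; // cache-write price relative to the base input price
}

function estimateCostSaved(
  cachedTokens: number,
  cacheCreationTokens: number,
  pricing: CachePricing
): number {
  const perToken = pricing.baseInputPerMTok / 1_000_000;
  // Reading from cache costs less than sending the same tokens uncached
  const readSavings = cachedTokens * perToken * (1 - pricing.readMultiplier);
  // Writing to cache costs more than plain input, offsetting part of the savings
  const writePremium = cacheCreationTokens * perToken * (pricing.writeMultiplier - 1);
  return readSavings - writePremium;
}

const anthropicLike: CachePricing = { baseInputPerMTok: 3, readMultiplier: 0.1, writeMultiplier: 1.25 };

// A warm request: 100k tokens read from cache, no new writes.
console.log(estimateCostSaved(100_000, 0, anthropicLike).toFixed(4)); // 0.2700
// ~42k cache creation tokens with zero reads.
console.log(estimateCostSaved(0, 42_000, anthropicLike).toFixed(4)); // -0.0315
```

A negative result means caching cost more than it saved for that request, which matches the "~42k cache creation tokens per request with zero cache reads" symptom that the Claude→Claude passthrough fix in the commit message addresses.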