Files
OmniRoute/open-sse/utils/usageTracking.ts
T
Diego Rodrigues de Sa e Souza 70a4d38d04
Build Electron Desktop App / Validate version (push) Failing after 34s
Build Electron Desktop App / Build Electron (macos-arm64) (push) Has been skipped
Build Electron Desktop App / Build Electron (linux) (push) Has been skipped
Build Electron Desktop App / Build Electron (macos-intel) (push) Has been skipped
Build Electron Desktop App / Build Electron (windows) (push) Has been skipped
Build Electron Desktop App / Create Release (push) Has been skipped
Build Electron Desktop App / Publish to npm (push) Has been skipped
Release v3.4.0 (Integration) (#861)
* test(settings): add unit tests for debugMode and hiddenSidebarItems

Tests cover:
- PATCH debugMode=true/false
- PATCH hiddenSidebarItems with array values
- Combined updates with both fields

* test(e2e): add Playwright tests for settings toggles

Tests cover:
- Debug mode toggle on/off
- Sidebar visibility toggle
- Settings persistence after page reload

* fix(tests): address code review issues

- Unit tests: fix async/await for getSettings, use direct db functions
- E2E tests: remove conditional logic, use Playwright auto-waiting assertions

* feat(logging): unify request log retention and artifacts

* docs: add dashboard settings toggles to CONTRIBUTING

Add section documenting:
- Debug Mode toggle (Settings → Advanced)
- Sidebar Visibility toggle (Settings → General)

* fix(cache): only inject prompt_cache_key for supported providers

Only inject prompt_cache_key for providers that support prompt caching
(Claude, Anthropic, ZAI, Qwen, DeepSeek). This fixes issue #848 where
NVIDIA API rejected the parameter.

* fix(model-sync): log only channel-level model changes

* feat(providers): add 4 free models to opencode-zen

* feat(providers): add explicit contextLength for opencode-zen free models

* feat(providers): add contextLength for all opencode-zen models

* feat: Improve the Chinese translation

* fix: preserve client cache_control for all Claude-protocol providers

Previously, the cache control preservation logic only recognized a
hardcoded list of providers (claude, anthropic, zai, qwen, deepseek).
This caused OmniRoute to inject its own cache_control markers for
Claude-protocol providers not in that list (bailian-coding-plan, glm,
minimax, minimax-cn, etc.), overwriting the client's cache markers.

The fix checks both:
1. Known caching providers list (existing behavior)
2. Whether targetFormat === 'claude' (all Claude-protocol providers)

This ensures all Claude-compatible providers properly preserve client
cache_control headers when appropriate (Claude Code client, deterministic
routing, etc.).

Also removes unused CacheStatsCard from settings/components (duplicate
of the one in cache/ page).

Fixes cache token calculation for GLM, Minimax, and other Claude-compatible providers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pure passthrough for Claude→Claude when cache_control preserved

The Claude passthrough path round-trips through OpenAI format
(claude→openai→claude) for structural normalization. This strips
cache_control markers from every content block since OpenAI format
has no equivalent, causing ~42k cache creation tokens per request
with zero cache reads.

When preserveCacheControl is true (Claude Code client, "always"
setting, or deterministic combo), skip the round-trip entirely and
forward the body as-is. Claude Code sends well-formed Messages API
payloads — the normalization was only needed for non-Code clients.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: restore CacheStatsCard — was not a duplicate

The first commit incorrectly deleted CacheStatsCard from
settings/components/ as a "duplicate". It's the only copy — both
settings/page.tsx and cache/page.tsx import from this location.

Restored the i18n-ized version from main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(429): parse long quota reset times from error body

- Parse XhYmZs format from antigravity error messages (e.g., 27h41m36s)
- Dynamic retry-after threshold (60s default) instead of hardcoded 10s
- Add parseRetryFromErrorText() in accountFallback.ts for body parsing
- Fix 403 'verify your account' to trigger permanent deactivation
- Add keyword matching for 'quota will reset', 'exhausted capacity'
- Add unit tests for retry parsing and keyword matching

Fixes #858 (Antigravity 429 handling)
Fixes #832 (Qwen quota 429 - same underlying bug)

* chore: bump version to v3.4.0-dev

* fix(migrations): rename 013 to 014 to avoid collision with v3.3.11

* chore(docs): update CHANGELOG for v3.4.0 integrations

* fix: Claude token refresh, Antigravity quota, and 429 rate-limit handling

- Fix Claude OAuth token refresh to use form-urlencoded format (standard OAuth2)
- Add anthropic-beta header required by Claude OAuth API
- Switch Antigravity quota to use retrieveUserQuota API (same as Gemini CLI)
- Parse quota reset time for all providers (not just Antigravity)
- Add quota reset keywords to error classifier
- Cap maximum retry time at 24 hours to prevent infinite wait

Closes #836, #857, #858, #832

* fix(dashboard): resolve /dashboard/limits hanging UI with 70+ accounts via chunk parallelization (#784)

---------

Co-authored-by: oyi77 <oyi77@users.noreply.github.com>
Co-authored-by: R.D. <rogerproself@gmail.com>
Co-authored-by: kang-heewon <heewon.dev@gmail.com>
Co-authored-by: gmw <rorschach1167@qq.com>
Co-authored-by: tombii <github@tombii.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>
2026-03-31 10:22:52 -03:00

458 lines
15 KiB
TypeScript

/**
* Token Usage Tracking - Extract, normalize, estimate and log token usage
*/
import { appendRequestLog } from "@/lib/usageDb";
import {
getLoggedInputTokens,
getLoggedOutputTokens,
getPromptCacheCreationTokens,
getPromptCacheReadTokens,
} from "@/lib/usage/tokenAccounting";
import { FORMATS } from "../translator/formats.ts";
// ANSI color codes
export const COLORS = {
reset: "\x1b[0m",
red: "\x1b[31m",
green: "\x1b[32m",
yellow: "\x1b[33m",
blue: "\x1b[34m",
cyan: "\x1b[36m",
};
/**
* Safety buffer added to reported token usage to prevent clients from hitting
* context window limits. 2000 tokens accounts for overhead from system prompts,
* tool definitions, and format translation that may not be reflected in raw usage.
*/
const BUFFER_TOKENS = 2000;
// Get HH:MM:SS timestamp
function getTimeString() {
return new Date().toLocaleTimeString("en-US", {
hour12: false,
hour: "2-digit",
minute: "2-digit",
second: "2-digit",
});
}
/**
* Add buffer tokens to usage to prevent context errors
* @param {object} usage - Usage object (supported format)
* @returns {object} Usage with buffer added
*/
export function addBufferToUsage(usage) {
if (!usage || typeof usage !== "object") return usage;
const result = { ...usage };
// Claude format
if (result.input_tokens !== undefined) {
result.input_tokens += BUFFER_TOKENS;
}
// OpenAI format
if (result.prompt_tokens !== undefined) {
result.prompt_tokens += BUFFER_TOKENS;
}
// Calculate or update total_tokens
if (result.total_tokens !== undefined) {
result.total_tokens += BUFFER_TOKENS;
} else if (result.prompt_tokens !== undefined && result.completion_tokens !== undefined) {
// Calculate total_tokens if not exists
result.total_tokens = result.prompt_tokens + result.completion_tokens;
}
return result;
}
export function filterUsageForFormat(usage, targetFormat) {
if (!usage || typeof usage !== "object") return usage;
// Cross-map between Claude-style and OpenAI-style field names before filtering.
// Some providers return input_tokens/output_tokens even when using OpenAI format.
const convertedUsage = { ...usage };
if (targetFormat === FORMATS.CLAUDE || targetFormat === FORMATS.OPENAI_RESPONSES) {
// OpenAI → Claude: prompt_tokens → input_tokens
if (convertedUsage.prompt_tokens !== undefined && convertedUsage.input_tokens === undefined) {
convertedUsage.input_tokens = convertedUsage.prompt_tokens;
}
if (
convertedUsage.completion_tokens !== undefined &&
convertedUsage.output_tokens === undefined
) {
convertedUsage.output_tokens = convertedUsage.completion_tokens;
}
} else {
// Claude → OpenAI: input_tokens → prompt_tokens
if (convertedUsage.input_tokens !== undefined && convertedUsage.prompt_tokens === undefined) {
convertedUsage.prompt_tokens = convertedUsage.input_tokens;
}
if (
convertedUsage.output_tokens !== undefined &&
convertedUsage.completion_tokens === undefined
) {
convertedUsage.completion_tokens = convertedUsage.output_tokens;
}
// Ensure total_tokens is set
if (
convertedUsage.total_tokens === undefined &&
convertedUsage.prompt_tokens !== undefined &&
convertedUsage.completion_tokens !== undefined
) {
convertedUsage.total_tokens = convertedUsage.prompt_tokens + convertedUsage.completion_tokens;
}
}
// Helper to pick only defined fields from usage
const pickFields = (fields) => {
const filtered = {};
for (const field of fields) {
if (convertedUsage[field] !== undefined) {
filtered[field] = convertedUsage[field];
}
}
return filtered;
};
// Define allowed fields for each format
const formatFields = {
[FORMATS.CLAUDE]: [
"input_tokens",
"output_tokens",
"cache_read_input_tokens",
"cache_creation_input_tokens",
"estimated",
],
[FORMATS.GEMINI]: [
"promptTokenCount",
"candidatesTokenCount",
"totalTokenCount",
"cachedContentTokenCount",
"thoughtsTokenCount",
"estimated",
],
[FORMATS.OPENAI_RESPONSES]: [
"input_tokens",
"output_tokens",
"input_tokens_details",
"output_tokens_details",
"estimated",
],
// OpenAI format (default for OPENAI, CODEX, KIRO, etc.)
default: [
"prompt_tokens",
"completion_tokens",
"total_tokens",
"cached_tokens",
"reasoning_tokens",
"prompt_tokens_details",
"completion_tokens_details",
"estimated",
],
};
// Get fields for target format
let fields = formatFields[targetFormat];
// Use same fields for similar formats
if (targetFormat === FORMATS.GEMINI_CLI || targetFormat === FORMATS.ANTIGRAVITY) {
fields = formatFields[FORMATS.GEMINI];
} else if (targetFormat === FORMATS.OPENAI_RESPONSE) {
fields = formatFields[FORMATS.OPENAI_RESPONSES];
} else if (!fields) {
fields = formatFields.default;
}
return pickFields(fields);
}
/**
* Normalize usage object - ensure all values are valid numbers
*/
export function normalizeUsage(usage) {
if (!usage || typeof usage !== "object" || Array.isArray(usage)) return null;
const normalized = {};
const assignNumber = (key, value) => {
if (value === undefined || value === null) return;
const numeric = Number(value);
if (Number.isFinite(numeric)) normalized[key] = numeric;
};
assignNumber("prompt_tokens", usage?.prompt_tokens);
assignNumber("completion_tokens", usage?.completion_tokens);
assignNumber("total_tokens", usage?.total_tokens);
assignNumber("cache_read_input_tokens", usage?.cache_read_input_tokens);
assignNumber("cache_creation_input_tokens", usage?.cache_creation_input_tokens);
assignNumber("cached_tokens", usage?.cached_tokens);
assignNumber("reasoning_tokens", usage?.reasoning_tokens);
if (Object.keys(normalized).length === 0) return null;
return normalized;
}
/**
* Check if usage has valid token data
* Valid = has at least one token field with value > 0
* Invalid = empty object {}, null, undefined, no token fields, or all zeros
*/
export function hasValidUsage(usage) {
if (!usage || typeof usage !== "object") return false;
// Check for known token fields with value > 0
const tokenFields = [
"prompt_tokens",
"completion_tokens",
"total_tokens", // OpenAI
"input_tokens",
"output_tokens", // Claude
"promptTokenCount",
"candidatesTokenCount", // Gemini
];
for (const field of tokenFields) {
if (typeof usage[field] === "number" && usage[field] > 0) {
return true;
}
}
return false;
}
/**
* Extract usage from supported formats (Claude, OpenAI, Gemini, Responses API)
*/
export function extractUsage(chunk) {
if (!chunk || typeof chunk !== "object") return null;
// Claude/Antigravity streaming: message_start event carries INPUT tokens
// FIX #74: This event was not handled — input_tokens were being dropped
// Structure: { type: "message_start", message: { usage: { input_tokens: N, output_tokens: 0 } } }
if (chunk.type === "message_start" && chunk.message?.usage) {
const u = chunk.message.usage;
const inputTokens = u.input_tokens || u.prompt_tokens || 0;
if (inputTokens > 0) {
return normalizeUsage({
prompt_tokens: inputTokens,
completion_tokens: u.output_tokens || u.completion_tokens || 0,
cache_read_input_tokens: u.cache_read_input_tokens,
cache_creation_input_tokens: u.cache_creation_input_tokens,
});
}
}
// Claude format (message_delta event) — carries OUTPUT tokens
if (chunk.type === "message_delta" && chunk.usage && typeof chunk.usage === "object") {
return normalizeUsage({
prompt_tokens: chunk.usage.input_tokens || 0,
completion_tokens: chunk.usage.output_tokens || 0,
cache_read_input_tokens: chunk.usage.cache_read_input_tokens,
cache_creation_input_tokens: chunk.usage.cache_creation_input_tokens,
});
}
// OpenAI Responses API format (response.completed or response.done)
if (
(chunk.type === "response.completed" || chunk.type === "response.done") &&
chunk.response?.usage &&
typeof chunk.response.usage === "object"
) {
const usage = chunk.response.usage;
return normalizeUsage({
prompt_tokens: usage.input_tokens || usage.prompt_tokens || 0,
completion_tokens: usage.output_tokens || usage.completion_tokens || 0,
cached_tokens: usage.input_tokens_details?.cached_tokens,
reasoning_tokens: usage.output_tokens_details?.reasoning_tokens,
});
}
// OpenAI format
if (
chunk.usage &&
typeof chunk.usage === "object" &&
(chunk.usage.prompt_tokens !== undefined || chunk.usage.input_tokens !== undefined)
) {
return normalizeUsage({
prompt_tokens: chunk.usage.prompt_tokens ?? chunk.usage.input_tokens ?? 0,
completion_tokens: chunk.usage.completion_tokens ?? chunk.usage.output_tokens ?? 0,
cached_tokens: chunk.usage.prompt_tokens_details?.cached_tokens,
reasoning_tokens: chunk.usage.completion_tokens_details?.reasoning_tokens,
});
}
// Gemini format (Antigravity)
if (chunk.usageMetadata && typeof chunk.usageMetadata === "object") {
return normalizeUsage({
prompt_tokens: chunk.usageMetadata?.promptTokenCount || 0,
completion_tokens: chunk.usageMetadata?.candidatesTokenCount || 0,
total_tokens: chunk.usageMetadata?.totalTokenCount,
cached_tokens: chunk.usageMetadata?.cachedContentTokenCount,
reasoning_tokens: chunk.usageMetadata?.thoughtsTokenCount,
});
}
return null;
}
// Heuristic token estimation constants
const CHARS_PER_TOKEN_SCHEMA = 6; // ~6 chars/token for JSON schemas (more verbose per token)
/**
* Improved token estimation heuristic (no dependency).
* Splits text on common token boundaries (whitespace, punctuation, camelCase)
* and applies a sub-word correction factor. Better accuracy for:
* - English text (~4 chars/token)
* - CJK text (~1 char/token for ideographs)
* - Code (~3.5 chars/token, more punctuation-heavy)
*
* @param {string} text - Text to estimate tokens for
* @returns {number} Estimated token count
*/
function estimateTokenCount(text) {
if (!text || typeof text !== "string") return 0;
// Count CJK ideographs separately — each is roughly 1 token
const cjkMatches = text.match(/[\u3000-\u9fff\uf900-\ufaff\u{20000}-\u{2fa1f}]/gu);
const cjkCount = cjkMatches ? cjkMatches.length : 0;
// Remove CJK chars for the remaining estimation
const nonCJK = text.replace(/[\u3000-\u9fff\uf900-\ufaff]/g, " ");
// Split on token boundaries: whitespace, punctuation, camelCase transitions
const tokens = nonCJK
.split(/(\s+|[^\w\s]|(?<=[a-z])(?=[A-Z]))/)
.filter((t) => t && t.trim().length > 0);
// Apply sub-word correction: BPE tokenizers often split long words
// into sub-word pieces, so raw token count underestimates slightly
const estimatedNonCJK = Math.ceil(tokens.length * 1.3);
return cjkCount + estimatedNonCJK;
}
/**
* Estimate input tokens from request body.
* Separates tool definitions (JSON schemas) from message content
* for more accurate estimation since JSON schemas are more verbose but
* compress into fewer tokens than plain text.
*/
export function estimateInputTokens(body) {
if (!body || typeof body !== "object") return 0;
try {
let toolTokens = 0;
let messageTokens = 0;
// Separate tool definitions from the rest of the body
if (body.tools && Array.isArray(body.tools)) {
const toolStr = JSON.stringify(body.tools);
toolTokens = Math.ceil(toolStr.length / CHARS_PER_TOKEN_SCHEMA);
// Estimate messages without tools
const { tools, ...bodyWithoutTools } = body;
messageTokens = estimateTokenCount(JSON.stringify(bodyWithoutTools));
} else {
messageTokens = estimateTokenCount(JSON.stringify(body));
}
return messageTokens + toolTokens;
} catch (err) {
// Fallback if stringify fails
return 0;
}
}
/**
* Estimate output tokens from content length.
* Uses improved heuristic when possible, falls back to length-based estimation.
*/
export function estimateOutputTokens(contentLength) {
if (!contentLength || contentLength <= 0) return 0;
// When we only have a character count, use 4 chars/token with sub-word correction
return Math.max(1, Math.ceil(contentLength / 3.5));
}
/**
* Format usage object based on target format
* @param {number} inputTokens - Input/prompt tokens
* @param {number} outputTokens - Output/completion tokens
* @param {string} targetFormat - Target format from FORMATS
*/
export function formatUsage(inputTokens, outputTokens, targetFormat) {
// Claude format uses input_tokens/output_tokens
if (targetFormat === FORMATS.CLAUDE) {
return addBufferToUsage({
input_tokens: inputTokens,
output_tokens: outputTokens,
estimated: true,
});
}
// Default: OpenAI format (works for openai, gemini, responses, etc.)
return addBufferToUsage({
prompt_tokens: inputTokens,
completion_tokens: outputTokens,
total_tokens: inputTokens + outputTokens,
estimated: true,
});
}
/**
* Estimate full usage when provider doesn't return it
* @param {object} body - Request body for input token estimation
* @param {number} contentLength - Content length for output token estimation
* @param {string} targetFormat - Target format from FORMATS constant
*/
export function estimateUsage(body, contentLength, targetFormat = FORMATS.OPENAI) {
return formatUsage(estimateInputTokens(body), estimateOutputTokens(contentLength), targetFormat);
}
/**
* Log usage with cache info (green color)
*/
export function logUsage(provider, usage, model = null, connectionId = null, apiKeyInfo = null) {
if (!usage || typeof usage !== "object") return;
const p = provider?.toUpperCase() || "UNKNOWN";
// Support both formats:
// - OpenAI: prompt_tokens, completion_tokens
// - Claude: input_tokens, output_tokens
const inTokens = getLoggedInputTokens(usage);
const outTokens = getLoggedOutputTokens(usage);
const accountPrefix = connectionId ? connectionId.slice(0, 8) + "..." : "unknown";
let msg = `[${getTimeString()}] 📊 ${COLORS.green}[USAGE] ${p} | in=${inTokens} | out=${outTokens} | account=${accountPrefix}${COLORS.reset}`;
// Add estimated flag if present
if (usage.estimated) {
msg += ` ${COLORS.yellow}(estimated)${COLORS.reset}`;
}
// Add cache info if present (unified from different formats)
const cacheRead = getPromptCacheReadTokens(usage);
if (cacheRead) msg += ` | cache_read=${cacheRead}`;
const cacheCreation = getPromptCacheCreationTokens(usage);
if (cacheCreation) msg += ` | cache_create=${cacheCreation}`;
const reasoning = usage.reasoning_tokens;
if (reasoning) msg += ` | reasoning=${reasoning}`;
console.log(msg);
// Streaming requests persist usage once in chatCore's completion callback.
// Keep this helper side-effect free apart from console visibility.
const tokens = {
input: inTokens,
output: outTokens,
cacheRead: cacheRead || 0,
cacheCreation: cacheCreation || 0,
reasoning: reasoning || 0,
};
appendRequestLog({ model, provider, connectionId, tokens, status: "200 OK" }).catch(() => {});
}