2026-01-12 01:16:46 +00:00
---
2026-01-13 17:16:02 +05:30
summary: "Voice Call plugin: outbound + inbound calls via Twilio/Telnyx/Plivo (plugin install + config + CLI)"
2026-01-12 01:16:46 +00:00
read_when:
2026-01-30 03:15:10 +01:00
- You want to place an outbound voice call from OpenClaw
2026-01-12 01:16:46 +00:00
- You are configuring or developing the voice-call plugin
2026-01-31 16:04:03 -05:00
title: "Voice Call Plugin"
2026-01-12 01:16:46 +00:00
---
# Voice Call (plugin)
2026-01-30 03:15:10 +01:00
Voice calls for OpenClaw via a plugin. Supports outbound notifications and
2026-01-13 17:16:02 +05:30
multi-turn conversations with inbound policies.
2026-01-12 01:27:05 +00:00
2026-01-12 01:16:46 +00:00
Current providers:
2026-01-31 21:13:13 +09:00
2026-01-12 21:40:22 +00:00
- `twilio` (Programmable Voice + Media Streams)
- `telnyx` (Call Control v2)
2026-01-13 17:16:02 +05:30
- `plivo` (Voice API + XML transfer + GetInput speech)
2026-01-12 21:40:22 +00:00
- `mock` (dev/no network)
2026-01-12 01:16:46 +00:00
2026-01-12 01:27:05 +00:00
Quick mental model:
2026-01-31 21:13:13 +09:00
2026-01-13 17:16:02 +05:30
- Install plugin
- Restart Gateway
- Configure under `plugins.entries.voice-call.config`
2026-01-30 03:15:10 +01:00
- Use `openclaw voicecall ...` or the `voice_call` tool
2026-01-12 01:16:46 +00:00
2026-01-12 02:11:33 +00:00
## Where it runs (local vs remote)
2026-01-13 17:16:02 +05:30
The Voice Call plugin runs **inside the Gateway process ** .
2026-01-12 02:11:33 +00:00
2026-01-13 17:16:02 +05:30
If you use a remote Gateway, install/configure the plugin on the **machine running the Gateway ** , then restart the Gateway to load it.
2026-01-12 02:11:33 +00:00
2026-01-12 01:16:46 +00:00
## Install
### Option A: install from npm (recommended)
``` bash
2026-01-30 03:15:10 +01:00
openclaw plugins install @openclaw/voice-call
2026-01-12 01:16:46 +00:00
```
Restart the Gateway afterwards.
2026-01-12 01:27:05 +00:00
### Option B: install from a local folder (dev, no copying)
``` bash
2026-03-29 09:09:56 +01:00
PLUGIN_SRC = ./path/to/local/voice-call-plugin
openclaw plugins install " $PLUGIN_SRC "
cd " $PLUGIN_SRC " && pnpm install
2026-01-12 01:27:05 +00:00
```
Restart the Gateway afterwards.
2026-01-13 17:16:02 +05:30
## Config
2026-01-12 01:16:46 +00:00
2026-01-13 17:16:02 +05:30
Set config under `plugins.entries.voice-call.config` :
2026-01-12 01:16:46 +00:00
``` json5
{
plugins: {
entries: {
"voice-call": {
enabled: true,
config: {
2026-01-13 17:16:02 +05:30
provider: "twilio", // or "telnyx" | "plivo" | "mock"
2026-01-12 21:40:22 +00:00
fromNumber: "+15550001234",
toNumber: "+15550005678",
2026-01-13 17:16:02 +05:30
2026-01-12 01:16:46 +00:00
twilio: {
accountSid: "ACxxxxxxxx",
2026-01-31 21:13:13 +09:00
authToken: "...",
2026-01-12 21:40:22 +00:00
},
2026-01-13 17:16:02 +05:30
2026-02-14 19:00:14 +01:00
telnyx: {
apiKey: "...",
connectionId: "...",
// Telnyx webhook public key from the Telnyx Mission Control Portal
// (Base64 string; can also be set via TELNYX_PUBLIC_KEY).
publicKey: "...",
},
2026-01-13 17:16:02 +05:30
plivo: {
authId: "MAxxxxxxxxxxxxxxxxxxxx",
2026-01-31 21:13:13 +09:00
authToken: "...",
2026-01-13 17:16:02 +05:30
},
// Webhook server
serve: {
port: 3334,
2026-01-31 21:13:13 +09:00
path: "/voice/webhook",
2026-01-13 17:16:02 +05:30
},
2026-02-03 23:46:54 -08:00
// Webhook security (recommended for tunnels/proxies)
webhookSecurity: {
allowedHosts: ["voice.example.com"],
trustedProxyIPs: ["100.64.0.1"],
},
2026-01-13 17:16:02 +05:30
// Public exposure (pick one)
// publicUrl: "https://example.ngrok.app/voice/webhook",
// tunnel: { provider: "ngrok" },
// tailscale: { mode: "funnel", path: "/voice/webhook" }
outbound: {
2026-01-31 21:13:13 +09:00
defaultMode: "notify", // notify | conversation
2026-01-13 17:16:02 +05:30
},
streaming: {
enabled: true,
2026-04-04 20:23:28 +01:00
provider: "openai", // optional; first registered realtime transcription provider when unset
2026-01-31 21:13:13 +09:00
streamPath: "/voice/stream",
2026-04-04 20:23:28 +01:00
providers: {
openai: {
apiKey: "sk-...", // optional if OPENAI_API_KEY is set
model: "gpt-4o-transcribe",
silenceDurationMs: 800,
vadThreshold: 0.5,
},
},
2026-02-22 23:25:11 +01:00
preStartTimeoutMs: 5000,
maxPendingConnections: 32,
maxPendingConnectionsPerIp: 4,
maxConnections: 128,
2026-01-31 21:13:13 +09:00
},
},
},
},
},
2026-01-12 01:16:46 +00:00
}
```
2026-01-12 21:40:22 +00:00
Notes:
2026-01-31 21:13:13 +09:00
2026-01-13 17:16:02 +05:30
- Twilio/Telnyx require a **publicly reachable ** webhook URL.
- Plivo requires a **publicly reachable ** webhook URL.
2026-01-12 21:40:22 +00:00
- `mock` is a local dev provider (no network calls).
2026-04-04 14:06:12 +09:00
- If older configs still use `provider: "log"` , `twilio.from` , or legacy `streaming.*` OpenAI keys, run `openclaw doctor --fix` to rewrite them.
2026-02-14 18:13:44 +01:00
- Telnyx requires `telnyx.publicKey` (or `TELNYX_PUBLIC_KEY` ) unless `skipSignatureVerification` is true.
2026-01-12 21:40:22 +00:00
- `skipSignatureVerification` is for local testing only.
2026-01-26 16:18:29 +00:00
- If you use ngrok free tier, set `publicUrl` to the exact ngrok URL; signature verification is always enforced.
2026-01-26 22:26:22 +00:00
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only ** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Use for local dev only.
2026-01-26 16:18:29 +00:00
- Ngrok free tier URLs can change or add interstitial behavior; if `publicUrl` drifts, Twilio signatures will fail. For production, prefer a stable domain or Tailscale funnel.
2026-02-22 23:25:11 +01:00
- Streaming security defaults:
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
2026-04-04 14:06:12 +09:00
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
- `streaming.maxPendingConnectionsPerIp` caps unauthenticated pre-start sockets per source IP.
- `streaming.maxConnections` caps total open media stream sockets (pending + active).
- Runtime fallback still accepts those old voice-call keys for now, but the rewrite path is `openclaw doctor --fix` and the compat shim is temporary.
2026-01-12 21:40:22 +00:00
2026-04-04 20:23:28 +01:00
## Streaming transcription
`streaming` selects a realtime transcription provider for live call audio.
Current runtime behavior:
- `streaming.provider` is optional. If unset, Voice Call uses the first
registered realtime transcription provider.
- Today the bundled provider is OpenAI, registered by the bundled `openai`
plugin.
- Provider-owned raw config lives under `streaming.providers.<providerId>` .
- If `streaming.provider` points at an unregistered provider, or no realtime
transcription provider is registered at all, Voice Call logs a warning and
skips media streaming instead of failing the whole plugin.
OpenAI streaming transcription defaults:
- API key: `streaming.providers.openai.apiKey` or `OPENAI_API_KEY`
- model: `gpt-4o-transcribe`
- `silenceDurationMs` : `800`
- `vadThreshold` : `0.5`
Example:
``` json5
{
plugins: {
entries: {
"voice-call": {
config: {
streaming: {
enabled: true,
provider: "openai",
streamPath: "/voice/stream",
providers: {
openai: {
apiKey: "sk-...", // optional if OPENAI_API_KEY is set
model: "gpt-4o-transcribe",
silenceDurationMs: 800,
vadThreshold: 0.5,
},
},
},
},
},
},
},
}
```
Legacy keys are still auto-migrated by `openclaw doctor --fix` :
- `streaming.sttProvider` → `streaming.provider`
- `streaming.openaiApiKey` → `streaming.providers.openai.apiKey`
- `streaming.sttModel` → `streaming.providers.openai.model`
- `streaming.silenceDurationMs` → `streaming.providers.openai.silenceDurationMs`
- `streaming.vadThreshold` → `streaming.providers.openai.vadThreshold`
2026-02-16 22:16:31 -05:00
## Stale call reaper
Use `staleCallReaperSeconds` to end calls that never receive a terminal webhook
(for example, notify-mode calls that never complete). The default is `0`
(disabled).
Recommended ranges:
- **Production:** `120` – `300` seconds for notify-style flows.
- Keep this value **higher than `maxDurationSeconds` ** so normal calls can
finish. A good starting point is `maxDurationSeconds + 30– 60` seconds.
Example:
``` json5
{
plugins: {
entries: {
"voice-call": {
config: {
maxDurationSeconds: 300,
staleCallReaperSeconds: 360,
},
},
},
},
}
```
2026-02-03 23:46:54 -08:00
## Webhook Security
When a proxy or tunnel sits in front of the Gateway, the plugin reconstructs the
public URL for signature verification. These options control which forwarded
headers are trusted.
`webhookSecurity.allowedHosts` allowlists hosts from forwarding headers.
`webhookSecurity.trustForwardingHeaders` trusts forwarded headers without an allowlist.
`webhookSecurity.trustedProxyIPs` only trusts forwarded headers when the request
remote IP matches the list.
2026-02-24 02:37:04 +00:00
Webhook replay protection is enabled for Twilio and Plivo. Replayed valid webhook
requests are acknowledged but skipped for side effects.
Twilio conversation turns include a per-turn token in `<Gather>` callbacks, so
stale/replayed speech callbacks cannot satisfy a newer pending transcript turn.
2026-03-22 23:32:30 -07:00
Unauthenticated webhook requests are rejected before body reads when the
provider's required signature headers are missing.
The voice-call webhook uses the shared pre-auth body profile (64 KB / 5 seconds)
plus a per-IP in-flight cap before signature verification.
2026-02-03 23:46:54 -08:00
Example with a stable public host:
``` json5
{
plugins: {
entries: {
"voice-call": {
config: {
publicUrl: "https://voice.example.com/voice/webhook",
webhookSecurity: {
allowedHosts: ["voice.example.com"],
},
},
},
},
},
}
```
2026-01-25 09:29:50 +00:00
## TTS for calls
2026-03-16 18:50:03 -07:00
Voice Call uses the core `messages.tts` configuration for
2026-01-25 09:29:50 +00:00
streaming speech on calls. You can override it under the plugin config with the
**same shape ** — it deep‑ merges with `messages.tts` .
``` json5
{
tts: {
provider: "elevenlabs",
2026-03-30 22:05:03 -05:00
providers: {
elevenlabs: {
voiceId: "pMsXgVXv3BLzUgSXRplE",
modelId: "eleven_multilingual_v2",
},
2026-01-31 21:13:13 +09:00
},
},
2026-01-25 09:29:50 +00:00
}
```
Notes:
2026-01-31 21:13:13 +09:00
2026-03-30 22:05:03 -05:00
- Legacy `tts.<provider>` keys inside plugin config (`openai` , `elevenlabs` , `microsoft` , `edge` ) are auto-migrated to `tts.providers.<provider>` on load. Prefer the `providers` shape in committed config.
2026-03-16 18:50:03 -07:00
- **Microsoft speech is ignored for voice calls** (telephony audio needs PCM; the current Microsoft transport does not expose telephony PCM output).
2026-01-25 09:29:50 +00:00
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.
2026-03-21 04:15:16 -05:00
- If a Twilio media stream is already active, Voice Call does not fall back to TwiML `<Say>` . If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
2026-03-30 22:05:03 -05:00
- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from` , `to` , `attempts` ) for debugging.
2026-01-25 09:29:50 +00:00
### More examples
Use core TTS only (no override):
``` json5
{
messages: {
tts: {
provider: "openai",
2026-03-30 22:05:03 -05:00
providers: {
openai: { voice: "alloy" },
},
2026-01-31 21:13:13 +09:00
},
},
2026-01-25 09:29:50 +00:00
}
```
Override to ElevenLabs just for calls (keep core default elsewhere):
``` json5
{
plugins: {
entries: {
"voice-call": {
config: {
tts: {
provider: "elevenlabs",
2026-03-30 22:05:03 -05:00
providers: {
elevenlabs: {
apiKey: "elevenlabs_key",
voiceId: "pMsXgVXv3BLzUgSXRplE",
modelId: "eleven_multilingual_v2",
},
2026-01-31 21:13:13 +09:00
},
},
},
},
},
},
2026-01-25 09:29:50 +00:00
}
```
Override only the OpenAI model for calls (deep‑ merge example):
``` json5
{
plugins: {
entries: {
"voice-call": {
config: {
tts: {
2026-03-30 22:05:03 -05:00
providers: {
openai: {
model: "gpt-4o-mini-tts",
voice: "marin",
},
2026-01-31 21:13:13 +09:00
},
},
},
},
},
},
2026-01-25 09:29:50 +00:00
}
```
2026-01-13 17:16:02 +05:30
## Inbound calls
2026-01-12 21:40:22 +00:00
2026-01-13 17:16:02 +05:30
Inbound policy defaults to `disabled` . To enable inbound calls, set:
2026-01-12 01:16:46 +00:00
``` json5
2026-01-12 21:40:22 +00:00
{
2026-01-13 17:16:02 +05:30
inboundPolicy: "allowlist",
allowFrom: ["+15550001234"],
2026-01-31 21:13:13 +09:00
inboundGreeting: "Hello! How can I help?",
2026-01-12 21:40:22 +00:00
}
2026-01-12 01:16:46 +00:00
```
2026-03-13 20:11:42 +00:00
`inboundPolicy: "allowlist"` is a low-assurance caller-ID screen. The plugin
normalizes the provider-supplied `From` value and compares it to `allowFrom` .
Webhook verification authenticates provider delivery and payload integrity, but
it does not prove PSTN/VoIP caller-number ownership. Treat `allowFrom` as
caller-ID filtering, not strong caller identity.
2026-01-13 17:16:02 +05:30
Auto-responses use the agent system. Tune with:
2026-01-31 21:13:13 +09:00
2026-01-13 17:16:02 +05:30
- `responseModel`
- `responseSystemPrompt`
- `responseTimeoutMs`
2026-01-12 01:16:46 +00:00
2026-03-21 04:15:16 -05:00
### Spoken output contract
For auto-responses, Voice Call appends a strict spoken-output contract to the system prompt:
- `{"spoken":"..."}`
Voice Call then extracts speech text defensively:
- Ignores payloads marked as reasoning/error content.
- Parses direct JSON, fenced JSON, or inline `"spoken"` keys.
- Falls back to plain text and removes likely planning/meta lead-in paragraphs.
This keeps spoken playback focused on caller-facing text and avoids leaking planning text into audio.
### Conversation startup behavior
For outbound `conversation` calls, first-message handling is tied to live playback state:
- Barge-in queue clear and auto-response are suppressed only while the initial greeting is actively speaking.
- If initial playback fails, the call returns to `listening` and the initial message remains queued for retry.
- Initial playback for Twilio streaming starts on stream connect without extra delay.
### Twilio stream disconnect grace
When a Twilio media stream disconnects, Voice Call waits `2000ms` before auto-ending the call:
- If the stream reconnects during that window, auto-end is canceled.
- If no stream is re-registered after the grace period, the call is ended to prevent stuck active calls.
2026-01-12 01:16:46 +00:00
## CLI
``` bash
2026-01-30 03:15:10 +01:00
openclaw voicecall call --to "+15555550123" --message "Hello from OpenClaw"
2026-03-18 12:26:18 -07:00
openclaw voicecall start --to "+15555550123" # alias for call
2026-01-30 03:15:10 +01:00
openclaw voicecall continue --call-id <id> --message "Any questions?"
openclaw voicecall speak --call-id <id> --message "One moment"
openclaw voicecall end --call-id <id>
openclaw voicecall status --call-id <id>
openclaw voicecall tail
2026-03-18 12:26:18 -07:00
openclaw voicecall latency # summarize turn latency from logs
2026-01-30 03:15:10 +01:00
openclaw voicecall expose --mode funnel
2026-01-12 01:16:46 +00:00
```
2026-03-18 12:26:18 -07:00
`latency` reads `calls.jsonl` from the default voice-call storage path. Use
`--file <path>` to point at a different log and `--last <n>` to limit analysis
to the last N records (default 200). Output includes p50/p90/p99 for turn
latency and listen-wait times.
2026-01-12 01:16:46 +00:00
## Agent tool
Tool name: `voice_call`
2026-01-12 21:40:22 +00:00
Actions:
2026-01-31 21:13:13 +09:00
2026-01-12 21:40:22 +00:00
- `initiate_call` (message, to?, mode?)
- `continue_call` (callId, message)
- `speak_to_user` (callId, message)
- `end_call` (callId)
- `get_status` (callId)
2026-01-12 01:16:46 +00:00
2026-01-13 17:16:02 +05:30
This repo ships a matching skill doc at `skills/voice-call/SKILL.md` .
2026-01-12 01:16:46 +00:00
## Gateway RPC
2026-01-12 21:40:22 +00:00
- `voicecall.initiate` (`to?` , `message` , `mode?` )
- `voicecall.continue` (`callId` , `message` )
- `voicecall.speak` (`callId` , `message` )
- `voicecall.end` (`callId` )
- `voicecall.status` (`callId` )