You want Deepgram speech-to-text for audio attachments
You need a quick Deepgram config example
Deepgram
Deepgram (Audio Transcription)
Deepgram is a speech-to-text API. In OpenClaw it is used for inbound audio and voice-note
transcription via `tools.media.audio`.
When enabled, OpenClaw uploads the audio file to Deepgram and injects the transcript
into the reply pipeline (`{{Transcript}}` + `[Audio]` block). This is not streaming;
it uses the pre-recorded transcription endpoint.
```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
```
Send an audio message through any connected channel. OpenClaw transcribes it
via Deepgram and injects the transcript into the reply pipeline.
Authentication follows the standard provider auth order. `DEEPGRAM_API_KEY` is
the simplest path.
Override endpoints or headers with `tools.media.audio.baseUrl` and
`tools.media.audio.headers` when using a proxy.
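As a rough sketch of a proxied setup, the two override keys can sit alongside `enabled` and `models`. The proxy URL, header name, and token below are placeholders, and the name-to-value shape of `headers` is an assumption; check the config reference for the exact form.

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        // Hypothetical proxy endpoint; replace with your own.
        baseUrl: "https://deepgram-proxy.example.com/v1",
        // Assumed shape: header name -> value. Placeholder header and token.
        headers: { "X-Proxy-Token": "<your-proxy-token>" },
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
```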
Output follows the same audio rules as other providers (size caps, timeouts,
transcript injection).
Deepgram transcription is **pre-recorded only** (not real-time streaming). OpenClaw
uploads the complete audio file and waits for the full transcript before injecting
it into the conversation.
Related
Audio, image, and video processing pipeline overview.
Full config reference including media tool settings.
Common issues and debugging steps.
Frequently asked questions about OpenClaw setup.