Voice mode
Realtime voice conversations with agents, backed by OpenAI Realtime and per-agent voice profiles.
Voice mode lets you talk to an agent instead of typing. The runtime opens a realtime audio session, streams microphone input to the model, and plays the response back as it's generated.
What voice mode does
- Captures microphone audio in the browser.
- Streams it to the OpenAI Realtime API, which handles speech-to-text (STT) and the model response in a single round trip.
- Streams audio back for the agent's turn.
- Mirrors transcripts into the regular session as messages, so memory, tool calls, and canvas activity all work the same as in text mode.
- Optionally uses a separate transcription path for recording flows that don't need realtime audio output.
Per-agent voice profile
Every agent has a voice block in its settings:
| Field | Meaning |
|---|---|
| provider | Currently openai. |
| gender | male or female; used to pick a default voice if none is set. |
| openaiVoice | One of the OpenAI Realtime built-in voices. |
| autoAssigned | true when the platform picked the voice; false when the user did. |
The OpenAI Realtime built-in voices are: alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar. Ala defaults to marin.
If an agent has no voice set, the runtime auto-assigns one that matches the configured gender and marks autoAssigned: true. You can override the voice on the agent settings page at any time.
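As a rough illustration, the voice block and the auto-assignment rule could be modelled as below. The field names and the voice list come from the table and list above; the type names, the helper functions, and in particular the gender-to-voice mapping are assumptions made for this sketch, not the platform's documented behavior.

```typescript
// Voice names as listed in this page.
type OpenAIVoice =
  | "alloy" | "ash" | "ballad" | "coral" | "echo"
  | "sage" | "shimmer" | "verse" | "marin" | "cedar";

// Shape of the per-agent voice block (fields from the table above).
interface VoiceProfile {
  provider: "openai";
  gender: "male" | "female";
  openaiVoice?: OpenAIVoice; // absent until a voice is chosen
  autoAssigned: boolean;
}

// ASSUMPTION: hypothetical defaults. The real gender-to-voice mapping
// used by the runtime is not documented here.
const DEFAULT_VOICE_BY_GENDER: Record<VoiceProfile["gender"], OpenAIVoice> = {
  female: "marin",
  male: "cedar",
};

// Fill in a voice when none is set and flag it as auto-assigned.
// A profile that already has a voice is returned unchanged.
function resolveVoice(
  profile: VoiceProfile,
): VoiceProfile & { openaiVoice: OpenAIVoice } {
  if (profile.openaiVoice) {
    return { ...profile, openaiVoice: profile.openaiVoice };
  }
  return {
    ...profile,
    openaiVoice: DEFAULT_VOICE_BY_GENDER[profile.gender],
    autoAssigned: true,
  };
}

// A user override always clears the autoAssigned flag.
function overrideVoice(profile: VoiceProfile, voice: OpenAIVoice): VoiceProfile {
  return { ...profile, openaiVoice: voice, autoAssigned: false };
}
```

Keeping the resolution step as a pure function means the same rule can run wherever a profile is loaded, and the autoAssigned flag always records who made the choice.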
Realtime tools and transcripts
The realtime runtime exposes a curated tool surface to the model and writes structured transcripts back into the session. Tool calls made over voice show up on the canvas and in the message stream identically to text-mode calls.
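One way to picture the mirroring step is as a mapping from realtime events to ordinary session entries. The sketch below is purely illustrative: RealtimeEvent, SessionEntry, and mirrorToSession are hypothetical names, and the actual event and message shapes used by the runtime are not specified on this page.

```typescript
type Role = "user" | "assistant";

// HYPOTHETICAL event shapes emitted by a realtime voice session.
type RealtimeEvent =
  | { kind: "transcript"; role: Role; text: string }
  | { kind: "tool_call"; name: string; args: Record<string, unknown> };

// HYPOTHETICAL session entries, shared by text mode and voice mode.
type SessionEntry =
  | { type: "message"; role: Role; text: string; source: "voice" | "text" }
  | {
      type: "tool_call";
      name: string;
      args: Record<string, unknown>;
      source: "voice" | "text";
    };

// Convert realtime events into session entries, preserving order.
// Blank transcripts are dropped so they do not create empty messages.
function mirrorToSession(events: RealtimeEvent[]): SessionEntry[] {
  const entries: SessionEntry[] = [];
  for (const ev of events) {
    if (ev.kind === "transcript") {
      const text = ev.text.trim();
      if (text.length === 0) continue;
      entries.push({ type: "message", role: ev.role, text, source: "voice" });
    } else {
      entries.push({
        type: "tool_call",
        name: ev.name,
        args: ev.args,
        source: "voice",
      });
    }
  }
  return entries;
}
```

Because voice entries share the shape of text entries and differ only in the source tag, downstream consumers such as memory or the canvas would not need a voice-specific code path under this design.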
Telephony
The same voice plumbing powers inbound and outbound phone calls through the telephony service. Numbers can be routed to an agent; calls flow through Twilio or ElevenLabs depending on the provider configured for the number, and the OpenAI Realtime path is used for the conversational turn.
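The routing described above amounts to a per-number lookup that yields an agent and a carrier. A minimal sketch follows, assuming a hypothetical NumberRoute record and planCall helper; neither is the telephony service's actual schema or API.

```typescript
type TelephonyProvider = "twilio" | "elevenlabs";

// HYPOTHETICAL routing record: one entry per phone number.
interface NumberRoute {
  phoneNumber: string; // the dialed number, e.g. in E.164 form
  agentId: string;
  provider: TelephonyProvider;
}

// HYPOTHETICAL result: which carrier handles the call leg and which
// path handles the conversation itself.
interface CallPlan {
  agentId: string;
  carrier: TelephonyProvider;
  conversation: "openai-realtime";
}

// Look up the route for a dialed number. Returns undefined when the
// number is not routed to any agent.
function planCall(routes: NumberRoute[], dialed: string): CallPlan | undefined {
  const route = routes.find((r) => r.phoneNumber === dialed);
  if (!route) return undefined;
  return {
    agentId: route.agentId,
    carrier: route.provider,
    conversation: "openai-realtime",
  };
}
```

The carrier varies per number while the conversational path stays fixed, which mirrors the split described in the paragraph above.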
Enabling it
Voice mode is on by default for all agents. To start a voice session, open an agent and click the voice toggle. Browser permission to use the microphone is required. To change the voice, edit the agent's settings.