Realtime
Use Realtime for low-latency, multimodal sessions with Realtime-capable models. Create a short-lived client secret from a trusted server, then connect to the Realtime WebSocket and exchange client and server events.
POST /v1/realtime/client_secrets
WSS /v1/realtime
Create a client secret
curl https://api.voxvey.com/v1/realtime/client_secrets \
  -H "Authorization: Bearer $VOXVEY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "session": {
      "type": "realtime",
      "model": "openai/gpt-realtime-2",
      "instructions": "Keep responses brief.",
      "modalities": ["text", "audio"],
      "audio": {
        "output": {
          "voice": "marin"
        }
      }
    }
  }'
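The same request can be assembled from a server-side script. The following is a minimal sketch using only the Python standard library, assuming the endpoint, headers, and body fields shown in the curl example. The helper names `build_client_secret_request` and `fetch_client_secret` and their parameters are illustrative, not part of the API. The request is built separately from sending it so the payload can be inspected first.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CLIENT_SECRETS_URL = "https://api.voxvey.com/v1/realtime/client_secrets"


def build_client_secret_request(token, model, instructions=None,
                                modalities=None, voice=None):
    """Build (but do not send) the POST request for a Realtime client secret.

    The helper name and its parameters are hypothetical; the JSON fields
    mirror the curl example.
    """
    session = {"type": "realtime", "model": model}
    if instructions is not None:
        session["instructions"] = instructions
    if modalities is not None:
        session["modalities"] = list(modalities)
    if voice is not None:
        session["audio"] = {"output": {"voice": voice}}

    body = json.dumps({"session": session}).encode("utf-8")
    return urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def fetch_client_secret(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In this sketch the API token stays on the server; only the result of `fetch_client_secret` would be handed on to the client.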
Example response shape:
{
  "object": "realtime.client_secret",
  "value": "vxy-realtime-client-secret-...",
  "expires_at": 1780712161,
  "session": {
    "type": "realtime",
    "model": "openai/gpt-realtime-2"
  }
}
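Because the client secret is short-lived, a client may want to check how much time is left before using it. The sketch below assumes that `expires_at` is a Unix timestamp in seconds; that reading is inferred from the magnitude of the example value and is not stated by the response shape itself. The function names and the `min_seconds_left` margin are hypothetical.

```python
import time


def seconds_until_expiry(client_secret, now=None):
    """Return the seconds remaining before the client secret expires.

    Assumes `expires_at` is a Unix timestamp in seconds (an assumption,
    not a documented fact).
    """
    if now is None:
        now = time.time()
    return client_secret["expires_at"] - now


def is_usable(client_secret, min_seconds_left=5, now=None):
    """True if the secret has at least `min_seconds_left` seconds remaining."""
    return seconds_until_expiry(client_secret, now) >= min_seconds_left
```

If `is_usable` returns false, the simplest recovery is to request a fresh client secret from the trusted server rather than attempting to connect.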
Connect
Use the returned client secret (the value field of the response) to connect to the Realtime WebSocket:
wss://api.voxvey.com/v1/realtime
After the connection opens, send Realtime client events such as
session.update, conversation.item.create, and response.create, then listen
for server events such as session.created, response.output_text.delta, and
response.output_audio.delta.
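One way to organize the event exchange is a small router that dispatches each incoming message by its event type. This is a sketch under two assumptions: events travel as JSON text messages, and server events carry a top-level `type` field in the same way the client events do. `EventRouter` and `encode_client_event` are hypothetical names, and the WebSocket transport itself (including how the client secret is presented) is deliberately left out.

```python
import json


def encode_client_event(event_type, **fields):
    """Serialize a client event such as response.create to a JSON string."""
    return json.dumps({"type": event_type, **fields})


class EventRouter:
    """Dispatch decoded server events to handlers keyed by their `type`."""

    def __init__(self):
        self._handlers = {}
        self.unhandled = []  # events that had no registered handler

    def on(self, event_type, handler):
        """Register a callable that receives the decoded event dict."""
        self._handlers[event_type] = handler

    def dispatch(self, raw_message):
        """Decode one message and pass it to the matching handler, if any."""
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            self.unhandled.append(event)
            return None
        return handler(event)
```

A receive loop would call `router.dispatch(message)` for every message read from the socket, with handlers registered for events such as session.created and response.output_text.delta.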
Session update
{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "model": "openai/gpt-realtime-2",
    "instructions": "Keep responses brief.",
    "audio": {
      "output": {
        "voice": "marin"
      }
    }
  }
}
Common session fields
| Field | Type | Notes |
|---|---|---|
| type | string | Use realtime |
| model | string | Provider-prefixed Realtime model ID |
| instructions | string | Optional session-level behavior instructions |
| audio.output.voice | string | Optional voice for generated audio |
| reasoning.effort | string | Optional Realtime reasoning effort when supported |
| tools | array | Optional tools exposed to the session |
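The fields in the table can be combined into a session.update event programmatically. The sketch below assumes that dotted names in the table denote nested objects; the session update example confirms this for audio.output.voice, while the nesting of reasoning.effort is an assumption. `build_session_update` is a hypothetical helper, and optional fields are omitted when not supplied.

```python
import json


def build_session_update(model, instructions=None, voice=None,
                         reasoning_effort=None, tools=None):
    """Build a session.update client event from the common session fields."""
    session = {"type": "realtime", "model": model}
    if instructions is not None:
        session["instructions"] = instructions
    if voice is not None:
        # audio.output.voice, nested as in the session update example
        session["audio"] = {"output": {"voice": voice}}
    if reasoning_effort is not None:
        # reasoning.effort; the nested shape is assumed from the dotted name
        session["reasoning"] = {"effort": reasoning_effort}
    if tools is not None:
        session["tools"] = list(tools)
    return {"type": "session.update", "session": session}


event = build_session_update(
    "openai/gpt-realtime-2",
    instructions="Keep responses brief.",
    voice="marin",
)
payload = json.dumps(event)  # string to send over the WebSocket
```

With these arguments, `event` matches the session update example shown earlier.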
Notes
- Use Realtime for interactive sessions where latency matters more than a single request-response exchange.
- Create client secrets on a trusted server. Do not let an untrusted browser mint them unless the request first passes your own auth checks.
- Use WebSocket sessions for realtime calling and interactive audio.
- Use Chat Completions or Responses for standard text generation workflows.