Realtime

Use Realtime for low-latency, multimodal sessions with Realtime-capable models. Create a short-lived client secret from a trusted server, then connect to the Realtime WebSocket and exchange client and server events.

POST /v1/realtime/client_secrets
WSS /v1/realtime

Create a client secret

curl https://api.voxvey.com/v1/realtime/client_secrets \
  -H "Authorization: Bearer $VOXVEY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "session": {
      "type": "realtime",
      "model": "openai/gpt-realtime-2",
      "instructions": "Keep responses brief.",
      "modalities": ["text", "audio"],
      "audio": {
        "output": {
          "voice": "marin"
        }
      }
    }
  }'

Example response shape:

{
  "object": "realtime.client_secret",
  "value": "vxy-realtime-client-secret-...",
  "expires_at": 1780712161,
  "session": {
    "type": "realtime",
    "model": "openai/gpt-realtime-2"
  }
}
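
For a server written in Python, the curl call above could be sketched with the standard library as follows. This is a minimal sketch, not an official client: the helper names (build_client_secret_request, create_client_secret, seconds_until_expiry) are hypothetical, and treating expires_at as a Unix timestamp in seconds is an assumption based on the example value, not something this page states.

```python
import json
import time
import urllib.request
from typing import Optional

CLIENT_SECRETS_URL = "https://api.voxvey.com/v1/realtime/client_secrets"


def build_client_secret_request(token: str, session: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"session": session}).encode("utf-8")
    return urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_client_secret(token: str, session: dict) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_client_secret_request(token, session)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def seconds_until_expiry(secret: dict, now: Optional[float] = None) -> float:
    """Seconds left before expires_at (assumed here to be Unix seconds)."""
    current = time.time() if now is None else now
    return secret["expires_at"] - current
```

Call create_client_secret only from trusted server code, with the token read from the environment, and pass just the returned secret on to the client. A check such as seconds_until_expiry(secret) > 0 can help decide when to request a fresh one.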

Connect

Use the returned client secret (the value field of the response) to connect to the Realtime WebSocket:

wss://api.voxvey.com/v1/realtime

After the connection opens, send Realtime client events such as session.update, conversation.item.create, and response.create, then listen for server events such as session.created, response.output_text.delta, and response.output_audio.delta.
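
The event exchange described above can be sketched independently of any WebSocket library. This page does not specify how the client secret is presented during the handshake, so the sketch leaves the socket out and only shows how client events might be serialized and server events dispatched on their type field. The names make_event and TextCollector are hypothetical, and reading the text fragment from a "delta" field is an assumption about the event payload.

```python
import json


def make_event(event_type: str, **fields) -> str:
    """Serialize a client event as a JSON text frame."""
    return json.dumps({"type": event_type, **fields})


class TextCollector:
    """Tracks session state and accumulates streamed text output."""

    def __init__(self) -> None:
        self.session_ready = False
        self.text_parts = []
        self.ignored = []

    def handle(self, raw: str) -> None:
        """Dispatch one raw server frame based on its "type" field."""
        event = json.loads(raw)
        event_type = event.get("type")
        if event_type == "session.created":
            self.session_ready = True
        elif event_type == "response.output_text.delta":
            # Assumption: the text fragment is carried in a "delta" field.
            self.text_parts.append(event.get("delta", ""))
        else:
            # Audio deltas and other events are only recorded by type here.
            self.ignored.append(event_type)

    @property
    def text(self) -> str:
        """All text fragments received so far, joined in arrival order."""
        return "".join(self.text_parts)
```

In a real client, each frame received from the socket would be passed to handle, and strings produced by make_event (for example make_event("response.create")) would be sent as text frames.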

Session update

{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "model": "openai/gpt-realtime-2",
    "instructions": "Keep responses brief.",
    "audio": {
      "output": {
        "voice": "marin"
      }
    }
  }
}

Common session fields

Field               Type    Notes
type                string  Use realtime
model               string  Provider-prefixed Realtime model ID
instructions        string  Optional session-level behavior instructions
audio.output.voice  string  Optional voice for generated audio
reasoning.effort    string  Optional Realtime reasoning effort when supported
tools               array   Optional tools exposed to the session
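
As an illustration of how these fields combine, the sketch below builds a session.update payload and includes optional fields only when they are set. The function name build_session_update is hypothetical. Nesting audio.output.voice as objects follows the session update example above; nesting reasoning.effort the same way is an assumption made by analogy.

```python
from typing import Any, Dict, List, Optional


def build_session_update(
    model: str,
    instructions: Optional[str] = None,
    voice: Optional[str] = None,
    reasoning_effort: Optional[str] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return a session.update event containing only the fields provided."""
    session: Dict[str, Any] = {"type": "realtime", "model": model}
    if instructions is not None:
        session["instructions"] = instructions
    if voice is not None:
        # Dotted field audio.output.voice becomes nested objects.
        session["audio"] = {"output": {"voice": voice}}
    if reasoning_effort is not None:
        # Assumption: reasoning.effort nests the same way as the audio field.
        session["reasoning"] = {"effort": reasoning_effort}
    if tools is not None:
        session["tools"] = tools
    return {"type": "session.update", "session": session}
```

Calling build_session_update("openai/gpt-realtime-2", instructions="Keep responses brief.", voice="marin") yields the same structure as the session update example.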

Notes

  • Use Realtime for interactive sessions where latency matters more than a single request-response exchange.
  • Create client secrets from a trusted server. Do not mint them directly from an untrusted browser without your own auth checks.
  • Use WebSocket sessions for realtime calling and interactive audio.
  • Use Chat Completions or Responses for standard text generation workflows.