ConversationRelay, including the <ConversationRelay> TwiML noun and API, uses artificial intelligence or machine learning technologies. By enabling or using any features or functionalities within Programmable Voice that Twilio identifies as using artificial intelligence or machine learning technology, you acknowledge and agree to certain terms. Your use of these features or functionalities is subject to the terms of the Predictive and Generative AI or ML Features Addendum.
ConversationRelay isn't Payment Card Industry (PCI) compliant and doesn't support Voice workflows that are subject to PCI.
ConversationRelay is currently available as a Public Beta product, and Twilio may change the information in this document at any time. This means that some features aren't yet implemented, and others may change before the product becomes Generally Available. Public Beta products aren't covered by a Twilio Service Level Agreement. Learn more about Twilio's beta product support here.
Before using ConversationRelay, you need to complete the onboarding steps and agree to the Predictive and Generative AI/ML Features Addendum. See the ConversationRelay Onboarding Guide for more details.
The <ConversationRelay> TwiML noun under the <Connect> verb routes a call to Twilio's ConversationRelay service, providing advanced AI-powered voice interactions. ConversationRelay handles the complexities of live, synchronous voice calls, such as Speech-to-Text (STT) and Text-to-Speech (TTS) conversions, session management, and low-latency communication with your application. This approach allows your system to focus on processing conversational AI logic and sending back responses effectively.
In a typical setup, <ConversationRelay> connects to your AI application through a WebSocket, allowing real-time and event-based interaction. Your application receives transcribed caller speech in structured messages and sends responses as text, which ConversationRelay converts to speech and plays back to the caller. This setup is commonly used for customer service, virtual assistants, and other scenarios that require real-time, AI-based voice interactions.
Before you can use <ConversationRelay>, make sure you've completed the onboarding steps and configured your Twilio account accordingly.
To ensure the secure operation of <ConversationRelay>, your WebSocket server must validate incoming requests using the Twilio signature. For detailed guidance on setting up signature validation, see Configure your WebSocket server.
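As a rough illustration of what signature validation involves, the sketch below recomputes a Twilio-style request signature (HMAC-SHA1 over the URL plus sorted parameters, Base64-encoded) using only Node's built-in crypto module. The function names `expectedSignature` and `isValidTwilioSignature` are hypothetical, and the assumption that the WebSocket handshake carries the signature in an `X-Twilio-Signature` header computed over your `wss://` URL should be confirmed against the linked guide. In production, the Twilio helper library's `validateRequest` function is likely the better choice.

```javascript
const crypto = require('crypto');

// Compute the expected signature: HMAC-SHA1 over the full URL followed by
// each parameter name and value (sorted by name), Base64-encoded.
function expectedSignature(authToken, url, params = {}) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Compare the received header value with the expected one in constant time.
function isValidTwilioSignature(authToken, headerSignature, url, params = {}) {
  if (typeof headerSignature !== 'string') return false;
  const expected = Buffer.from(expectedSignature(authToken, url, params));
  const received = Buffer.from(headerSignature);
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

A WebSocket server would typically call `isValidTwilioSignature` during the HTTP upgrade request and reject the connection when it returns false.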
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect({
  action: 'https://myhttpserver.com/connect_action'
});
connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket',
  welcomeGreeting: 'Hi! Ask me anything!'
});

console.log(response.toString());
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect action="https://myhttpserver.com/connect_action">
    <ConversationRelay url="wss://mywebsocketserver.com/websocket" welcomeGreeting="Hi! Ask me anything!" />
  </Connect>
</Response>
```
- action (optional): The URL that Twilio will request when the <Connect> verb ends.
- url (required): The URL of your WebSocket server (must use the wss:// protocol).
- welcomeGreeting (optional): The message automatically played to the caller after we answer the call and establish the WebSocket connection.

When the TwiML execution is complete, Twilio will make a callback to the action URL with call information and the return parameters from ConversationRelay.
The <ConversationRelay> noun supports the following attributes:
Attribute name | Description | Default value | Required |
---|---|---|---|
url | The URL to your WebSocket server (must use wss://). | | Required |
welcomeGreeting | The message automatically played to the caller after we answer the call and establish the WebSocket connection. | | Optional |
welcomeGreetingInterruptible | Specifies if the caller can interrupt the welcomeGreeting with speech. Values can be "none", "dtmf", "speech", or "any". For backward compatibility, Boolean values are also accepted: true = "any" and false = "none". | "any" | Optional |
language | The language code (for example, "en-US") that applies to both Speech-to-Text (STT) and Text-to-Speech (TTS). Setting this attribute is equivalent to setting both ttsLanguage and transcriptionLanguage. | "en-US" | Optional |
ttsLanguage | The default language code to use for TTS when the text token message doesn't specify a language. If you set both attributes, this one overrides the language attribute. You can modify this via the ttsLanguage field in the language message you send through the Service Provider Interface (SPI). | | Optional |
ttsProvider | The provider for TTS. Available choices are "Google", "Amazon", and "ElevenLabs". | "Google" | Optional |
voice | The voice used for TTS. Choices vary based on the ttsProvider. For details, refer to the Twilio TTS Voices. We list additional voices available for ConversationRelay below. | "en-US-Journey-O" (Google), "Joanna-Neural" (Amazon) | Optional |
transcriptionLanguage | The language code to use for STT when the session starts. If you set both attributes, this one overrides the language attribute for the transcription language. You can modify this via the transcriptionLanguage field in the language message you send through the SPI. | | Optional |
transcriptionProvider | The provider for STT (Speech Recognition). Available choices are "Google" and "Deepgram". | "Google" | Optional |
speechModel | The speech model used for STT. Choices vary based on the transcriptionProvider. Refer to the provider's documentation for an accurate list. | "telephony" (Google), "nova-2-general" (Deepgram) | Optional |
profanityFilter | Specifies whether to filter profanities out of the speech transcription. | "true" | Optional |
interruptible | Specifies if caller speech can interrupt TTS playback. Values can be "none", "dtmf", "speech", or "any". For backward compatibility, Boolean values are also accepted: true = "any" and false = "none". | "any" | Optional |
dtmfDetection | Specifies whether the system sends Dual-tone multi-frequency (DTMF) keypresses over the WebSocket. Set to true to turn on DTMF events. | false | Optional |
preemptible | Specifies if the TTS of the current talk cycle can allow text tokens from the subsequent talk cycle to interrupt. | false | Optional |
hints | A comma-separated list of words or phrases that helps Speech-to-Text recognition for uncommon words, product names, or domain-specific terminology. Works similarly to the hints attribute in <Gather>. | | Optional |
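To make a few of the attribute rules in the table concrete, here is a minimal, dependency-free sketch that checks the `url` scheme, maps the legacy Boolean forms of `interruptible` and `welcomeGreetingInterruptible` onto the string values, and renders the element. The helper names (`escapeXml`, `normalizeInterruptible`, `buildConversationRelayTwiml`) are hypothetical and are not part of the Twilio helper library, which remains the usual way to generate TwiML.

```javascript
const INTERRUPT_VALUES = new Set(['none', 'dtmf', 'speech', 'any']);

// Escape characters that are not allowed inside an XML attribute value.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Map the legacy Boolean forms onto the string values listed in the table.
function normalizeInterruptible(value) {
  if (value === true || value === 'true') return 'any';
  if (value === false || value === 'false') return 'none';
  if (!INTERRUPT_VALUES.has(value)) {
    throw new Error(`Unsupported interruptible value: ${value}`);
  }
  return value;
}

// Render a <ConversationRelay> element from a plain attributes object.
function buildConversationRelayTwiml(attrs) {
  if (typeof attrs.url !== 'string' || !attrs.url.startsWith('wss://')) {
    throw new Error('url is required and must use wss://');
  }
  const normalized = { ...attrs };
  for (const key of ['interruptible', 'welcomeGreetingInterruptible']) {
    if (key in normalized) normalized[key] = normalizeInterruptible(normalized[key]);
  }
  const attrText = Object.entries(normalized)
    .map(([name, value]) => `${name}="${escapeXml(value)}"`)
    .join(' ');
  return `<Response><Connect><ConversationRelay ${attrText} /></Connect></Response>`;
}
```

For example, `buildConversationRelayTwiml({ url: 'wss://example.com/ws', interruptible: true })` renders the attribute as `interruptible="any"`.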
We've added TTS provider support for ElevenLabs, which provides additional natural-sounding voice synthesis. Use the interface below to search and filter through a wide selection of voices by language, accent, age, and more. Each voice entry includes a voiceID that you can copy and paste into your <ConversationRelay> configuration.
How to Use ElevenLabs Voices
- Copy the voiceID: From the search results, copy the unique identifier (for example, NYC9WEgkq1u4jiqBseQ9).
- Configure <ConversationRelay>: In your TwiML, explicitly set ttsProvider="ElevenLabs" and use the copied voiceID in the voice attribute.

Example:
```xml
<Connect>
  <ConversationRelay url="wss://example.com/websocket" ttsProvider="ElevenLabs" voice="NYC9WEgkq1u4jiqBseQ9" ... />
</Connect>
```
Since Google is the default ttsProvider, you must explicitly set ttsProvider="ElevenLabs" to use an ElevenLabs voice.
If you don't explicitly specify the voice attribute in your <ConversationRelay> configuration, ConversationRelay automatically applies a default voice based on the language setting (as defined by the language or ttsLanguage attribute) and the selected TTS provider (default is Google). Below is the complete list of default voice settings:
```json
{
  "vi-VN": {"ttsProvider": "google", "voice": "vi-VN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "ja-JP": {"ttsProvider": "google", "voice": "ja-JP-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "fi-FI": {"ttsProvider": "google", "voice": "fi-FI-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "uk-UA": {"ttsProvider": "google", "voice": "uk-UA-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "en-US": {"ttsProvider": "google", "voice": "en-US-Chirp3-HD-Aoede", "asrProvider": "google", "speechModel": "telephony"},
  "en-IN": {"ttsProvider": "google", "voice": "en-IN-Standard-E", "asrProvider": "google", "speechModel": "long"},
  "ta-IN": {"ttsProvider": "google", "voice": "ta-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "nl-BE": {"ttsProvider": "google", "voice": "nl-BE-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "zh-CN": {"ttsProvider": "google", "voice": "zh-CN-Neural2-B", "asrProvider": "deepgram", "speechModel": "nova-2-general"},
  "ar-XA": {"ttsProvider": "google", "voice": "ar-XA-Wavenet-D", "asrProvider": "google", "speechModel": "long"},
  "te-IN": {"ttsProvider": "google", "voice": "te-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "nl-NL": {"ttsProvider": "google", "voice": "nl-NL-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "hi-IN": {"ttsProvider": "google", "voice": "hi-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "bg-BG": {"ttsProvider": "google", "voice": "bg-BG-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "en-AU": {"ttsProvider": "google", "voice": "en-AU-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "es-US": {"ttsProvider": "google", "voice": "es-US-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "kn-IN": {"ttsProvider": "google", "voice": "kn-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "cs-CZ": {"ttsProvider": "google", "voice": "cs-CZ-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "de-DE": {"ttsProvider": "google", "voice": "de-DE-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "hu-HU": {"ttsProvider": "google", "voice": "hu-HU-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "ml-IN": {"ttsProvider": "google", "voice": "ml-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "zh-TW": {"ttsProvider": "google", "voice": "zh-TW-Neural2-B", "asrProvider": "deepgram", "speechModel": "nova-2-general"},
  "zh-HK": {"ttsProvider": "google", "voice": "zh-HK-Neural2-B", "asrProvider": "deepgram", "speechModel": "nova-2-general"},
  "ko-KR": {"ttsProvider": "google", "voice": "ko-KR-Standard-B", "asrProvider": "google", "speechModel": "telephony"},
  "pt-BR": {"ttsProvider": "google", "voice": "pt-BR-Standard-D", "asrProvider": "google", "speechModel": "telephony"},
  "es-ES": {"ttsProvider": "google", "voice": "es-ES-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "fr-CA": {"ttsProvider": "google", "voice": "fr-CA-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "it-IT": {"ttsProvider": "google", "voice": "it-IT-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "pl-PL": {"ttsProvider": "google", "voice": "pl-PL-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "ru-RU": {"ttsProvider": "google", "voice": "ru-RU-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "pt-PT": {"ttsProvider": "google", "voice": "pt-PT-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "ro-RO": {"ttsProvider": "google", "voice": "ro-RO-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "sv-SE": {"ttsProvider": "google", "voice": "sv-SE-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "id-ID": {"ttsProvider": "google", "voice": "id-ID-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "mr-IN": {"ttsProvider": "google", "voice": "mr-IN-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "da-DK": {"ttsProvider": "google", "voice": "da-DK-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "tr-TR": {"ttsProvider": "google", "voice": "tr-TR-Standard-A", "asrProvider": "google", "speechModel": "long"},
  "fr-FR": {"ttsProvider": "google", "voice": "fr-FR-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "en-GB": {"ttsProvider": "google", "voice": "en-GB-Standard-A", "asrProvider": "google", "speechModel": "telephony"},
  "th-TH": {"ttsProvider": "google", "voice": "th-TH-Standard-A", "asrProvider": "google", "speechModel": "long"}
}
```
Our internal configuration defines these default settings and updates them periodically. Refer to the Twilio TTS Voices documentation for a complete and current list of supported languages, default voices, and detailed settings.
By understanding these defaults, you can decide when it's necessary to explicitly set the voice parameter to achieve the desired auditory experience for your application.
For additional voices from Google or Amazon (including generative options), refer to our Twilio TTS Voices documentation. Each provider offers a variety of languages and styles, enabling you to tailor your application's voice experience to your specific needs.
Include nested elements within <ConversationRelay> for more granular configuration. For more information on configuring ConversationRelay, refer to the ConversationRelay Onboarding Guide.
The <Language> element maps a language code to specific TTS and STT settings. Use this element to configure multiple languages for your session.
Example
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect();
const conversationrelay = connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket'
});
conversationrelay.language({
  code: 'sv-SE',
  ttsProvider: 'amazon',
  voice: 'Elin-Neural',
  transcriptionProvider: 'google',
  speechModel: 'long'
});
conversationrelay.language({
  code: 'en-US',
  ttsProvider: 'google',
  voice: 'en-US-Journey-O'
});

console.log(response.toString());
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay url="wss://mywebsocketserver.com/websocket">
      <Language code="sv-SE" ttsProvider="amazon" voice="Elin-Neural" transcriptionProvider="google" speechModel="long"/>
      <Language code="en-US" ttsProvider="google" voice="en-US-Journey-O" />
    </ConversationRelay>
  </Connect>
</Response>
```
Attributes
Attribute name | Description of attributes | Default value | Required |
---|---|---|---|
code | The language code (for example, "en-US") that applies to both STT and TTS. | | Required |
ttsProvider | The provider for TTS. Choices are "Google", "Amazon", and "ElevenLabs". | Inherited from <ConversationRelay> | Optional |
voice | The voice used for TTS. Choices vary based on the ttsProvider. | Inherited from <ConversationRelay> | Optional |
transcriptionProvider | The provider for STT. Choices are "Google" and "Deepgram". | Inherited from <ConversationRelay> | Optional |
speechModel | The speech model used for STT. Choices vary based on the transcriptionProvider. | Inherited from <ConversationRelay> | Optional |
language | The language code for the session (for example, "en-US"). | "en-US" | Optional |
customParameter | Custom parameters to be sent in the setup message. | | Optional |
Notes
- When a setting is specified in both <ConversationRelay> and <Language>, the settings in <Language> take precedence.
- ConversationRelay provides default settings for commonly used languages.

The <Parameter> element allows you to send custom parameters from the TwiML directly into the initial "setup" message sent over the WebSocket. These parameters appear under the customParameters field in the JSON message.
Example
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect();
const conversationrelay = connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket'
});
conversationrelay.parameter({
  name: 'foo',
  value: 'bar'
});
conversationrelay.parameter({
  name: 'hint',
  value: 'Annoyed customer'
});

console.log(response.toString());
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay url="wss://mywebsocketserver.com/websocket">
      <Parameter name="foo" value="bar"/>
      <Parameter name="hint" value="Annoyed customer"/>
    </ConversationRelay>
  </Connect>
</Response>
```
Resulting Setup Message
```json
{
  "type": "setup",
  "sessionId": "VX00000000000000000000000000000000",
  "callSid": "CA00000000000000000000000000000000",
  "...": "...",
  "customParameters": {
    "foo": "bar",
    "hint": "Annoyed customer"
  }
}
```
Language settings refer to configurations for both Text-to-Speech and Speech-to-Text:
- ttsLanguage
- ttsProvider
- voice
- transcriptionLanguage
- transcriptionProvider
- speechModel

Configure language settings in two places:
- Attributes on <ConversationRelay>: These serve as the default settings used when the session starts.
- <Language> elements: Each <Language> element configures settings for a specific language code. You can include multiple <Language> elements to support multiple languages.

The following precedence rules apply:
- Within <ConversationRelay>, the ttsLanguage attribute overrides the language attribute for the default TTS language.
- Within <ConversationRelay>, the transcriptionLanguage attribute overrides the language attribute for the STT language.
- If a <Language> element specifies the same code attribute as in <ConversationRelay>, the <Language> element's settings take precedence.

Default Values
- language: Defaults to en-US if not specified.
- ttsProvider: Defaults to Google if not specified.
- transcriptionProvider: Defaults to Google if not specified.

When only part of a provider configuration is specified:
- If you specify the ttsProvider attribute without the voice attribute, the system uses a default voice for that provider.
- If you specify the transcriptionProvider attribute without the speechModel attribute, the system uses a default model for that provider.
- If you specify the voice attribute without the ttsProvider attribute, the system infers the provider from the default or specified ttsProvider.
- If you specify the speechModel attribute without the transcriptionProvider attribute, the system infers the provider from the default or specified transcriptionProvider.

For Speech-to-Text (STT) settings:
- The service uses the transcriptionLanguage attribute to initiate the STT session.
- If the combination of the transcriptionProvider and speechModel attributes is invalid, the call disconnects, and the system reports an error in the action callback and error notifications.
- You can change the transcriptionLanguage attribute during the session via the language message you send through the Service Provider Interface (SPI).

For Text-to-Speech (TTS) settings:
- If the lang property is present in the text token message from the SPI, the service uses it to select the TTS voice.
- If the combination of the ttsProvider and voice attributes is invalid, the system sends an error message over the SPI.
- If there is no lang property in the text token, the service uses the current TTS language settings.

ConversationRelay interacts with your application server via a WebSocket connection specified by the url attribute. Messages exchanged follow this Service Provider Interface (SPI) specification.
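The precedence and default rules for language settings described earlier in this section can be sketched as a small resolver. This is one reading of those rules, not Twilio's implementation: the function name `resolveLanguageSettings` and the shape of its inputs are hypothetical, and a `null` voice or speech model here simply means "let the service pick its default".

```javascript
// Resolve the TTS and STT settings in effect when a session starts.
// relayAttrs: attributes set on <ConversationRelay>.
// languageElements: one plain object per nested <Language> element.
function resolveLanguageSettings(relayAttrs = {}, languageElements = []) {
  const language = relayAttrs.language || 'en-US';
  // ttsLanguage and transcriptionLanguage each override the shared language attribute.
  const ttsLanguage = relayAttrs.ttsLanguage || language;
  const transcriptionLanguage = relayAttrs.transcriptionLanguage || language;

  // Settings from a <Language> element with a matching code take precedence.
  const elementFor = (code) => languageElements.find((el) => el.code === code) || {};
  const ttsElement = elementFor(ttsLanguage);
  const sttElement = elementFor(transcriptionLanguage);

  return {
    tts: {
      language: ttsLanguage,
      provider: ttsElement.ttsProvider || relayAttrs.ttsProvider || 'Google',
      voice: ttsElement.voice || relayAttrs.voice || null,
    },
    stt: {
      language: transcriptionLanguage,
      provider: sttElement.transcriptionProvider || relayAttrs.transcriptionProvider || 'Google',
      speechModel: sttElement.speechModel || relayAttrs.speechModel || null,
    },
  };
}
```

Calling it with `{ ttsLanguage: 'sv-SE' }` and a `<Language code="sv-SE" ttsProvider="amazon" voice="Elin-Neural">` element yields Amazon's Elin-Neural voice for TTS while STT stays on the en-US defaults.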
ConversationRelay validates all incoming SPI messages to ensure they conform to the expected format. If validation fails, Twilio returns error 64107 with details about the validation failure. The following validation rules apply:
- The token field can't be null or missing.
- If lang is provided, it must be one of the supported languages.
- The source field must contain a valid URL.
- The digits field can't be null or empty.
- The digits field must only contain the characters 0-9, w, #, and *.
- At least one of ttsLanguage or transcriptionLanguage must be present.
- ttsLanguage must be one of the supported languages.
- transcriptionLanguage must be one of the supported languages.

ConversationRelay validates messages but continues the session even when it returns an error 64107 for non-conforming requests. These validation messages are informational only.
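Running the same checks on your side before sending can surface mistakes earlier. The sketch below mirrors the validation rules listed above for the text, play, sendDigits, and language messages. The function name `validateOutgoingMessage` is hypothetical, and `SUPPORTED_LANGUAGES` is a short placeholder set, since the full list of supported languages isn't reproduced in this document.

```javascript
// Placeholder only: the real list of supported language codes is longer.
const SUPPORTED_LANGUAGES = new Set(['en-US', 'sv-SE', 'de-DE']);

function isValidUrl(value) {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Returns a list of problems; an empty list means the message passed the checks.
function validateOutgoingMessage(msg) {
  const errors = [];
  switch (msg.type) {
    case 'text':
      if (msg.token == null) errors.push('token is required');
      if (msg.lang != null && !SUPPORTED_LANGUAGES.has(msg.lang)) {
        errors.push('lang is not a supported language');
      }
      break;
    case 'play':
      if (typeof msg.source !== 'string' || !isValidUrl(msg.source)) {
        errors.push('source must be a valid URL');
      }
      break;
    case 'sendDigits':
      if (typeof msg.digits !== 'string' || msg.digits.length === 0) {
        errors.push('digits is required');
      } else if (!/^[0-9w#*]+$/.test(msg.digits)) {
        errors.push('digits contains invalid characters');
      }
      break;
    case 'language':
      if (msg.ttsLanguage == null && msg.transcriptionLanguage == null) {
        errors.push('ttsLanguage or transcriptionLanguage is required');
      }
      for (const field of ['ttsLanguage', 'transcriptionLanguage']) {
        if (msg[field] != null && !SUPPORTED_LANGUAGES.has(msg[field])) {
          errors.push(`${field} is not a supported language`);
        }
      }
      break;
    default:
      break;
  }
  return errors;
}
```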
ConversationRelay sends this message immediately after establishing the WebSocket connection.
```json
{
  "type": "setup",
  "sessionId": "VX00000000000000000000000000000000",
  "callSid": "CA00000000000000000000000000000000",
  "from": "+14151234567",
  "to": "+18881234567",
  "direction": "inbound",
  "...": "...",
  "customParameters": {
    "foo": "bar"
  }
}
```
ConversationRelay sends this message when the caller says something.
```json
{
  "type": "prompt",
  "voicePrompt": "Hi! Can you tell me about life?",
  "lang": "en-US",
  "last": true
}
```
ConversationRelay sends this message when you turn on DTMF detection and the caller presses a key.
```json
{
  "type": "dtmf",
  "digit": "1"
}
```
ConversationRelay sends this message when the caller interrupts TTS playback by speaking.
```json
{
  "type": "interrupt",
  "utteranceUntilInterrupt": "Life is a complex set of",
  "durationUntilInterruptMs": "460"
}
```
ConversationRelay sends this message when an error occurs during the session.
```json
{
  "type": "error",
  "description": "Invalid message received: { \"foo\" : \"bar\" }"
}
```
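The five message types above lend themselves to a simple dispatcher. The following sketch shows one way a WebSocket handler might route them. `handleRelayMessage` and the `session` object are hypothetical names, and the echo reply merely stands in for a call to your own LLM.

```javascript
// Returns the messages to send back for one incoming WebSocket frame.
// `session` is a plain object that keeps per-call state between frames.
function handleRelayMessage(rawFrame, session) {
  const message = JSON.parse(rawFrame);
  switch (message.type) {
    case 'setup':
      session.callSid = message.callSid;
      session.customParameters = message.customParameters || {};
      return [];
    case 'prompt':
      // A real application would pass voicePrompt to its LLM here.
      return [{ type: 'text', token: `You said: ${message.voicePrompt}`, last: true }];
    case 'dtmf':
      session.digits = (session.digits || '') + message.digit;
      return [];
    case 'interrupt':
      session.lastInterrupt = message.utteranceUntilInterrupt;
      return [];
    case 'error':
      session.lastError = message.description;
      return [];
    default:
      return [];
  }
}
```

Each returned object would then be serialized with `JSON.stringify` and sent over the same WebSocket connection.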
Send text tokens, and ConversationRelay converts them into speech.
```json
{
  "type": "text",
  "token": "Hello world!",
  "last": false
}
```
- token attribute (Required): The text to convert to speech.
- last attribute (Optional, default is false): Indicates whether this is the last token in the current message.

Best practices
- Set "last": true when you send the final token of a message.

Request to play media to the caller.
```json
{
  "type": "play",
  "source": "https://api.twilio.com/cowbell.mp3",
  "loop": 1,
  "preemptible": false
}
```
- source attribute (Required): The URL of the media to play.
- loop attribute (Optional, default is 1): Number of times to play the media. A value of 0 plays it 1,000 times (maximum).
- preemptible attribute (Optional, default is false): If set to true, subsequent text or play messages will stop this media playback.

Request to send DTMF digits to the caller. ConversationRelay sends digits as per Twilio's <Play> digits attribute.
```json
{
  "type": "sendDigits",
  "digits": "9www4085551212"
}
```
Change the transcription and TTS language during the session.
```json
{
  "type": "language",
  "ttsLanguage": "sv-SE",
  "transcriptionLanguage": "en-US"
}
```
This affects future TTS and STT sessions.
End the session and return control of the call to Twilio through ConversationRelay.
```json
{
  "type": "end",
  "handoffData": "{\"reasonCode\":\"live-agent-handoff\", \"reason\": \"The caller wants to talk to a real person\"}"
}
```
- handoffData attribute (Optional): A string containing data to pass back in the action callback.

When an action URL is specified in the <Connect> verb, ConversationRelay will make a request to that URL when the <Connect> verb ends. The request includes call information and session details.
Example Payloads
```json
{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "in-progress",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "ended",
  "SessionDuration": "25",
  "HandoffData": "{\"reason\": \"The caller requested to talk to a real person\"}"
}
```

```json
{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "in-progress",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "failed",
  "SessionDuration": "10",
  "ErrorCode": "39001",
  "ErrorMessage": "Network connection to WebSocket server failed."
}
```

```json
{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "completed",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "completed",
  "SessionDuration": "35"
}
```
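Because HandoffData arrives as a JSON-encoded string inside the callback payload, the action URL handler usually needs a second parsing step. The sketch below condenses a payload like the ones above into a small summary object. `summarizeActionCallback` is a hypothetical helper, and it assumes your web framework has already parsed the request parameters into a plain object.

```javascript
// Condense an action callback payload into the fields most handlers care about.
function summarizeActionCallback(params) {
  let handoff = null;
  if (typeof params.HandoffData === 'string') {
    try {
      handoff = JSON.parse(params.HandoffData);
    } catch {
      // handoffData is a free-form string, so it may not be JSON at all.
      handoff = { raw: params.HandoffData };
    }
  }
  return {
    sessionId: params.SessionId,
    status: params.SessionStatus,
    duration: Number(params.SessionDuration),
    handoff,
    error: params.ErrorCode
      ? { code: params.ErrorCode, message: params.ErrorMessage }
      : null,
  };
}
```

The fallback to `{ raw: ... }` reflects that handoffData is documented only as a string, so a handler shouldn't assume it always contains JSON.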
"last": true
when the message is complete.error
messages sent over the SPI to handle any issues promptly.language
message to switch languages dynamically during a session.end
message to gracefully end sessions when your application logic determines it's appropriate.last=true
. This enables ConversationRelay to identify the first sayable string.last
flag properly
"last": true
."last": true
, ConversationRelay assumes additional tokens are forthcoming and may stop reading at the first punctuation mark (for example, period, comma, or question mark).In streaming mode, send each text token incrementally with "last": false
.
When the LLM indicates that the response is complete (for example, when response.finish_reason()
equals "stop"
), mark that final token with "last": true
.
Example:
1{ "type": "text", "token": "Hello", "last": false }2{ "type": "text", "token": " world", "last": false }3{ "type": "text", "token": "!", "last": true }
In non-streaming mode, when the entire response is generated as a single complete sentence, mark the token with "last": true
.
Example:
{ "type": "text", "token": "Hello world!", "last": true }
"last": true
.By following these guidelines, you can ensure that ConversationRelay processes and speaks your full message smoothly without unexpected pauses or truncation.
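One practical wrinkle in streaming mode is that you often don't know a chunk is the final one until the LLM stream ends. A common workaround, sketched below, holds back one chunk so the last one can be flagged once the stream finishes. `createTokenRelay` and its `send` callback are hypothetical names. In a real server, `send` would serialize the message and write it to the WebSocket.

```javascript
// Buffer one chunk so that the final chunk can be sent with last: true.
function createTokenRelay(send) {
  let pending = null;
  return {
    // Call for every chunk the LLM produces.
    push(chunk) {
      if (pending !== null) send({ type: 'text', token: pending, last: false });
      pending = chunk;
    },
    // Call once the LLM reports that the response is complete.
    finish() {
      if (pending !== null) send({ type: 'text', token: pending, last: true });
      pending = null;
    },
  };
}
```

The trade-off is a delay of one chunk before playback can begin. If your LLM client exposes the finish reason on the final chunk itself, you can set the flag directly and skip the buffer.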
When setting up system prompts for Large Language Models (LLMs) in ConversationRelay, consider these best practices to ensure optimal performance with Text-to-Speech (TTS):
These prompt adjustments help improve the LLM-generated tokens' compatibility with voice output in ConversationRelay, enhancing clarity and consistency for users.
All WebSocket messages from ConversationRelay to your API follow the strict formats defined in these docs. Your application must also adhere to these specifications when sending messages back to ConversationRelay.
Following these practices helps maintain session stability and ensures compatibility with ConversationRelay's message handling.
When working with Text-to-Speech (TTS) in ConversationRelay, proper text normalization is crucial for delivering clear and natural spoken responses. This is especially important when using ElevenLabs voices, which may have difficulty with certain formats. Consider the following guidelines:
For detailed text normalization guidelines, refer to ElevenLabs' text normalization best practices.
Text-to-Speech (TTS) voice quality varies significantly by provider and voice type. While generative voices often offer higher fidelity and more natural-sounding responses, they can introduce additional latency and process TTS at a slower rate.
Selecting the right TTS voice involves balancing quality and performance, so thorough testing is essential before production deployment.
Speech-to-Text (STT) quality and latency can vary depending on the provider and the environment. Google and Deepgram each offer unique strengths for different scenarios, such as clean versus noisy audio environments.
Optimizing STT performance requires careful selection based on environment and model capabilities, so thorough testing is essential for achieving the best results.
In the event of a WebSocket connection error in ConversationRelay, implement reconnection logic by initiating a new <Connect><ConversationRelay> request:
- If you lose the WebSocket connection, handle the disconnect in your <Connect> element's action URL callback by returning new TwiML containing <Connect><ConversationRelay> to restore the session.
- Ensure the callSid remains the same to confirm continuity of the original call session.

This approach helps maintain session stability and consistency following any connection disruptions.
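A minimal sketch of that reconnection flow follows: the action URL handler inspects SessionStatus and, on failure, returns fresh TwiML that connects the same call to ConversationRelay again. `twimlForActionCallback`, `websocketUrl`, and `actionUrl` are hypothetical names, and hanging up for every other status is a simplification. Your application might instead dial an agent or continue with other TwiML.

```javascript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Decide which TwiML to return from the <Connect> action URL.
function twimlForActionCallback(params, { websocketUrl, actionUrl }) {
  if (params.SessionStatus === 'failed') {
    // Reconnect the same call (same CallSid) to a new ConversationRelay session.
    return '<?xml version="1.0" encoding="UTF-8"?>' +
      `<Response><Connect action="${escapeXmlAttribute(actionUrl)}">` +
      `<ConversationRelay url="${escapeXmlAttribute(websocketUrl)}" />` +
      '</Connect></Response>';
  }
  // Simplification for this sketch: end the call for any other status.
  return '<?xml version="1.0" encoding="UTF-8"?><Response><Hangup/></Response>';
}
```

Because the returned TwiML points at the same action URL, a persistent outage would reconnect indefinitely, so capping the number of retries (for example, with a counter stored per CallSid) is advisable.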
When you switch languages during a session via an SPI message, ConversationRelay uses the pre-set voice associated with the new language that you configured in the initial TwiML setup. If you didn't configure a specific voice for that language, ConversationRelay will use its default voice for the selected language.
This setup ensures consistent voice behavior for each language by configuring it in TwiML before the call begins.
ConversationRelay supports both streaming and non-streaming modes for sending LLM responses. Each mode has unique trade-offs in latency and response fluidity:
For errors, such as messages that ConversationRelay doesn't understand, we will respond with an error message.
If your WebSocket sends unidentified messages to ConversationRelay and the last 10 messages remain unidentified, we will terminate the connection. The status code will be 1007 with the reason "Too many consecutive malformed messages." In that case, we will report an error 64105 "WebSocket Ended."
If the WebSocket disconnects unexpectedly in ConversationRelay, we don't reconnect, and the call disconnects with a failed status.
ConversationRelay, including the <ConversationRelay> TwiML nouns and APIs, uses artificial intelligence or machine learning technologies.
Our AI Nutrition Facts for ConversationRelay provide an overview of the AI feature you're using, so you can better understand how the AI is working with your data. The AI Nutrition Label below details the ConversationRelay AI qualities. For more information and the glossary regarding the AI Nutrition Facts Label, refer to our AI Nutrition Facts page.
ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.
ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.
Base Model is not trained using any Customer Data.
Customer Data is not stored or retained in the Base Model.
Customer can view and listen to the input and output in the customer's own terminal.
Compliance
Customer can view and listen to the input and output in the customer's own terminal.
Customer can view and listen to the input and output in the customer's own terminal.
Customer is responsible for human review.
Learn more about this label at nutrition-facts.ai
The Base Model is not trained using any Customer Data.
Programmable Voice uses the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.
Base Model is not trained using any Customer Data.
The Base Model is not trained using any Customer Data.
Customers can view text input and listen to the audio output.
Compliance
Customers can view text input and listen to the audio output.
Customers can view text input and listen to the audio output.
Customer is responsible for human review.
Learn more about this label at nutrition-facts.ai