Twilio Changelog | Oct. 04, 2024
<Gather> New Multi-Provider Speech Recognition Models Public Beta now available
TL;DR: New Speech Models, Multi-Provider Speech Recognition Capabilities, and Latest STT API Versions Now Supported With New <Gather> Public Beta

<Gather>, the Twilio platform’s utterance-based Speech-to-Text (STT) capability, takes a significant step forward for voice app builders this week by adding support for two things: i) the latest Speech-to-Text API capabilities from Google, updating to V2 of their Speech APIs (including new and improved speech models); and ii) the ability for app builders, for the first time, to choose an alternative Speech Recognition provider, Deepgram and their speech models, for use in their Twilio <Gather input="speech"> TwiML calls. Developers can choose speech recognition providers and models on the fly to suit their application and use case, and can even change that selection with each question or prompt, or for the processing of each caller’s individual spoken responses.

<Gather> is the first part of Twilio’s Speech Recognition portfolio to add Deepgram and the new Google API and speech models. Other parts of the speech portfolio, e.g. Streaming Real-Time Transcriptions (RTT) and batch transcriptions with Voice Intelligence, will also be able to use the new speech models and providers over time.

How can we take advantage of these new <Gather> Multi-Provider Speech Recognition Models Beta capabilities?

Customers wishing to try these new speech recognition capabilities in <Gather> with their TwiML voice applications have two options. Builders with existing applications that use <Gather> can select Google v2 STT APIs (instead of the current Google v1 default) on the Voice Settings page of the Twilio Console. Alternatively, builders of new or existing voice applications can specify Google (as “googlev2”) or Deepgram (as “deepgram”) as the provider in the “provider_speechmodel” parameter of their TwiML <Gather input="speech"> code.
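As a rough illustration of the second option, the sketch below builds <Gather> TwiML with a different provider and model for each prompt, using only the Python standard library. It rests on several assumptions that should be checked against the <Gather> documentation linked below: that the “provider_speechmodel” parameter corresponds to a `speechModel` attribute whose value takes the form provider_model, and that `googlev2_telephony` and `deepgram_nova-2` are valid values. The `build_gather_twiml` helper and the `/handle-speech` action URL are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(speech_model: str, prompt: str,
                       action: str = "/handle-speech") -> str:
    """Return a TwiML document with one speech <Gather> wrapping a <Say> prompt.

    speech_model is assumed to be a "provider_model" string; the exact
    attribute name and accepted values are assumptions, not confirmed here.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "speech",            # utterance-based speech recognition
        "speechModel": speech_model,  # assumed attribute carrying provider + model
        "action": action,             # hypothetical webhook that receives the result
    })
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


# A short, structured answer handled by an assumed Google V2 model...
order_twiml = build_gather_twiml(
    "googlev2_telephony", "Please say your order number.")

# ...and a longer, free-form answer handled by an assumed Deepgram model.
feedback_twiml = build_gather_twiml(
    "deepgram_nova-2", "How was your experience today?")

print(order_twiml)
print(feedback_twiml)
```

Because the provider and model are chosen per <Gather>, each webhook response in a call flow can return TwiML with a different selection, which is how the per-question switching described above would work in practice.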
With these new Speech Recognition capabilities, new providers, and support for their latest STT API versions, Twilio expects to deliver industry-leading speech recognition accuracy and improved performance in noisy environments. Builders can choose from a wider array of speech models suited to their use cases, whether for longer answers or short utterances, ranging from customer service automations like form-filling and survey responses to speaking naturally with LLM bots in IVRs and Virtual Agents.
Learn More:
https://www.twilio.com/docs/voice/twiml/gather
https://www.twilio.com/en-us/blog/tips-speech-recognition-virtual-agent-voice-calling
https://www.twilio.com/en-us/changelog/realtime-transcriptions-is-public-beta
https://www.twilio.com/docs/voice/intelligence/api/transcript-resource