Twilio Changelog | Oct. 04, 2024

<Gather> New Multi-Provider Speech Recognition Models Public Beta now available

TL;DR: New speech models, multi-provider speech recognition capabilities, and the latest STT API versions are now supported in the new <Gather> Public Beta.

<Gather>, the Twilio platform’s utterance-based Speech-to-Text (STT) capability, takes a significant step forward for voice app builders this week by adding support for two things: (i) the latest Speech-to-Text API capabilities from Google, updating to V2 of their Speech APIs (including new and improved speech models), and (ii) the ability for app builders, for the first time, to choose an alternative speech recognition provider, Deepgram and its speech models, for use in their Twilio <Gather input="speech"> TwiML calls. Developers can choose speech recognition providers and models on the fly to suit their application and use case, and can even change that selection with each question or prompt, or for the processing of each caller’s individual spoken responses.

While <Gather> is the first part of Twilio’s Speech Recognition portfolio to add Deepgram and the new Google API and speech models, other parts of the speech portfolio, such as Streaming Real-Time Transcriptions (RTT) and batch transcriptions with Voice Intelligence, will also be able to use the new speech models and providers over time.

How can we take advantage of the new <Gather> Multi-Provider Speech Recognition Models Beta capabilities?

Customers who want to try these new speech recognition capabilities in <Gather> with their TwiML voice applications have two options. Builders with existing applications that use <Gather> can select Google v2 STT APIs (instead of the current Google v1 default) on the Voice Settings page of the Twilio Console. Alternatively, builders of new or existing voice applications can specify Google (as “googlev2”) or Deepgram (as “deepgram”) as the provider in the “provider_speechmodel” parameter of their TwiML <Gather input="speech"> code.
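As a rough illustration of the second option, the sketch below builds a <Gather input="speech"> TwiML document using only Python's standard library. Several details here are assumptions rather than facts from this changelog entry: the attribute name speechModel (taken to be the TwiML attribute that carries the “provider_speechmodel” value), the example values "googlev2_telephony" and "deepgram_nova-2", the helper name build_gather_twiml, and the /handle-speech callback path. Check the <Gather> documentation for the exact attribute and supported values before relying on them.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(provider_model: str, prompt: str, action_url: str,
                       attr_name: str = "speechModel") -> str:
    """Return a TwiML string asking one question with speech input.

    provider_model follows the provider_speechmodel pattern, for example
    "deepgram_<model>" or "googlev2_<model>". The attribute name and the
    model suffixes used below are assumptions for illustration.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "speech", attr_name: provider_model, "action": action_url},
    )
    # The prompt is spoken to the caller while <Gather> listens for a reply.
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # A different provider and model can be chosen for each question.
    print(build_gather_twiml("googlev2_telephony",
                             "What is your account number?",
                             "/handle-speech"))
    print(build_gather_twiml("deepgram_nova-2",
                             "Please describe the issue you are having.",
                             "/handle-speech"))
```

Because each call produces its own TwiML document, an application can return a different provider and model for every prompt it sends, which matches the per-question selection described above.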

Customer benefits

With these new speech recognition capabilities and providers, and with support for their latest STT API versions, Twilio expects to deliver industry-leading speech recognition accuracy and improved performance in noisy environments. Builders can choose from a wider array of speech models suited to their use cases, for longer answers or short utterances, ranging from customer service automations like form-filling and survey responses to speaking naturally to LLM bots in IVRs and Virtual Agents.

