
What is Call Transcription?

Call transcription is the conversion of the audio of a voice or video call into written words, stored as plain text in the language of the conversation. Call transcription can happen live, as a call or event takes place, or afterward, from the recording of a past conversation.

The Importance of Speech-to-Text Transcription


Call transcription is an important and powerful tool in business, training, medical, and legal settings. Because text is far easier to search and analyze than audio, a text-based history of conversations is necessary (or at least preferable) for many use cases. Real-time speech-to-text transcription services, such as closed captioning, also increase accessibility, improving understanding for people who are hard of hearing or new to a language.

Using Call Transcription In Your Business


When it comes to voice calls, call transcription is often used in a business context, for example, to improve training and feedback for call center employees. Logging the context and the words spoken in a call can help you identify business problems algorithmically, making it easier to allocate resources based on evidence. Call transcriptions and recordings are also valuable for legal purposes, where contemporaneous transcriptions, recordings, and notes are superior to other types of records.

Twilio allows you to add call transcription to our Programmable Voice product. For recorded calls, you can use our REST API's provisions for transcribing recordings to text. Twilio also offers a real-time transcription service with support for multiple languages, contextual analysis, and Natural Language Processing (NLP). Talk to Sales about your call transcription requirements for more information on that product.
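As a rough illustration of the REST side, the sketch below builds a request that lists the transcriptions attached to one recording and then pulls the text out of a response body. It assumes the `2010-04-01` API path `/Accounts/{AccountSid}/Recordings/{RecordingSid}/Transcriptions.json`, HTTP Basic auth with your Account SID and Auth Token, and a JSON body whose `transcriptions` entries carry `status` and `transcription_text` fields. Treat all of these as assumptions to check against Twilio's current API reference; the helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: base URL of Twilio's 2010-04-01 REST API.
API_BASE = "https://api.twilio.com/2010-04-01"


def build_transcriptions_request(account_sid: str, auth_token: str,
                                 recording_sid: str) -> urllib.request.Request:
    """Build (but do not send) a GET request listing a recording's transcriptions."""
    # Assumed resource path for the transcriptions of a single recording.
    url = (f"{API_BASE}/Accounts/{account_sid}"
           f"/Recordings/{recording_sid}/Transcriptions.json")
    # HTTP Basic auth: Account SID as username, Auth Token as password.
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})


def completed_texts(payload: str) -> list[str]:
    """Return the text of every completed transcription in a list response body."""
    data = json.loads(payload)
    # Assumed field names: "transcriptions", "status", "transcription_text".
    return [item["transcription_text"]
            for item in data.get("transcriptions", [])
            if item.get("status") == "completed"]
```

To actually send the request, pass it to `urllib.request.urlopen` and feed the decoded body to `completed_texts`. Twilio's official helper libraries wrap the same resource and are usually the better choice in production code.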

Legality of Call Transcriptions


Note that the legality of call transcription differs by locality. In some localities, recording calls, transcribing recorded calls, or even transcribing speech in real time during a voice or video call is prohibited or requires the informed consent of some or all parties to the conversation. Twilio cannot comment on the specifics of your local laws; read the relevant laws or consult your legal counsel about your particular situation.

Dual Channel vs. Single Channel Recordings for Transcriptions


Because of differences in volume, accents, timing, and connection quality, the final mixed track of a voice or video call can be unintelligible even to professional human transcribers. So-called Single-Channel Recordings store only that one final mixed track for transcription, which can greatly increase the number of transcription errors, especially when participants speak at the same time.
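To see why a single mixed track loses information, here is a deliberately simplified model of a mixdown: the 16-bit PCM samples of each track are summed one by one and clamped to the int16 range. Real mixers may also scale or normalize the signal, so this is only a sketch, and `mix_to_mono` is a hypothetical helper rather than part of any Twilio product.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_to_mono(*tracks: array) -> array:
    """Simplified mixdown: sum equal-length int16 tracks, clamping to int16."""
    if not tracks:
        raise ValueError("need at least one track")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("tracks must have the same length")
    mixed = array("h")
    for samples in zip(*tracks):
        # Once summed, there is no record of which track contributed what.
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

With a caller track of `1000, 2000, -500` and an agent track of `-1000, -2000, 3000`, the mix is `0, 0, 2500`: the first two samples cancel into silence, and nothing in the mixed track says which participant produced the third.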

Dual Channel Recording and Call Transcription Flow.

The most accurate call transcription solutions record both (or all) sides of a call separately. Because each participant is captured on their own track, a Dual-Channel Recording solution (or Multi-Channel Recording solution) eliminates the cross-talk and cancellation noise that would otherwise interfere with the final mix. It also prevents most (or all) misattribution errors, since each channel belongs to a single speaker.
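A dual-channel recording can be stored as a two-channel (stereo) WAV file, with one participant per channel. The sketch below uses only Python's standard `wave` module to split such a file into two mono files so each side can be transcribed on its own. Which party sits on which channel depends on how the recording was made, so the left/right naming is an assumption, and `split_stereo_wav` is a hypothetical helper, not a Twilio API.

```python
import io
import wave


def split_stereo_wav(data: bytes) -> tuple[bytes, bytes]:
    """Split a 2-channel WAV file into two mono WAV files (left, right)."""
    with wave.open(io.BytesIO(data), "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a dual-channel (stereo) recording")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Each frame holds one sample for the left channel, then one for the right.
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    def to_mono_wav(pcm: bytes) -> bytes:
        # Wrap raw PCM in a mono WAV container with the original format.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(pcm)
        return buf.getvalue()

    return to_mono_wav(bytes(left)), to_mono_wav(bytes(right))
```

Because WAV frames interleave one sample per channel, taking alternate samples is enough to separate the speakers; no signal processing is needed, which is exactly what makes per-speaker transcription reliable.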

See more about our dual-channel call transcription options.

Getting Started With Call Transcription


The Gather and Record TwiML voice verbs both support transcription, while our Phone Call Speech Transcription product can help with your real-time requirements. You can also speak to Sales about Natural Language Processing for determining caller intent or sentiment in real time.
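For the TwiML route, the sketch below generates a `<Record>` verb that requests a transcription and a `<Gather>` verb that collects speech, then reads the result posted to the transcription callback. It builds the XML with `xml.etree.ElementTree` instead of Twilio's helper library so it stays self-contained. The attribute names (`transcribe`, `transcribeCallback`, `maxLength`, `input`, `action`) and the callback fields (`TranscriptionStatus`, `TranscriptionText`) reflect our understanding of the TwiML reference and should be verified there; the function names and example.com URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import parse_qs


def record_with_transcription(callback_url: str, max_length: int = 120) -> str:
    """TwiML that records the caller and asks for the recording to be transcribed."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Please leave a message after the beep."
    # Assumed <Record> attributes: transcribe, transcribeCallback, maxLength.
    ET.SubElement(response, "Record", {
        "transcribe": "true",
        "transcribeCallback": callback_url,
        "maxLength": str(max_length),
    })
    return ET.tostring(response, encoding="unicode")


def gather_speech(action_url: str) -> str:
    """TwiML that collects a spoken answer with <Gather input="speech">."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {"input": "speech", "action": action_url})
    ET.SubElement(gather, "Say").text = "How can we help you today?"
    return ET.tostring(response, encoding="unicode")


def transcription_from_callback(body: str) -> Optional[str]:
    """Return TranscriptionText from a form-encoded callback body, if completed."""
    fields = parse_qs(body)
    # Assumed callback fields: TranscriptionStatus and TranscriptionText.
    if fields.get("TranscriptionStatus", [""])[0] != "completed":
        return None
    return fields.get("TranscriptionText", [""])[0]
```

In a web app, return `record_with_transcription(...)` from the endpoint that handles the incoming call, and pass the raw POST body of the callback request to `transcription_from_callback`.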