Twilio Changelog | May 06, 2026
New Public Beta: V3 Batch Transcription Configuration API
What is Batch Transcription Configuration?
The Batch Transcription Configuration API enables you to transcribe completed call recordings with configurable speech-to-text engines, models, and language settings. Create reusable transcription configurations, then submit any recording for transcription — results are delivered via webhook with sentence-level detail including speaker separation, timestamps, and confidence scores.
API Endpoints
New Configuration API
Create, manage, and reuse transcription configurations.
Method | Endpoint | Description |
POST | /v2/Configurations/Transcription | Create a transcription configuration |
GET | /v2/Configurations/Transcription | List all configurations |
GET | /v2/Configurations/Transcription/{id} | Get a specific configuration |
PUT | /v2/Configurations/Transcription/{id} | Update a configuration |
DELETE | /v2/Configurations/Transcription/{id} | Delete a configuration |
New Transcription API (V3)
Submit recordings for transcription and check status.
Method | Endpoint | Description |
POST | /v3/Transcriptions | Submit a recording for transcription (returns 202) |
GET | /v3/Transcriptions/{id} | Check transcription status |
GET | /v3/Transcriptions | List transcriptions (not yet available; planned for a follow-up release) |
DELETE | /v3/Transcriptions/{id} | Delete a transcription (not yet available; planned for a follow-up release) |
Configuration Object
Each configuration specifies how a recording should be transcribed:
transcriptionEngine — Speech-to-text provider (deepgram, google, or twilio_managed)
speechModel — Model variant (nova-3, nova-2, chirp_2, or twilio_managed)
language — Language code for transcription (e.g., en-US, es-ES, de-DE)
participantDefaults — Audio channel → participant type mapping for speaker separation
transcriptionStatusCallback — Webhook URL + method for receiving completed transcripts
conversationConfigurationId — Optional link to a conversation configuration
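As an illustration of the fields above, here is a minimal Python sketch that assembles a configuration body. The top-level field names come from the list above. The nested shapes of participantDefaults and transcriptionStatusCallback, the callback URL, and the helper name build_transcription_config are assumptions made for this example, so check the API reference for the exact schema.

```python
import json


def build_transcription_config(engine, model, language, callback_url,
                               conversation_configuration_id=None):
    """Assemble a configuration body from the documented field names."""
    body = {
        "transcriptionEngine": engine,
        "speechModel": model,
        "language": language,
        # Assumed shape: one entry per audio channel of a dual-channel recording.
        "participantDefaults": [
            {"audioChannelIndex": 0, "type": "CUSTOMER"},
            {"audioChannelIndex": 1, "type": "HUMAN_AGENT"},
        ],
        # Assumed shape: the changelog only says "Webhook URL + method".
        "transcriptionStatusCallback": {"url": callback_url, "method": "POST"},
    }
    # The conversation configuration link is optional, so omit it when unset.
    if conversation_configuration_id is not None:
        body["conversationConfigurationId"] = conversation_configuration_id
    return body


config = build_transcription_config(
    "deepgram", "nova-3", "en-US", "https://example.com/hooks/transcripts"
)
print(json.dumps(config, indent=2))
```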
Access & Enablement
Important: If you are a current V2 Conversational Intelligence customer, you must request that the V3 account access flag be enabled on your account. This gating gives current production customers a clear distinction between the new V3 Conversation Intelligence and Batch Transcription Configuration.
Once enabled, the V3 Transcription APIs and the Console Configuration Wizard will become available under Transcriptions in your Twilio Console.
To request access, contact your Twilio account team or submit a request through the Twilio Console > Voice > Transcriptions page. Enablement is typically processed within 1 business day.
Supported Engines, Models & Languages
Engine | Models | Languages |
deepgram | nova-3, nova-2 | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |
google | chirp_2 | en-US, en-GB, en-AU, es-ES, es-US, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK |
twilio_managed | twilio_managed | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |
Note: google/chirp_2 does not support es-MX. Use es-ES or es-US for Spanish with Google. The twilio_managed engine automatically selects the best available model for your language.
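The support matrix above can be checked client-side before a configuration is created. The following sketch encodes the table as a lookup; the names SUPPORTED and is_supported are hypothetical helpers for this example and are not part of the API.

```python
# Languages available on every engine listed in the table.
_BASE_LANGUAGES = {
    "en-US", "en-GB", "en-AU", "es-ES", "es-US", "de-DE", "fr-FR", "it-IT",
    "pt-BR", "pt-PT", "nl-NL", "no-NO", "pl-PL", "sv-SE", "da-DK",
}
# deepgram and twilio_managed additionally list es-MX and multi.
_EXTENDED_LANGUAGES = _BASE_LANGUAGES | {"es-MX", "multi"}

SUPPORTED = {
    "deepgram": {"models": {"nova-3", "nova-2"}, "languages": _EXTENDED_LANGUAGES},
    "google": {"models": {"chirp_2"}, "languages": _BASE_LANGUAGES},
    "twilio_managed": {"models": {"twilio_managed"}, "languages": _EXTENDED_LANGUAGES},
}


def is_supported(engine, model, language):
    """Return True if the engine/model/language combination appears in the table."""
    entry = SUPPORTED.get(engine)
    if entry is None:
        return False
    return model in entry["models"] and language in entry["languages"]


print(is_supported("deepgram", "nova-3", "es-MX"))
print(is_supported("google", "chirp_2", "es-MX"))
```

The first call reports a supported combination and the second an unsupported one, matching the note that google/chirp_2 does not support es-MX.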
Submitting a Transcription
POST /v3/Transcriptions with:
sourceId — Recording SID (RE...) of a completed call recording
transcriptionConfigurationId — ID of a previously created configuration
participants — Array of participant objects with type, address, and audioChannelIndex
Participant types: CUSTOMER, HUMAN_AGENT, AI_AGENT
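A submission body built from the fields above might look like the following sketch. The field names and participant types are taken from this section. The sample recording SID, configuration ID, address values, and the helper name build_submission are placeholders assumed for the example.

```python
PARTICIPANT_TYPES = {"CUSTOMER", "HUMAN_AGENT", "AI_AGENT"}


def build_submission(recording_sid, configuration_id, participants):
    """Validate the inputs and assemble a POST /v3/Transcriptions body."""
    if not recording_sid.startswith("RE"):
        raise ValueError("sourceId should be a Recording SID (RE...)")
    for participant in participants:
        if participant["type"] not in PARTICIPANT_TYPES:
            raise ValueError(f"unknown participant type: {participant['type']}")
    return {
        "sourceId": recording_sid,
        "transcriptionConfigurationId": configuration_id,
        "participants": list(participants),
    }


submission = build_submission(
    "RE" + "0" * 32,              # placeholder SID, not a real recording
    "example-configuration-id",   # placeholder configuration ID
    [
        {"type": "CUSTOMER", "address": "+15550100", "audioChannelIndex": 0},
        {"type": "AI_AGENT", "address": "+15550101", "audioChannelIndex": 1},
    ],
)
print(submission["participants"][1]["type"])
```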
Webhook Delivery
When transcription completes, a POST is sent to your configured callback URL containing the full transcript with:
Sentence-level segments with text content
Speaker/participant identification per sentence (via audio channel mapping)
Start and end timestamps (seconds) for each sentence
Word-level timestamps within each sentence
Confidence scores per sentence
Resolved configuration showing which engine/model/language was used
Participant metadata (type, address, channel)
Duration of the transcribed audio
Idempotency
The POST /v3/Transcriptions endpoint supports idempotent requests to prevent duplicate transcription processing during retries.
Header: Idempotency-Key
Format: UUIDv7
Behavior: Duplicate submissions with the same key return the original response instead of creating a new transcription
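Since the key format is UUIDv7, and Python versions before 3.14 do not ship a uuid7 function in the standard library, one option is to build the value by hand. The sketch below follows the UUIDv7 bit layout from RFC 9562 (48-bit millisecond timestamp, version, 12 random bits, variant, 62 random bits). The function name uuid7 and the headers dictionary are assumptions for this example; the key point is that a retry reuses the same key instead of generating a new one.

```python
import os
import time
import uuid


def uuid7():
    """Build a UUIDv7: 48-bit Unix timestamp in ms, then version, variant, and random bits."""
    timestamp_ms = (time.time_ns() // 1_000_000) & ((1 << 48) - 1)
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = random_bits >> 68                 # top 12 bits
    rand_b = random_bits & ((1 << 62) - 1)     # low 62 bits
    value = timestamp_ms << 80
    value |= 0x7 << 76       # version 7
    value |= rand_a << 64
    value |= 0b10 << 62      # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


# Generate the key once per logical submission and reuse it on every retry.
idempotency_key = uuid7()
headers = {"Idempotency-Key": str(idempotency_key)}
print(headers["Idempotency-Key"])
```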
Transcription Status Lifecycle
Status | Description |
QUEUED | Transcription request accepted and queued for processing |
PROCESSING | Audio is being transcribed by the configured engine |
COMPLETED | Transcription finished successfully; webhook delivered with results |
FAILED | Transcription could not be completed (invalid audio, engine error, etc.) |
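In the table above, COMPLETED and FAILED are the statuses with no further transition, so a polling client can stop on either. The sketch below assumes a caller-supplied fetch_status function that returns the current status string (for example, by calling GET /v3/Transcriptions/{id}); that function, the interval, and the attempt limit are illustrative choices, not documented values.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_terminal_status(fetch_status, interval_seconds=5.0,
                             max_attempts=60, sleep=time.sleep):
    """Call fetch_status until it returns COMPLETED or FAILED, or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Pause between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Simulated status sequence standing in for real API responses.
simulated = iter(["QUEUED", "PROCESSING", "PROCESSING", "COMPLETED"])
final_status = wait_for_terminal_status(lambda: next(simulated), sleep=lambda _: None)
print(final_status)
```

Because sentence text is delivered by webhook only, polling like this is best treated as a status check rather than a way to retrieve the transcript.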
Integration with Recording Configuration & Conversation Intelligence
The V3 Transcription Configuration integrates across the Voice platform to enable automated end-to-end workflows:
Auto-Transcribe via Recording Configuration
Attach a V3 Transcription Configuration to your Recording Configuration to automatically transcribe recordings as they complete — no additional API call required. When a call ends and the recording is ready, the platform automatically submits it for transcription using your configured engine, model, and language settings.
Downstream Analysis with V3 Conversation Intelligence
Completed transcription results can be sent to the new V3 Conversation Intelligence platform for downstream analysis including sentiment detection, topic extraction, compliance monitoring, and custom operator evaluation. This creates a fully automated pipeline: Call → Recording → Transcription → Conversation Intelligence — configured once, executed automatically on every call.
Known Limitations (Beta)
Transcript sentence content is delivered via webhook only; the GET endpoint returns status and metadata but not sentence text
The google engine (chirp_2 model) takes longer to process than Deepgram (minutes rather than seconds)
Default configuration fallback is not yet available — a transcriptionConfigurationId must be provided on each submission
Maximum recording duration for transcription is subject to engine-specific limits
Getting Started
1. Create a transcription configuration via POST /v2/Configurations/Transcription specifying your preferred engine, model, language, and callback URL
2. Place a call and enable recording (dual-channel recommended for speaker separation)
3. Once the recording is complete (status: completed), submit it via POST /v3/Transcriptions with the recording SID and your configuration ID
4. Receive the completed transcript at your webhook URL with sentence-level detail
5. Optionally poll GET /v3/Transcriptions/{id} to check status before webhook delivery
Base URLs
Configuration API: https://voice.twilio.com/v2/Configurations/Transcription
Transcription API: https://voice.twilio.com/v3/Transcriptions
Authentication: HTTP Basic (Account SID : Auth Token)
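To show how the base URL and HTTP Basic authentication fit together, the following sketch builds (but does not send) a request with Python's standard library. The credentials are placeholders, and the JSON content type is an assumption, since this changelog does not state the request encoding.

```python
import base64
import json
import urllib.request

TRANSCRIPTIONS_URL = "https://voice.twilio.com/v3/Transcriptions"


def basic_auth_header(account_sid, auth_token):
    """Encode 'AccountSID:AuthToken' as an HTTP Basic Authorization value."""
    raw = f"{account_sid}:{auth_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_post_request(url, body, account_sid, auth_token):
    """Prepare a POST request; pass it to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": basic_auth_header(account_sid, auth_token),
            "Content-Type": "application/json",  # assumed encoding
        },
    )


request = build_post_request(
    TRANSCRIPTIONS_URL,
    {"sourceId": "RE" + "0" * 32, "transcriptionConfigurationId": "example-id"},
    "ACXXXXXXXX",        # placeholder Account SID
    "your_auth_token",   # placeholder Auth Token
)
print(request.get_method(), request.full_url)
```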