
Ingestion modes


Ingestion is the process of adding communications to conversations. Conversation Orchestrator supports three modes that you can combine: passive ingestion with capture rules, active ingestion through the API or TwiML, and the Conversations API (classic) bridge.

(warning)

Voice transcription fees

When you use Conversation Orchestrator with the voice channel, Twilio transcribes your calls to populate the conversation. Transcription fees apply. For pricing details, see Twilio Voice pricing.


Mode overview

| Mode | How it works | When to use |
| --- | --- | --- |
| Passive (capture rules) | Observe Twilio traffic and add matching communications to conversations. | Monitor existing SMS, Voice, or WhatsApp traffic without code changes. |
| Active (API) | Create conversations and participants programmatically. | Outbound campaigns, agent desktops, and API integrations. |
| Active (TwiML) | Route a voice call into a conversation from a TwiML verb. | Voice-only control: specify which calls are captured and how they're grouped. |
| Conversations API (classic) bridge | Sync an existing Conversations API (classic) service into Conversation Orchestrator. | Classic customers who want Memory and Intelligence without rebuilding. |

With passive ingestion, you define capture rules in your conversation configuration. Conversation Orchestrator observes your Twilio traffic and creates or updates a conversation whenever a rule matches.

Capture rule syntax


Each capture rule specifies a sender and recipient, plus optional channel-specific metadata:

{
  "from": "+15551234567",
  "to": "*",
  "metadata": {}
}
  • from: the sender's address (phone number or channel identity).
  • to: the recipient's address.
  • *: wildcard that matches any address for that field. The rule still only applies when the other field matches.
  • metadata: optional channel-specific filters, such as callType for voice.
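
The matching behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the service's implementation: the function names are hypothetical, and treating metadata as exact-match key/value filters is an assumption.

```python
from typing import Optional

WILDCARD = "*"


def field_matches(pattern: str, address: str) -> bool:
    """A field matches when the pattern is the wildcard or equals the address."""
    return pattern == WILDCARD or pattern == address


def rule_matches(rule: dict, sender: str, recipient: str,
                 metadata: Optional[dict] = None) -> bool:
    """Return True when both address fields and every metadata filter match.

    Assumption: each key in the rule's metadata must be present in the
    event metadata with an equal value.
    """
    event_metadata = metadata or {}
    filters = rule.get("metadata", {})
    return (
        field_matches(rule["from"], sender)
        and field_matches(rule["to"], recipient)
        and all(event_metadata.get(key) == value for key, value in filters.items())
    )


rule = {"from": "+15551234567", "to": "*", "metadata": {}}
print(rule_matches(rule, "+15551234567", "+15557654321"))  # True
print(rule_matches(rule, "+15550000000", "+15557654321"))  # False: sender differs
```

The second call shows that a wildcard in one field does not relax the other field: the rule applies only when the sender also matches.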

Make rules bidirectional to capture both inbound and outbound traffic: create one rule with your Twilio address as the sender and another with it as the recipient.
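
As a rough sketch, a bidirectional pair for one Twilio address could be generated like this. The helper name `bidirectional_rules` is hypothetical; the rule fields follow the syntax shown earlier in this section.

```python
import json


def bidirectional_rules(twilio_address: str) -> list:
    """Build one outbound and one inbound capture rule for a single address."""
    return [
        # Outbound: your Twilio address is the sender, any recipient.
        {"from": twilio_address, "to": "*", "metadata": {}},
        # Inbound: any sender, your Twilio address is the recipient.
        {"from": "*", "to": twilio_address, "metadata": {}},
    ]


print(json.dumps(bidirectional_rules("+15551234567"), indent=2))
```

Each address covered this way uses two rules.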

Passive ingestion supports SMS, Voice, and WhatsApp. Chat is available only through the Conversations API (classic) bridge. For per-channel details including voice callType metadata, see Channels.


Active ingestion lets you create conversations directly, giving you full control over participants and timing. Use it for outbound campaigns, agent desktops, or any flow where you know the details upfront.

Call POST /v2/Conversations with the configuration ID and the participants you want. Always set participant type explicitly so profile resolution runs against the right address. For a full walkthrough, see Create conversations programmatically.
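
The following sketch shows one way to assemble such a request in Python without sending it. Only the `POST /v2/Conversations` path comes from this page. The base URL, the JSON body, the field names (`configurationId`, `participants`, `address`, `type`), and the `CUSTOMER` value are all assumptions for illustration, and authentication is omitted. Check the API reference for the real request schema.

```python
import json
from urllib.request import Request

# Hypothetical placeholder host; replace with the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_create_conversation_request(configuration_id: str,
                                      participants: list) -> Request:
    """Build (but do not send) a POST /v2/Conversations request.

    Rejects participants without an explicit type, mirroring the advice
    to always set the participant type.
    """
    for participant in participants:
        if not participant.get("type"):
            raise ValueError(f"participant is missing a type: {participant}")
    body = {
        "configurationId": configuration_id,  # assumed field name
        "participants": participants,         # assumed field name
    }
    return Request(
        BASE_URL + "/v2/Conversations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_conversation_request(
    "YOUR_CONFIGURATION_ID",
    [{"address": "+15551234567", "type": "CUSTOMER"}],  # assumed shape
)
print(request.get_method(), request.full_url)
```

Validating the participant type before building the request surfaces a missing type locally instead of leaving it to profile resolution.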

To route a voice call into Conversation Orchestrator without a capture rule, use <ConversationRelay> or <Transcription> in your TwiML. This approach works well when one phone number serves multiple use cases (for example, a single IVR for sales and support) and you want different calls to land in different conversation configurations.

| TwiML verb | Architecture | Use case |
| --- | --- | --- |
| <ConversationRelay> | Synchronous bidirectional speech-to-text and text-to-speech over web sockets. | AI voice agents. |
| <Transcription> | Asynchronous fork of voice media for transcription. | Human agent augmentation. |

Both TwiML elements accept the conversationConfiguration parameter (to create or resolve a conversation) or conversationId (to join an existing conversation).
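
As an illustration, the sketch below generates a <Transcription> TwiML document with either parameter, using only Python's standard library. The helper name `transcription_twiml` and the placeholder IDs are hypothetical; the element and attribute names are the ones used on this page.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def transcription_twiml(conversation_configuration: Optional[str] = None,
                        conversation_id: Optional[str] = None) -> str:
    """Return a TwiML string that starts <Transcription> with the given parameters."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    attributes = {}
    if conversation_configuration:
        # Create or resolve a conversation using the configuration.
        attributes["conversationConfiguration"] = conversation_configuration
    if conversation_id:
        # Join an existing conversation.
        attributes["conversationId"] = conversation_id
    ET.SubElement(start, "Transcription", attributes)
    return ET.tostring(response, encoding="unicode")


# One IVR, two paths: each path lands in a different configuration.
print(transcription_twiml(conversation_configuration="SALES_CONFIGURATION_ID"))
print(transcription_twiml(conversation_configuration="SUPPORT_CONFIGURATION_ID"))
```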

Parameter precedence


To understand how the conversationConfiguration and conversationId TwiML parameters work together, see the following table.

| Parameters provided | Behavior |
| --- | --- |
| conversationConfiguration only | Orchestrator uses the configuration's grouping rules to find or create a conversation. |
| conversationId only | Routes directly to the specified conversation. Grouping rules are bypassed. |
| Both conversationConfiguration and conversationId | conversationId takes precedence. The configuration is ignored for routing. |
| Neither | No Conversation Orchestrator integration. Transcription isn't captured into any conversation. |
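
The precedence rules in the table can be expressed as a small decision function. This is a sketch of the documented behavior, not the service's code: the function name and the returned labels are hypothetical.

```python
from typing import Optional, Tuple


def resolve_routing(conversation_configuration: Optional[str] = None,
                    conversation_id: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (routing decision, identifier used) for the supplied parameters."""
    if conversation_id:
        # conversationId wins whenever it is present; grouping rules are bypassed.
        return ("existing_conversation", conversation_id)
    if conversation_configuration:
        # Grouping rules of the configuration find or create a conversation.
        return ("grouping_rules", conversation_configuration)
    # Neither parameter: the call is not captured into any conversation.
    return ("no_integration", None)


print(resolve_routing(conversation_configuration="CFG", conversation_id="CONV"))
# ('existing_conversation', 'CONV')
```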

Human handoff: Conversation Relay to Transcription


A common pattern is for the AI agent (Conversation Relay) to handle the call initially, then hand off to a human agent. At handoff, end the Conversation Relay session and start <Transcription> with the same conversationId:

<!-- After handoff to human agent -->
<Response>
  <Start>
    <Transcription conversationId="YOUR_CONVERSATION_ID" />
  </Start>
  <Dial>+15551234567</Dial>
</Response>

The full interaction, both the AI portion and the human portion, lives in one conversation. You pay for Conversation Relay speech-to-text (STT) during the AI portion, then for Real-Time Transcription STT during the human portion, but never for both at the same time on the same call.

(warning)

Don't combine capture rules with active TwiML for the same call

If your configuration has voice capture rules and you pass the conversationConfiguration parameter to a <ConversationRelay> element for the same call, you pay for STT twice: once through Conversation Relay and once through the Real-Time Transcription stream created by the capture rule. Remove voice capture rules from your configuration when using active TwiML with Conversation Relay.


Conversations API (classic) bridge


The classic bridge syncs data from a Conversations API (classic) service into Conversation Orchestrator. Messages flow in both directions, your existing application keeps working, and you get access to Conversation Memory and Conversation Intelligence. It's also the only way to use the Chat channel with Conversation Orchestrator.

To set up a bridged configuration, see the Connect Conversations API (classic) guide.

Keep these behaviors in mind when using the bridge:

  • Each Conversations API (classic) conversation maps one-to-one to a Conversation Orchestrator conversation. Grouping inside classic follows classic's rules; grouping for non-classic channels follows conversationGroupingType.
  • Participants from bridged conversations include the classic ConversationSid in channelId.
  • Conversation open/closed status follows classic, not the configuration's statusTimeouts.
  • Each classic Service SID can be bridged by only one configuration per account.
  • Don't configure a capture rule and a classic bridge for the same traffic: you'll get duplicate communications.

Agents can reply through either the Conversations API (classic) endpoints or POST /v2/Conversations/{ConversationSid}/Communications on Conversation Orchestrator. Conversation Orchestrator resolves the classic ConversationSid internally and routes through classic.


Modes run in parallel under one configuration. The following combinations are common:

  • Outbound and inbound SMS. Capture inbound SMS passively and create outbound conversations actively through the API. Link both with GROUP_BY_PROFILE when the same customer appears on both.
  • Voice and classic bridge. Capture voice passively and bridge chat from classic, linked by GROUP_BY_PROFILE.


| Resource | Limit |
| --- | --- |
| Configurations per account | 10 |
| Capture rules per channel per configuration | 100 |

Every inbound event is evaluated against all capture rules across all configurations in your account. If you need passive capture for more than 100 phone numbers on a single channel, split across multiple configurations. Use wildcards ("from": "*") where possible to minimize rule count.
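
To plan a split, you can chunk a channel's capture rules into groups that fit the per-configuration limit and check the result against the per-account limit. The sketch below uses the limits from the table; the function name is hypothetical, and it assumes all 10 configurations are available for this channel.

```python
MAX_CONFIGURATIONS_PER_ACCOUNT = 10
MAX_RULES_PER_CHANNEL_PER_CONFIGURATION = 100


def split_rules_across_configurations(rules: list) -> list:
    """Split one channel's capture rules into per-configuration chunks.

    Raises ValueError when the rules cannot fit within the account limits.
    """
    size = MAX_RULES_PER_CHANNEL_PER_CONFIGURATION
    chunks = [rules[i:i + size] for i in range(0, len(rules), size)]
    if len(chunks) > MAX_CONFIGURATIONS_PER_ACCOUNT:
        raise ValueError(
            f"{len(rules)} rules need {len(chunks)} configurations, "
            f"but the account limit is {MAX_CONFIGURATIONS_PER_ACCOUNT}"
        )
    return chunks


# 250 inbound rules, one per recipient number.
rules = [{"from": "*", "to": f"+1555000{n:04d}", "metadata": {}} for n in range(250)]
print([len(chunk) for chunk in split_rules_across_configurations(rules)])
# [100, 100, 50]
```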


  • Make capture rules bidirectional. Catch both inbound and outbound traffic.
  • Don't overlap capture rules across configurations. Conflicting rules can capture the same traffic twice.
  • Use active TwiML for multi-use-case IVRs. Route each path to a different conversationConfiguration.
  • Use active TwiML instead of capture rules for CLIENT calls. CLIENT capture rules require specific identity strings and can't match a wildcard across dynamic identities.
  • Prefer GROUP_BY_PROFILE in production. GROUP_BY_PROFILE recognizes customers across devices and channels. Use address-based grouping only when you're not relying on profile-based recognition.