Voice Providers

Patter supports three voice AI architectures. Each offers different tradeoffs between latency, voice quality, and customization.

OpenAI Realtime (Default)

End-to-end voice processing powered by OpenAI’s Realtime API. Audio goes directly to OpenAI, which handles speech recognition, language understanding, and speech synthesis in a single round trip.

agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="openai_realtime",  # default
    model="gpt-4o-mini-realtime-preview",
    voice="alloy",
)

Audio Encoding

OpenAI Realtime handles audio encoding automatically based on your telephony provider:

Telephony Provider	Audio Format	Sample Rate
Twilio	G.711 mu-law	8 kHz
Telnyx	PCM 16-bit	16 kHz
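
To illustrate what the G.711 mu-law row means in practice, here is a minimal sketch of the standard companding step that turns one 16-bit PCM sample into one mu-law byte. The encoding is handled for you automatically, so you should not need this in application code. The names `linear_to_mulaw` and `encode_pcm16` are illustrative and are not part of Patter's API.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before locating the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # The segment (exponent) is the position of the highest set bit
    # among bits 7..14 of the biased magnitude.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_pcm16(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM bytes to mu-law, one byte per sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_mulaw(s) for s in samples)
```

At 8 kHz and one byte per sample, the mu-law stream carries 8,000 bytes per second, compared with 32,000 bytes per second for 16-bit PCM at 16 kHz.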

Available Voices

"alloy", "echo", "fable", "onyx", "nova", "shimmer"

Requirements

Pass openai_key to the Patter constructor (local mode).

ElevenLabs Conversational AI

Uses ElevenLabs’ Conversational AI platform for natural, expressive voices. Ideal when voice quality is the top priority.

agent = phone.agent(
    system_prompt="You are a warm and friendly concierge.",
    provider="elevenlabs_convai",
    voice="rachel",
)

Configuration

When using ElevenLabs ConvAI, you can configure additional provider-specific parameters through the agent:

Parameter	Description
`voice`	ElevenLabs voice ID or name (e.g., `"rachel"`, `"adam"`)
`model`	Model identifier for ElevenLabs

Requirements

Pass elevenlabs_key to the Patter constructor (local mode).

Pipeline Mode

Build a custom voice pipeline by combining separate STT (speech-to-text) and TTS (text-to-speech) providers. This gives you full control over each stage of the audio processing chain.

agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key="dg_..."),
    tts=Patter.elevenlabs(api_key="el_...", voice="rachel"),
)

In pipeline mode, the on_message callback receives an event whose "text" field holds the caller's transcribed speech, and it returns the response text to synthesize:

async def handle_message(event) -> str:
    return f"You said: {event['text']}. How can I help?"

await phone.serve(agent, on_message=handle_message)
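
Because the callback is an ordinary async function, it can branch on the transcript before deciding what to say. The sketch below assumes only what the snippet above shows, namely that `event` is a dict with a `"text"` key. The keyword rules and reply strings are made up for illustration.

```python
import asyncio


async def handle_message(event: dict) -> str:
    """Return the reply to synthesize for one transcribed utterance."""
    original = event.get("text", "")
    text = original.strip().lower()

    # Empty transcripts can occur on silence or noise; ask the caller to repeat.
    if not text:
        return "Sorry, I didn't catch that. Could you say it again?"

    # Simple keyword routing; replace with your own logic or an LLM call.
    if "hours" in text:
        return "We're open from 9 AM to 5 PM, Monday through Friday."
    if "human" in text:
        return "Sure, let me connect you with a team member."

    return f"You said: {original}. How can I help?"


if __name__ == "__main__":
    # The handler can be exercised locally without placing a call.
    print(asyncio.run(handle_message({"text": "What are your hours?"})))
```

Keeping the handler free of telephony details like this makes it easy to unit test before wiring it into `phone.serve`.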

Requirements

Pipeline mode requires both an STT and a TTS provider. If you don’t pass stt/tts explicitly, Patter falls back to deepgram_key and elevenlabs_key from the constructor.
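
The fallback rule can be pictured as follows. This is a sketch of the described behavior, not Patter's internal code: `resolve_pipeline` is a hypothetical helper, and plain dicts stand in for the objects returned by `Patter.deepgram()` and `Patter.elevenlabs()`.

```python
from typing import Optional, Tuple


def resolve_pipeline(
    stt: Optional[dict] = None,
    tts: Optional[dict] = None,
    deepgram_key: Optional[str] = None,
    elevenlabs_key: Optional[str] = None,
) -> Tuple[dict, dict]:
    """Pick explicit providers first, then fall back to constructor keys."""
    if stt is None:
        if not deepgram_key:
            raise ValueError("Pipeline mode needs stt=... or a deepgram_key")
        stt = {"provider": "deepgram", "api_key": deepgram_key}

    if tts is None:
        if not elevenlabs_key:
            raise ValueError("Pipeline mode needs tts=... or an elevenlabs_key")
        tts = {"provider": "elevenlabs", "api_key": elevenlabs_key}

    return stt, tts
```

An explicitly passed provider always wins, so you can mix the two styles, for example an explicit `stt` with a `tts` that falls back to the constructor key.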

STT Providers

Use these factory methods to configure speech-to-text:

Patter.deepgram()

stt = Patter.deepgram(api_key="dg_...", language="en")

Parameter	Type	Default	Description
`api_key`	`str`	required	Your Deepgram API key.
`language`	`str`	`"en"`	BCP-47 language code.

Patter.whisper()

stt = Patter.whisper(api_key="sk-...", language="en")

Parameter	Type	Default	Description
`api_key`	`str`	required	Your OpenAI API key.
`language`	`str`	`"en"`	BCP-47 language code.

TTS Providers

Use these factory methods to configure text-to-speech:

Patter.elevenlabs()

tts = Patter.elevenlabs(api_key="el_...", voice="rachel")

Parameter	Type	Default	Description
`api_key`	`str`	required	Your ElevenLabs API key.
`voice`	`str`	`"rachel"`	Voice name or ID.

Patter.openai_tts()

tts = Patter.openai_tts(api_key="sk-...", voice="alloy")

Parameter	Type	Default	Description
`api_key`	`str`	required	Your OpenAI API key.
`voice`	`str`	`"alloy"`	Voice name (`"alloy"`, `"echo"`, `"fable"`, `"onyx"`, `"nova"`, `"shimmer"`).

OpenAI TTS returns audio at 24 kHz. Patter automatically resamples it to 16 kHz for telephony compatibility.
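
To make the 24 kHz to 16 kHz step concrete, here is a minimal sketch of resampling by linear interpolation. It is not Patter's implementation, and `resample_linear` is a hypothetical name. Production resamplers also apply a low-pass filter before downsampling to avoid aliasing, which this sketch omits.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[int], src_rate: int, dst_rate: int) -> List[int]:
    """Resample PCM samples from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []

    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


if __name__ == "__main__":
    # Six input samples at 24 kHz become four output samples at 16 kHz.
    print(resample_linear([0, 10, 20, 30, 40, 50], 24_000, 16_000))
```

The ratio here is 3:2, so every three input samples yield two output samples.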

Provider Comparison

Feature	OpenAI Realtime	ElevenLabs ConvAI	Pipeline
Latency	Lowest	Low	Medium
Voice quality	Good	Best	Configurable
Customization	Limited	Medium	Full
`on_message` callback	No	No	Yes
Required API keys	OpenAI	ElevenLabs	STT + TTS keys
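
One way to read the table is as a short decision rule. The helper below is a hypothetical sketch that encodes only what the comparison states; the provider strings match the `provider=` values used elsewhere on this page.

```python
def choose_provider(
    needs_on_message: bool = False,
    prioritize_voice_quality: bool = False,
) -> str:
    """Suggest a provider value based on the comparison table."""
    # Only pipeline mode supports the on_message callback.
    if needs_on_message:
        return "pipeline"
    # ElevenLabs ConvAI is rated best for voice quality.
    if prioritize_voice_quality:
        return "elevenlabs_convai"
    # OpenAI Realtime is the default and has the lowest latency.
    return "openai_realtime"
```

For example, `choose_provider(needs_on_message=True)` returns `"pipeline"` even when voice quality is also a priority, because the callback is unavailable in the other two modes.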

Complete Pipeline Example

import os
import asyncio
from dotenv import load_dotenv
from patter import Patter

load_dotenv()

phone = Patter(
    twilio_sid=os.environ["TWILIO_SID"],
    twilio_token=os.environ["TWILIO_TOKEN"],
    phone_number=os.environ["PHONE_NUMBER"],
    webhook_url=os.environ["WEBHOOK_URL"],
)

agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key=os.environ["DEEPGRAM_KEY"]),
    tts=Patter.elevenlabs(api_key=os.environ["ELEVENLABS_KEY"], voice="rachel"),
)

async def handle_message(event) -> str:
    user_text = event["text"]
    # Add your own LLM logic here
    return f"I heard you say: {user_text}"

async def main():
    await phone.serve(agent, on_message=handle_message, port=8000)

asyncio.run(main())
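
The example reads every setting with `os.environ[...]`, which raises `KeyError` on the first missing variable. A small pre-flight check can report all missing names at once. `missing_env_vars` is a hypothetical helper, not part of Patter; the variable names are the ones used in the example.

```python
import os
from typing import List, Mapping, Sequence

REQUIRED_VARS = [
    "TWILIO_SID",
    "TWILIO_TOKEN",
    "PHONE_NUMBER",
    "WEBHOOK_URL",
    "DEEPGRAM_KEY",
    "ELEVENLABS_KEY",
]


def missing_env_vars(
    names: Sequence[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in the environment."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Run the check after `load_dotenv()` so that values from a `.env` file are taken into account.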
