Voice & Audio Providers
Lava supports speech-to-text, text-to-speech, and voice AI providers with character-based and duration-based billing.
ElevenLabs
Primary Use: Text-to-Speech, Voice Cloning
Key Features:
- Natural voice synthesis with 29+ languages
- Voice cloning from audio samples
- Real-time streaming audio generation
- Multilingual support with emotion control
- Character-based metering
Endpoints:
- Text-to-Speech: https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
- Speech-to-Text: https://api.elevenlabs.io/v1/speech-to-text
Billing:
- Text-to-Speech: Character-based (per million characters)
- Speech-to-Text: Duration-based (per minute of audio)
Retell
Primary Use: AI-Powered Voice Phone Calls
Key Features:
- Conversational AI for phone calls
- Real-time responses during calls
- Webhook notifications for call events
- Duration-based billing
- Custom voice and personality configuration
Endpoints:
- Create Call: https://api.retellai.com/create-phone-call
- Get Call: https://api.retellai.com/get-call/{call_id}
Billing:
- Duration-based (per minute of call time)
- Includes both inbound and outbound calls
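The Retell endpoints listed above can be assembled with a small helper. This is a minimal sketch: the function names `create_call_url` and `get_call_url` are hypothetical, and it only builds URLs from the paths given in this section. Authentication, request bodies, and any routing through Lava are not covered here.

```python
from urllib.parse import quote

# Base URL taken from the endpoints listed in this section.
RETELL_BASE = "https://api.retellai.com"


def create_call_url() -> str:
    """Return the Create Call endpoint URL."""
    return f"{RETELL_BASE}/create-phone-call"


def get_call_url(call_id: str) -> str:
    """Return the Get Call endpoint URL for a given call ID."""
    if not call_id:
        raise ValueError("call_id must be a non-empty string")
    # Percent-encode the ID so characters such as '/' cannot alter the path.
    return f"{RETELL_BASE}/get-call/{quote(call_id, safe='')}"
```

Encoding the call ID is a defensive choice for IDs that arrive from webhooks or user input.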
Metering Details
Character-Based Metering (ElevenLabs TTS)
How it works:
- Input text is measured by character count
- Pricing is per million characters (1M)
- Spaces, punctuation, and special characters count toward total
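The character-based rules above can be sketched as a small cost estimate. The function name `tts_character_cost` and the rate used in the example are hypothetical, and counting with `len()` (Unicode code points) is an assumption about how characters are tallied; the actual count may differ for some scripts or emoji.

```python
from decimal import Decimal


def tts_character_cost(text: str, price_per_million: Decimal) -> Decimal:
    """Estimate the cost of text billed per million characters."""
    # len() counts every character, including spaces and punctuation,
    # matching the rule that all of them count toward the total.
    char_count = len(text)
    return Decimal(char_count) * price_per_million / Decimal(1_000_000)


# Hypothetical rate: 30 currency units per 1M characters.
estimate = tts_character_cost("Hello, world!", Decimal("30"))
```

`Decimal` is used instead of `float` so that very small per-request amounts do not pick up rounding noise when summed.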
Duration-Based Metering (Retell, ElevenLabs STT)
How it works:
- Audio duration is measured in seconds or minutes
- Pricing is per minute or per second
- Includes processing and silence time
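As a rough illustration of duration-based metering, the sketch below estimates a charge from an audio or call length. The function name `duration_cost`, the example rate, and the choice to round up to the next whole second or minute are all assumptions; the provider's actual rounding rules may differ.

```python
import math
from decimal import Decimal


def duration_cost(seconds: float, price_per_minute: Decimal,
                  per_second: bool = True) -> Decimal:
    """Estimate a duration-based charge from a length in seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if per_second:
        # Assumed granularity: round up to the next whole second.
        billable_seconds = math.ceil(seconds)
        return Decimal(billable_seconds) * price_per_minute / Decimal(60)
    # Assumed granularity: round up to the next whole minute.
    billable_minutes = math.ceil(seconds / 60)
    return Decimal(billable_minutes) * price_per_minute
```

The `seconds` argument should be the full measured duration, since silence and processing time are included in the billed total.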
Audio Formats
Supported Input Formats (Speech-to-Text)
- MP3, WAV, FLAC, OGG
- Sample rates: 8kHz, 16kHz, 44.1kHz, 48kHz
- Mono or stereo
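A pre-upload check against the constraints above might look like the following sketch. It handles WAV input only, using Python's standard `wave` module; MP3, FLAC, and OGG would need a third-party decoder. The names `check_wav` and `make_silent_wav` are hypothetical helpers, not part of any provider SDK.

```python
import io
import wave

# Sample rates listed in this section, in Hz.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 44100, 48000}


def check_wav(data: bytes) -> tuple[int, int]:
    """Return (sample_rate, channels) or raise ValueError if unsupported."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {rate} Hz")
    if channels not in (1, 2):
        raise ValueError(f"unsupported channel count: {channels}")
    return rate, channels


def make_silent_wav(rate: int, channels: int, seconds: int = 1) -> bytes:
    """Build a 16-bit silent WAV in memory, useful for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
    return buf.getvalue()
```

Rejecting a file locally avoids paying for, or waiting on, a request that would fail on format grounds.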
Output Formats (Text-to-Speech)
- ElevenLabs: MP3 (default), PCM, WebM
Voice Customization
ElevenLabs Voice Settings
Voice settings control the characteristics of the generated voice.
Voice Cloning
Create custom voices from audio samples:
- Upload audio samples (minimum 1 minute of clear speech)
- Train voice model
- Use generated voice ID in TTS requests
- Billed per character of generated speech
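To illustrate using a voice ID together with voice settings in a TTS request, here is a minimal sketch that builds the URL and JSON body without sending anything. The function name `build_tts_request` is hypothetical. The `voice_settings` fields `stability` and `similarity_boost`, their 0 to 1 range, and the default values shown are assumptions based on ElevenLabs' public API and should be checked against the current reference. Authentication headers and any routing through Lava are left out.

```python
import json
from urllib.parse import quote

# URL template taken from the ElevenLabs endpoint listed in this document.
TTS_URL_TEMPLATE = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(voice_id: str, text: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> tuple[str, str]:
    """Return (url, json_body) for a text-to-speech request."""
    # Assumed valid range for both settings: 0.0 to 1.0 inclusive.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    url = TTS_URL_TEMPLATE.format(voice_id=quote(voice_id, safe=""))
    body = json.dumps({
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    })
    return url, body
```

The same builder works for a stock voice or a cloned one, since only the voice ID changes; the `text` value is what character-based billing would be measured against.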