Voice & Audio Providers

Lava supports speech-to-text, text-to-speech, and voice AI providers with character-based and duration-based billing.

ElevenLabs

Primary Use: Text-to-Speech, Voice Cloning

Key Features:

Natural voice synthesis in 29+ languages
Voice cloning from audio samples
Real-time streaming audio generation
Multilingual support with emotion control
Character-based metering

Endpoints:

Text-to-Speech: https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
Speech-to-Text: https://api.elevenlabs.io/v1/speech-to-text
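Every request on this page goes through the same Lava forward endpoint, with the provider URL percent-encoded into the `u` query parameter. A minimal sketch of a helper for that pattern follows; `lavaForwardUrl` and `elevenLabsTtsUrl` are hypothetical names, not part of a Lava SDK.

```javascript
// Hypothetical helpers (not a Lava SDK API) for building forwarded URLs.
const LAVA_FORWARD_BASE = 'https://api.lavapayments.com/v1/forward?u=';

// Wrap any provider URL so the request is proxied through Lava.
// encodeURIComponent escapes ':', '/', '?', '&' and '=', so the whole provider
// URL (including its own query string) travels as a single `u` value.
function lavaForwardUrl(providerUrl) {
  return LAVA_FORWARD_BASE + encodeURIComponent(providerUrl);
}

// Forwarded ElevenLabs Text-to-Speech URL for a given voice ID.
function elevenLabsTtsUrl(voiceId) {
  return lavaForwardUrl(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`
  );
}

console.log(elevenLabsTtsUrl('21m00Tcm4TlvDq8ikWAM'));
// https://api.lavapayments.com/v1/forward?u=https%3A%2F%2Fapi.elevenlabs.io%2Fv1%2Ftext-to-speech%2F21m00Tcm4TlvDq8ikWAM
```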

Usage Example (Text-to-Speech):

const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent('https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM'),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: "Hello, this is a test of ElevenLabs text-to-speech.",
      model_id: "eleven_monolingual_v1",
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.5
      }
    })
  }
);

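// Fail fast on HTTP errors (e.g. an invalid token or voice ID)
if (!response.ok) {
  throw new Error(`TTS request failed: ${response.status} ${await response.text()}`);
}

// The response body is MP3 audio (audio/mpeg) by default
const audioBuffer = await response.arrayBuffer();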
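The request above can be wrapped in a reusable function. This is a sketch: `textToSpeech` is a hypothetical helper, the `fetchImpl` parameter exists only so the function can be exercised without a network call, and the default model ID and voice settings are simply the ones from the example above.

```javascript
// Hypothetical wrapper around the ElevenLabs TTS call shown above.
// Resolves to the synthesized audio (MP3 by default) as a Uint8Array.
async function textToSpeech({
  forwardToken,                      // your Lava forward token
  voiceId,                           // ElevenLabs voice ID
  text,
  modelId = 'eleven_monolingual_v1', // model from the example above
  voiceSettings = { stability: 0.5, similarity_boost: 0.5 },
  fetchImpl = fetch                  // injectable for testing
}) {
  const url =
    'https://api.lavapayments.com/v1/forward?u=' +
    encodeURIComponent(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`);

  const response = await fetchImpl(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ text, model_id: modelId, voice_settings: voiceSettings })
  });

  if (!response.ok) {
    // Include the response body so provider errors are easier to debug
    throw new Error(`TTS failed (${response.status}): ${await response.text()}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

In Node.js the result can be saved with `fs.writeFileSync('hello.mp3', audio)`, since `writeFileSync` accepts a `Uint8Array`.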

Billing:

Text-to-Speech: Character-based (per million characters)
Speech-to-Text: Duration-based (per minute of audio)

Supported Languages: English, Spanish, French, German, Polish, Italian, Portuguese, Hindi, Arabic, Chinese, Japanese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Tamil

Retell

Primary Use: AI-Powered Voice Phone Calls

Key Features:

Conversational AI for phone calls
Real-time responses during calls
Webhook notifications for call events
Duration-based billing
Custom voice and personality configuration

Endpoints:

Create Call: https://api.retellai.com/create-phone-call
Get Call: https://api.retellai.com/get-call/{call_id}

Usage Example:

const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent('https://api.retellai.com/create-phone-call'),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      from_number: "+14155550100", // E.164 format
      to_number: "+14155550101",
      override_agent_id: "agent_abc123",
      retell_llm_dynamic_variables: {
        customer_name: "John Doe"
      }
    })
  }
);
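Once a call has been created, its details can be retrieved through the Get Call endpoint listed above, forwarded through Lava in the same way. A minimal sketch, assuming the create-call response carries the new call's `call_id`; `getRetellCall` is a hypothetical helper, and because the fields of the call object are not described on this page, it simply returns the parsed JSON.

```javascript
// Hypothetical helper: fetch a Retell call's details via the Lava forward endpoint.
async function getRetellCall(callId, forwardToken, fetchImpl = fetch) {
  const url =
    'https://api.lavapayments.com/v1/forward?u=' +
    encodeURIComponent(`https://api.retellai.com/get-call/${encodeURIComponent(callId)}`);

  const response = await fetchImpl(url, {
    method: 'GET',
    headers: { 'Authorization': `Bearer ${forwardToken}` }
  });
  if (!response.ok) {
    throw new Error(`Get Call failed (${response.status}): ${await response.text()}`);
  }
  return response.json(); // call object as returned by Retell
}
```

For example, after the create request: `const { call_id } = await response.json();` followed by `const call = await getRetellCall(call_id, forwardToken);`.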

Billing:

Duration-based (per minute of call time)
Includes both inbound and outbound calls

Metering Details

Character-Based Metering (ElevenLabs TTS)

How it works:

Input text is measured by character count
Pricing is per million characters (1M)
Spaces, punctuation, and special characters count toward total

Example:

Text: "Hello, world!" (13 characters)
Cost: 13 / 1,000,000 × rate per 1M characters
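The character-based formula can be written as a small function. This sketch rests on two assumptions: `ratePerMillion` stands for whatever per-1M-character price applies to your account (the `30` below is a made-up number), and characters are counted as Unicode code points (`[...text].length`). The exact counting rule for non-ASCII text such as emoji is not specified here; note that JavaScript's `text.length` would count UTF-16 code units instead.

```javascript
// Hypothetical helpers: estimate character-based TTS cost.
// Spaces, punctuation, and special characters all count toward the total.
function countBillableCharacters(text) {
  // Assumption: one billable character per Unicode code point
  return [...text].length;
}

function estimateTtsCost(text, ratePerMillion) {
  const characters = countBillableCharacters(text);
  return { characters, cost: (characters / 1_000_000) * ratePerMillion };
}

// 30 is a hypothetical rate per 1M characters, not a real price
const { characters, cost } = estimateTtsCost('Hello, world!', 30);
console.log(characters); // 13
console.log(cost);
```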

Duration-Based Metering (Retell, ElevenLabs STT)

How it works:

Audio duration measured in seconds/minutes
Pricing is per minute or per second
Includes processing and silence time

Example:

Call Duration: 3 minutes 27 seconds (207 seconds)
Cost: (207 / 60) × rate per minute
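The same calculation for duration-based metering, as a sketch: the rates are placeholders, per-second pricing is handled as a simple alternative, and fractional minutes are billed exactly as in the example above (no rounding up to whole minutes), which is an assumption about how partial minutes are treated.

```javascript
// Hypothetical helper: estimate duration-based cost (Retell calls, ElevenLabs STT).
// Silence counts toward the billed duration, so pass the full audio/call length.
function estimateDurationCost(durationSeconds, { ratePerMinute, ratePerSecond } = {}) {
  if (ratePerSecond !== undefined) {
    return durationSeconds * ratePerSecond;
  }
  if (ratePerMinute !== undefined) {
    return (durationSeconds / 60) * ratePerMinute; // fractional minutes, no rounding
  }
  throw new Error('Provide ratePerMinute or ratePerSecond');
}

const seconds = 3 * 60 + 27; // 3 minutes 27 seconds
console.log(seconds); // 207
// 0.1 is a hypothetical rate per minute
console.log(estimateDurationCost(seconds, { ratePerMinute: 0.1 }));
```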

Audio Formats

Supported Input Formats (Speech-to-Text)

MP3, WAV, FLAC, OGG
Sample rates: 8kHz, 16kHz, 44.1kHz, 48kHz
Mono or stereo
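A Speech-to-Text request sends the audio as multipart form data rather than JSON. The sketch below assumes ElevenLabs' `file` and `model_id` form fields, uses `scribe_v1` as an example model ID, and assumes the Lava forward endpoint passes multipart bodies through unchanged; `transcribe` is a hypothetical helper. It needs a runtime with global `fetch`, `FormData`, and `Blob` (Node.js 18+ or a browser).

```javascript
// Hypothetical helper: transcribe an audio clip (MP3, WAV, FLAC, or OGG)
// via the ElevenLabs Speech-to-Text endpoint, forwarded through Lava.
async function transcribe({ forwardToken, audio, filename, modelId = 'scribe_v1', fetchImpl = fetch }) {
  const form = new FormData();
  form.append('model_id', modelId);     // assumed example model ID
  form.append('file', audio, filename); // audio: a Blob (or File)

  const url =
    'https://api.lavapayments.com/v1/forward?u=' +
    encodeURIComponent('https://api.elevenlabs.io/v1/speech-to-text');

  const response = await fetchImpl(url, {
    method: 'POST',
    // No Content-Type header: fetch sets multipart/form-data with its boundary
    headers: { 'Authorization': `Bearer ${forwardToken}` },
    body: form
  });
  if (!response.ok) {
    throw new Error(`STT failed (${response.status}): ${await response.text()}`);
  }
  return response.json(); // assumed to include the transcript in a `text` field
}
```

In Node.js, a file on disk can be passed as `new Blob([fs.readFileSync('call.mp3')], { type: 'audio/mpeg' })`.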

Output Formats (Text-to-Speech)

ElevenLabs: MP3 (default), PCM, WebM
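When forwarding through Lava, a provider query parameter should go inside the encoded `u` value so that it reaches the provider. As a sketch, assuming ElevenLabs' `output_format` query parameter with values such as `mp3_44100_128` (MP3, 44.1 kHz, 128 kbps) or `pcm_16000` (raw PCM at 16 kHz); `ttsUrlWithFormat` is a hypothetical helper.

```javascript
// Hypothetical helper: forwarded TTS URL with an explicit output format.
// The provider query string is built first, then the whole provider URL is encoded.
function ttsUrlWithFormat(voiceId, outputFormat) {
  const providerUrl = new URL(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`
  );
  providerUrl.searchParams.set('output_format', outputFormat); // e.g. 'pcm_16000'
  return 'https://api.lavapayments.com/v1/forward?u=' + encodeURIComponent(providerUrl.toString());
}

console.log(ttsUrlWithFormat('21m00Tcm4TlvDq8ikWAM', 'pcm_16000'));
```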

Voice Customization

ElevenLabs Voice Settings

Control voice characteristics:

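const voiceSettings = {
  stability: 0.5,          // 0-1: Lower = more expressive
  similarity_boost: 0.75,  // 0-1: Higher = closer to original voice
  style: 0.0,              // 0-1: Exaggeration of speaking style
  use_speaker_boost: true  // Enhance clarity
};

// Sent as the voice_settings field of the TTS request body
const body = JSON.stringify({ text: "Hello!", voice_settings: voiceSettings });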
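Since `stability`, `similarity_boost`, and `style` are all documented above as 0-1 values, a client can check them before sending a request. A minimal sketch; `validateVoiceSettings` is a hypothetical helper, treating every field as optional is an assumption, and rejecting (rather than clamping) out-of-range values is a design choice, not ElevenLabs behavior.

```javascript
// Hypothetical client-side check for the voice_settings ranges listed above.
const UNIT_RANGE_FIELDS = ['stability', 'similarity_boost', 'style'];

function validateVoiceSettings(settings) {
  const errors = [];
  for (const field of UNIT_RANGE_FIELDS) {
    if (!(field in settings)) continue; // assumption: each field is optional
    const value = settings[field];
    // NaN is typeof 'number' and fails no comparison, so check it explicitly
    if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 1) {
      errors.push(`${field} must be a number between 0 and 1 (got ${value})`);
    }
  }
  if ('use_speaker_boost' in settings && typeof settings.use_speaker_boost !== 'boolean') {
    errors.push('use_speaker_boost must be a boolean');
  }
  return errors; // an empty array means the settings passed every check
}

console.log(validateVoiceSettings({ stability: 0.5, similarity_boost: 0.75, style: 0.0, use_speaker_boost: true })); // []
console.log(validateVoiceSettings({ stability: 1.5 })); // one error: stability out of range
```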

Voice Cloning

Create custom voices from audio samples:

Upload audio samples (min 1 minute of clear speech)
Train voice model
Use generated voice ID in TTS requests
Speech generated with the cloned voice is billed per character of input text, like other TTS requests
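The upload step could look like the sketch below. It assumes ElevenLabs' Instant Voice Cloning endpoint (`POST https://api.elevenlabs.io/v1/voices/add` with multipart fields `name` and `files`, returning a `voice_id`) and assumes the Lava forward endpoint accepts this request like the others on this page; `cloneVoice` is a hypothetical helper. The returned ID is what the TTS examples use as `{voice_id}`.

```javascript
// Hypothetical helper: create a cloned voice from one or more audio samples.
// samples: array of { data: Blob, filename: string }, ideally 1+ minute of clear speech in total
async function cloneVoice({ forwardToken, name, samples, fetchImpl = fetch }) {
  const form = new FormData();
  form.append('name', name);
  for (const { data, filename } of samples) {
    form.append('files', data, filename); // repeated field, one entry per sample
  }

  const url =
    'https://api.lavapayments.com/v1/forward?u=' +
    encodeURIComponent('https://api.elevenlabs.io/v1/voices/add');

  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${forwardToken}` }, // fetch sets the multipart boundary
    body: form
  });
  if (!response.ok) {
    throw new Error(`Voice cloning failed (${response.status}): ${await response.text()}`);
  }
  const { voice_id } = await response.json();
  return voice_id; // use this ID in TTS requests
}
```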

Real-Time Streaming

ElevenLabs Streaming TTS

Generate audio in real-time as text is provided:

const voiceId = '21m00Tcm4TlvDq8ikWAM'; // replace with your voice ID
const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: "Long text to stream...",
      model_id: "eleven_monolingual_v1"
    })
  }
);

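if (!response.ok) {
  throw new Error(`Streaming request failed: ${response.status}`);
}

// Audio chunks (Uint8Array) are streamed as they are generated
const reader = response.body.getReader();
const chunks = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value); // or play each chunk as it arrives
}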
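The read loop above can be packaged as an async generator, so callers can use `for await` and the reader lock is always released, even if the consumer stops early. A minimal sketch; `audioChunks` and `concatChunks` are hypothetical helper names, and the code needs a runtime where `response.body` is a web `ReadableStream` (browsers, Node.js 18+).

```javascript
// Hypothetical helper: yield audio chunks (Uint8Array) from a streaming response.
async function* audioChunks(response) {
  if (!response.ok) {
    throw new Error(`Streaming request failed: ${response.status}`);
  }
  const reader = response.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    reader.releaseLock(); // runs on completion, error, or an early `break` by the caller
  }
}

// Hypothetical helper: join chunks into one buffer (e.g. to save the full clip).
function concatChunks(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Usage sketch (inside an async function):
// const chunks = [];
// for await (const chunk of audioChunks(response)) {
//   chunks.push(chunk); // or feed each chunk to an audio player
// }
// const audio = concatChunks(chunks);
```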
