Voice & Audio Providers

Lava supports speech-to-text, text-to-speech, and voice AI providers with character-based and duration-based billing.

ElevenLabs

Primary Use: Text-to-Speech, Voice Cloning
Key Features:
  • Natural voice synthesis with 29+ languages
  • Voice cloning from audio samples
  • Real-time streaming audio generation
  • Multilingual support with emotion control
  • Character-based metering
Endpoints:
  • Text-to-Speech: https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
  • Speech-to-Text: https://api.elevenlabs.io/v1/speech-to-text
Usage Example (Text-to-Speech):
const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent('https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM'),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: "Hello, this is a test of ElevenLabs text-to-speech.",
      model_id: "eleven_monolingual_v1",
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.5
      }
    })
  }
);

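// Fail fast on HTTP errors; a successful response body is an audio/mpeg stream
if (!response.ok) {
  throw new Error(`Text-to-speech request failed with status ${response.status}`);
}
const audioBuffer = await response.arrayBuffer();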
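The forward URL in the example above follows a fixed pattern: the provider URL is percent-encoded and passed in the `u` query parameter. A minimal sketch of that pattern is shown below. The helper names `buildForwardUrl` and `elevenLabsTtsUrl` are hypothetical and are not part of any Lava SDK.

```javascript
// Hypothetical helpers (not part of a Lava SDK): they only reproduce the
// URL pattern used in the example above.
const LAVA_FORWARD_BASE = 'https://api.lavapayments.com/v1/forward';

// Wrap any provider URL in the Lava forward endpoint.
function buildForwardUrl(providerUrl) {
  return `${LAVA_FORWARD_BASE}?u=${encodeURIComponent(providerUrl)}`;
}

// Build the forward URL for the ElevenLabs text-to-speech endpoint.
function elevenLabsTtsUrl(voiceId) {
  // Encode the voice ID as a path segment before the whole URL is encoded.
  const target =
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`;
  return buildForwardUrl(target);
}
```

Calling `elevenLabsTtsUrl('21m00Tcm4TlvDq8ikWAM')` yields the same URL that the example above assembles by hand.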
Billing:
  • Text-to-Speech: Character-based (per million characters)
  • Speech-to-Text: Duration-based (per minute of audio)
Supported Languages: English, Spanish, French, German, Polish, Italian, Portuguese, Hindi, Arabic, Chinese, Japanese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Tamil

Retell

Primary Use: AI-Powered Voice Phone Calls
Key Features:
  • Conversational AI for phone calls
  • Real-time responses during calls
  • Webhook notifications for call events
  • Duration-based billing
  • Custom voice and personality configuration
Endpoints:
  • Create Call: https://api.retellai.com/create-phone-call
  • Get Call: https://api.retellai.com/get-call/{call_id}
Usage Example:
const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent('https://api.retellai.com/create-phone-call'),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      from_number: "+1234567890",
      to_number: "+0987654321",
      override_agent_id: "agent_abc123",
      retell_llm_dynamic_variables: {
        customer_name: "John Doe"
      }
    })
  }
);
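The Get Call endpoint listed above can presumably be reached through the same forward URL. The sketch below only builds the request and does not send it. It assumes the endpoint uses GET and accepts the same `Authorization` header as the create-call example; `buildGetCallRequest` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds (but does not send) a Get Call request.
// Assumptions: the endpoint uses GET and the same Bearer forward token
// as the create-call example above.
function buildGetCallRequest(callId, forwardToken) {
  const target =
    `https://api.retellai.com/get-call/${encodeURIComponent(callId)}`;
  return {
    url: 'https://api.lavapayments.com/v1/forward?u=' + encodeURIComponent(target),
    options: {
      method: 'GET',
      headers: { Authorization: `Bearer ${forwardToken}` },
    },
  };
}

// Usage (not executed here):
//   const { url, options } = buildGetCallRequest(callId, forwardToken);
//   const call = await (await fetch(url, options)).json();
```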
Billing:
  • Duration-based (per minute of call time)
  • Includes both inbound and outbound calls

Metering Details

Character-Based Metering (ElevenLabs TTS)

How it works:
  • Input text is measured by character count
  • Pricing is per million characters (1M)
  • Spaces, punctuation, and special characters count toward total
Example:
Text: "Hello, world!" (13 characters)
Cost: (13 / 1,000,000) × rate per 1M characters
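The arithmetic above can be sketched as a small estimator. The function name `estimateTtsCost` and the rate used in the usage note are illustrative assumptions, not published prices, and the sketch counts Unicode code points, which may differ from the provider's exact counting rule.

```javascript
// Hypothetical estimator for character-based metering.
// Assumption: every code point, including spaces and punctuation,
// counts as one character.
function estimateTtsCost(text, ratePerMillionChars) {
  const characters = [...text].length;
  const cost = (characters / 1_000_000) * ratePerMillionChars;
  return { characters, cost };
}
```

For example, `estimateTtsCost("Hello, world!", rate)` counts 13 characters and multiplies 13 / 1,000,000 by whatever per-million rate applies to your account.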

Duration-Based Metering (Retell, ElevenLabs STT)

How it works:
  • Audio duration is measured in seconds or minutes
  • Pricing is per minute or per second
  • Billed duration includes processing and silence time
Example:
Call Duration: 3 minutes 27 seconds (207 seconds)
Cost: (207 / 60) × rate per minute
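The same calculation can be written as a short sketch. The function names are hypothetical, and the code assumes fractional minutes are billed proportionally, as in the formula above; whether a provider rounds up to whole minutes or seconds is not stated here.

```javascript
// Hypothetical helpers for duration-based metering.
// Assumption: partial minutes are billed proportionally (no rounding up).
function toSeconds(minutes, seconds) {
  return minutes * 60 + seconds;
}

function estimateDurationCost(durationSeconds, ratePerMinute) {
  return (durationSeconds / 60) * ratePerMinute;
}
```

With the call above, `toSeconds(3, 27)` gives 207 seconds, and `estimateDurationCost(207, rate)` multiplies 3.45 minutes by the per-minute rate.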

Audio Formats

Supported Input Formats (Speech-to-Text)

  • MP3, WAV, FLAC, OGG
  • Sample rates: 8kHz, 16kHz, 44.1kHz, 48kHz
  • Mono or stereo
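A client-side check against the list above can catch unsupported files before they are uploaded. This is only a sketch: `checkAudioInput` is a hypothetical helper, it infers the format from the file extension alone, and it treats the listed sample rates as exhaustive, which is an assumption.

```javascript
// Values taken from the list above; treated as exhaustive by assumption.
const SUPPORTED_EXTENSIONS = ['mp3', 'wav', 'flac', 'ogg'];
const SUPPORTED_SAMPLE_RATES_HZ = [8000, 16000, 44100, 48000];

// Hypothetical pre-upload check. Returns a list of problems (empty if none).
function checkAudioInput(filename, sampleRateHz, channels) {
  const problems = [];
  const dot = filename.lastIndexOf('.');
  const extension = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    problems.push(`unsupported format: "${extension}"`);
  }
  if (!SUPPORTED_SAMPLE_RATES_HZ.includes(sampleRateHz)) {
    problems.push(`unsupported sample rate: ${sampleRateHz} Hz`);
  }
  if (channels !== 1 && channels !== 2) {
    problems.push(`unsupported channel count: ${channels}`);
  }
  return problems;
}
```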

Output Formats (Text-to-Speech)

  • ElevenLabs: MP3 (default), PCM, WebM

Voice Customization

ElevenLabs Voice Settings

Control voice characteristics:
{
  "voice_settings": {
    "stability": 0.5,        // 0-1: Lower = more expressive
    "similarity_boost": 0.75, // 0-1: Higher = closer to original voice
    "style": 0.0,            // 0-1: Exaggeration of speaking style
    "use_speaker_boost": true // Enhance clarity
  }
}
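Because the numeric settings above are documented as 0-1 values, it can be useful to validate them before sending a request. The sketch below is one way to do that; `validateVoiceSettings` is a hypothetical helper, and treating every field as optional is an assumption.

```javascript
// Hypothetical validator for the voice_settings object shown above.
// Assumption: all fields are optional; only the ones present are checked.
function validateVoiceSettings(settings) {
  const errors = [];
  for (const key of ['stability', 'similarity_boost', 'style']) {
    const value = settings[key];
    if (value === undefined) continue;
    const isNumber = typeof value === 'number' && !Number.isNaN(value);
    if (!isNumber || value < 0 || value > 1) {
      errors.push(`${key} must be a number between 0 and 1`);
    }
  }
  const boost = settings.use_speaker_boost;
  if (boost !== undefined && typeof boost !== 'boolean') {
    errors.push('use_speaker_boost must be a boolean');
  }
  return errors;
}
```

An empty array means the settings are within the documented ranges.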

Voice Cloning

Create custom voices from audio samples:
  1. Upload audio samples (at least 1 minute of clear speech)
  2. Train the voice model
  3. Use the generated voice ID in TTS requests
  4. Speech generated with the cloned voice is billed per character

Real-Time Streaming

ElevenLabs Streaming TTS

Generate audio in real time as text is provided:
// Replace with the ID of the voice you want to use
const voiceId = '21m00Tcm4TlvDq8ikWAM';

const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=' +
  encodeURIComponent(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`),
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: "Long text to stream...",
      model_id: "eleven_monolingual_v1"
    })
  }
);

// Audio chunks streamed as generated
const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Play audio chunk
}
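If the audio needs to be saved or replayed rather than played chunk by chunk, the read loop above can collect the chunks into a single buffer. The sketch below shows one way to do that with the standard Streams API; `concatChunks` and `collectStream` are hypothetical helper names.

```javascript
// Join an array of Uint8Array chunks into one contiguous Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return joined;
}

// Read a ReadableStream (such as response.body) to the end and
// return all of its bytes as one Uint8Array.
async function collectStream(stream) {
  const reader = stream.getReader();
  const chunks = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return concatChunks(chunks);
}
```

Usage would look like `const audio = await collectStream(response.body);`. Buffering gives up the latency benefit of streaming, so it fits cases such as writing the result to a file.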

Next Steps