What You’ll Learn
This guide shows you how to work with streaming responses from AI providers through Lava’s forward proxy. You’ll learn to:
- Enable streaming for LLM requests
- Parse Server-Sent Events (SSE) in JavaScript/TypeScript
- Extract usage data from completed streams
- Handle provider-specific streaming formats (OpenAI vs Anthropic)
Streaming provides real-time UX. Instead of waiting for the entire response, users see tokens appear incrementally, creating a more interactive chat-like experience. Lava fully supports streaming for all LLM providers.
Enabling Streaming
Basic Streaming Request
Enable streaming by adding `"stream": true` to your request body:
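For example, a minimal sketch (the proxy host, path, model, and credential below are placeholders; substitute the values from your Lava configuration):

```ts
// Sketch of a streaming request through the Lava forward proxy.
// The URL and auth header are placeholders, not real endpoints.
const response = await fetch("https://<your-lava-proxy>/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.LAVA_TOKEN}`, // placeholder credential
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true, // enables SSE streaming through Lava
  }),
});
```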
How Lava Handles Streaming
- Request Detection: Lava detects `"stream": true` in the request body
- Forward to Provider: The request is forwarded to the AI provider with streaming enabled
- Real-time Proxy: Lava streams response chunks back to your client in real time
- Usage Extraction: Usage data is extracted from the final SSE message (provider-specific format)
- Billing Completion: Billing happens after the stream completes; headers are added to the final chunk
Parsing Server-Sent Events
Using Fetch API (Recommended)
Modern approach using `fetch` with a response body reader:
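A sketch, assuming `response` was obtained with `"stream": true` as above; it buffers incomplete lines and skips malformed chunks:

```ts
// Reads an SSE stream from a fetch Response and accumulates the text.
async function readStream(response: Response): Promise<string> {
  if (!response.body) throw new Error("Response has no body");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Append new bytes and split on newlines; keep the last
    // (possibly incomplete) line in the buffer for the next chunk.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length).trim();
      if (data === "[DONE]") continue; // OpenAI end-of-stream marker

      try {
        const chunk = JSON.parse(data);
        // OpenAI-style delta; Anthropic uses delta.text instead.
        fullText += chunk.choices?.[0]?.delta?.content ?? "";
      } catch {
        // Incomplete or malformed JSON: skip rather than crash.
      }
    }
  }
  return fullText;
}
```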
Using EventSource (Browser Only)
For browser environments, `EventSource` provides simpler SSE handling:
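A sketch (note that `EventSource` only issues GET requests and cannot set an Authorization header, so it only fits endpoints designed for it; the URL below is a placeholder):

```ts
// Placeholder URL: EventSource cannot send custom headers, so any
// credential has to travel another way, e.g. a query parameter.
const source = new EventSource("https://<your-lava-proxy>/v1/stream?token=...");

source.onmessage = (event: MessageEvent) => {
  if (event.data === "[DONE]") {
    source.close();
    return;
  }
  const chunk = JSON.parse(event.data);
  console.log(chunk.choices?.[0]?.delta?.content ?? "");
};

source.onerror = () => {
  // EventSource auto-reconnects by default; close if unrecoverable.
  source.close();
};
```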
Extracting Usage from Streams
Usage Headers (Final Chunk)
Lava adds usage headers to the final streamed chunk:
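A sketch of how you might read them once the stream has fully completed (this assumes the usage headers are exposed on the `Response` object; the exact `X-Lava-Usage-*` names are not listed here):

```ts
// Call only after the stream has been fully consumed.
function logLavaUsage(response: Response): void {
  const requestId = response.headers.get("x-lava-request-id");
  console.log("request id:", requestId);
  response.headers.forEach((value, name) => {
    // The Headers API normalizes names to lowercase.
    if (name.startsWith("x-lava-usage-")) {
      console.log(`${name}: ${value}`);
    }
  });
}
```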
Provider Usage Data (SSE Messages)
Some providers include usage in the final SSE message.
OpenAI Format:
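An illustrative final chunk (IDs and token counts are made up; OpenAI includes the `usage` object on its last chunk when usage reporting is enabled):

```json
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":48,"total_tokens":60}}

data: [DONE]
```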
Lava headers are authoritative. While provider SSE messages may include usage data, always use Lava’s `X-Lava-Usage-*` headers for billing calculations. These reflect actual charges, including merchant fees and service charges.
Provider-Specific Considerations
OpenAI Streaming Format
Chunk Structure:
- Content in `choices[0].delta.content`
- Final chunk includes a `usage` object
- Stream ends with `data: [DONE]`
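A representative content chunk (values illustrative):

```json
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
```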
Anthropic Streaming Format
Chunk Structure:
- Content in `delta.text` (not `delta.content`)
- Multiple event types: `message_start`, `content_block_delta`, `message_stop`
- No `[DONE]` marker
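Representative events (values illustrative):

```json
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}
```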
Google Gemini Streaming Format
Chunk Structure:
- Content in `candidates[0].content.parts[0].text`
- Usage in `usageMetadata` (appears in chunks, not just the final one)
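A representative chunk (values illustrative):

```json
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":12,"candidatesTokenCount":5,"totalTokenCount":17}}
```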
React Integration Example
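A minimal sketch of a hook that accumulates streamed tokens into component state (the hook name, endpoint, model, and request shape are illustrative; the SSE parsing mirrors the fetch example above):

```tsx
import { useCallback, useState } from "react";

// Illustrative hook: streams a completion and exposes the growing text.
export function useStreamingCompletion(url: string, token: string) {
  const [text, setText] = useState("");
  const [streaming, setStreaming] = useState(false);

  const send = useCallback(
    async (prompt: string) => {
      setText("");
      setStreaming(true);
      try {
        const response = await fetch(url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${token}`, // placeholder credential
          },
          body: JSON.stringify({
            model: "gpt-4o-mini",
            messages: [{ role: "user", content: prompt }],
            stream: true,
          }),
        });
        if (!response.body) throw new Error("No response body");

        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = "";

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split("\n");
          buffer = lines.pop() ?? "";
          for (const line of lines) {
            if (!line.startsWith("data: ")) continue;
            const data = line.slice(6).trim();
            if (data === "[DONE]") continue;
            try {
              const delta =
                JSON.parse(data).choices?.[0]?.delta?.content ?? "";
              if (delta) setText((prev) => prev + delta);
            } catch {
              // Skip malformed chunks.
            }
          }
        }
      } finally {
        setStreaming(false);
      }
    },
    [url, token]
  );

  return { text, streaming, send };
}
```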
Troubleshooting
Stream cuts off or stops mid-response
Common causes:
- Network timeout (connection dropped)
- Browser tab backgrounded (some browsers throttle background tabs)
- Provider rate limit hit mid-stream
- Wallet balance insufficient (stream terminates when funds depleted)
Fixes:
- Implement reconnection logic for network failures (see the sketch after this list)
- Keep tab active during streaming
- Check provider rate limits and add exponential backoff
- Monitor wallet balance before streaming requests
- Add error handlers for stream interruptions
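For example, a hedged retry sketch with exponential backoff, where `streamOnce` is assumed to be your own function performing one streaming attempt (e.g. the fetch example above):

```ts
// Retries a streaming call with exponential backoff between attempts.
async function streamWithRetry(
  streamOnce: () => Promise<void>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await streamOnce();
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```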
Cannot parse SSE data
Reasons:
- Incomplete chunks in buffer (line split mid-JSON)
- Provider-specific format differences (OpenAI vs Anthropic)
- Malformed JSON from provider (rare)
Fixes:
- Always buffer incomplete lines (see the fetch API example)
- Check the provider-specific `delta` structure
- Wrap `JSON.parse` in try/catch to skip bad chunks
- Log unparseable data for debugging
Usage headers missing from response
Check:
- Headers are only available AFTER the stream completes (not during)
- Accessing `response.headers` before `reader.read()` finishes
- Header names are case-sensitive: `x-lava-request-id` (lowercase)
Streaming works locally but not in production
Possible issues:
- Reverse proxy buffering responses (Nginx, Cloudflare)
- Edge functions timeout before stream completes
- CORS headers blocking stream in browser
- Compression middleware breaking SSE format
Fixes:
- Disable response buffering: Nginx `proxy_buffering off;`
- Use longer function timeouts for streaming routes
- Ensure CORS allows streaming: set `Access-Control-Allow-Origin`
- Disable compression for SSE routes