What You’ll Learn
This guide shows you how to work with streaming responses from AI providers through Lava’s forward proxy. You’ll learn to:
- Enable streaming for LLM requests
- Parse Server-Sent Events (SSE) in JavaScript/TypeScript
- Extract usage data from completed streams
- Handle provider-specific streaming formats (OpenAI vs Anthropic)
Streaming provides real-time UX. Instead of waiting for the entire response, users see tokens appear incrementally, creating a more interactive chat-like experience. Lava fully supports streaming for all LLM providers.
Enabling Streaming
Basic Streaming Request
Enable streaming by adding `"stream": true` to your request body:
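For example, a minimal sketch (the proxy host, path, model, and credential below are placeholders; substitute the values from your Lava configuration):

```ts
// Sketch of a streaming request through the Lava forward proxy.
// The URL and auth header are placeholders, not real endpoints.
const response = await fetch("https://<your-lava-proxy>/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.LAVA_TOKEN}`, // placeholder credential
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true, // enables SSE streaming through Lava
  }),
});
```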
How Lava Handles Streaming
- Request Detection: Lava detects `"stream": true` in the request body
- Forward to Provider: The request is forwarded to the AI provider with streaming enabled
- Real-time Proxy: Lava streams response chunks back to your client in real time
- Usage Extraction: Usage data is extracted from the final SSE message (provider-specific format)
- Billing Completion: Billing happens after the stream completes; headers are added to the final chunk
Parsing Server-Sent Events
Using Fetch API (Recommended)
Modern approach using `fetch` with a response body reader:
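A sketch, assuming `response` was obtained with `"stream": true` as above; it buffers incomplete lines and skips malformed chunks:

```ts
// Reads an SSE stream from a fetch Response and accumulates the text.
async function readStream(response: Response): Promise<string> {
  if (!response.body) throw new Error("Response has no body");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Append new bytes and split on newlines; keep the last
    // (possibly incomplete) line in the buffer for the next chunk.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length).trim();
      if (data === "[DONE]") continue; // OpenAI end-of-stream marker

      try {
        const chunk = JSON.parse(data);
        // OpenAI-style delta; Anthropic uses delta.text instead.
        fullText += chunk.choices?.[0]?.delta?.content ?? "";
      } catch {
        // Incomplete or malformed JSON: skip rather than crash.
      }
    }
  }
  return fullText;
}
```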
Using EventSource (Browser Only)
For browser environments, `EventSource` provides simpler SSE handling:
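A sketch (note that `EventSource` only issues GET requests and cannot set an Authorization header, so it only fits endpoints designed for it; the URL below is a placeholder):

```ts
// Placeholder URL: EventSource cannot send custom headers, so any
// credential has to travel another way, e.g. a query parameter.
const source = new EventSource("https://<your-lava-proxy>/v1/stream?token=...");

source.onmessage = (event: MessageEvent) => {
  if (event.data === "[DONE]") {
    source.close();
    return;
  }
  const chunk = JSON.parse(event.data);
  console.log(chunk.choices?.[0]?.delta?.content ?? "");
};

source.onerror = () => {
  // EventSource auto-reconnects by default; close if unrecoverable.
  source.close();
};
```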
Extracting Usage from Streams
Usage Headers (Final Chunk)
Lava adds usage headers to the final streamed chunk:
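A sketch of how you might read them once the stream has fully completed (this assumes the usage headers are exposed on the `Response` object; the exact `X-Lava-Usage-*` names are not listed here):

```ts
// Call only after the stream has been fully consumed.
function logLavaUsage(response: Response): void {
  const requestId = response.headers.get("x-lava-request-id");
  console.log("request id:", requestId);
  response.headers.forEach((value, name) => {
    // The Headers API normalizes names to lowercase.
    if (name.startsWith("x-lava-usage-")) {
      console.log(`${name}: ${value}`);
    }
  });
}
```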
Provider Usage Data (SSE Messages)
Some providers include usage in the final SSE message.
OpenAI Format:
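An illustrative final chunk (IDs and token counts are made up; OpenAI includes the `usage` object on its last chunk when usage reporting is enabled):

```json
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":48,"total_tokens":60}}

data: [DONE]
```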
Lava headers are authoritative. While provider SSE messages may include usage data, always use Lava’s `X-Lava-Usage-*` headers for billing calculations. These reflect actual charges, including merchant fees and service charges.
Provider-Specific Considerations
OpenAI Streaming Format
Chunk Structure:
- Content in `choices[0].delta.content`
- Final chunk includes a `usage` object
- Stream ends with `data: [DONE]`
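A representative content chunk (values illustrative):

```json
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
```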
Anthropic Streaming Format
Chunk Structure:
- Content in `delta.text` (not `delta.content`)
- Multiple event types: `message_start`, `content_block_delta`, `message_stop`
- No `[DONE]` marker
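Representative events (values illustrative):

```json
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}
```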
Google Gemini Streaming Format
Chunk Structure:
- Content in `candidates[0].content.parts[0].text`
- Usage in `usageMetadata` (appears in chunks, not just the final one)
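A representative chunk (values illustrative):

```json
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":12,"candidatesTokenCount":5,"totalTokenCount":17}}
```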
React Integration Example
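A minimal sketch of a hook that accumulates streamed tokens into component state (the hook name, endpoint, model, and request shape are illustrative; the SSE parsing mirrors the fetch example above):

```tsx
import { useCallback, useState } from "react";

// Illustrative hook: streams a completion and exposes the growing text.
export function useStreamingCompletion(url: string, token: string) {
  const [text, setText] = useState("");
  const [streaming, setStreaming] = useState(false);

  const send = useCallback(
    async (prompt: string) => {
      setText("");
      setStreaming(true);
      try {
        const response = await fetch(url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${token}`, // placeholder credential
          },
          body: JSON.stringify({
            model: "gpt-4o-mini",
            messages: [{ role: "user", content: prompt }],
            stream: true,
          }),
        });
        if (!response.body) throw new Error("No response body");

        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = "";

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split("\n");
          buffer = lines.pop() ?? "";
          for (const line of lines) {
            if (!line.startsWith("data: ")) continue;
            const data = line.slice(6).trim();
            if (data === "[DONE]") continue;
            try {
              const delta =
                JSON.parse(data).choices?.[0]?.delta?.content ?? "";
              if (delta) setText((prev) => prev + delta);
            } catch {
              // Skip malformed chunks.
            }
          }
        }
      } finally {
        setStreaming(false);
      }
    },
    [url, token]
  );

  return { text, streaming, send };
}
```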
Troubleshooting
Stream cuts off or stops mid-response
Common causes:
- Network timeout (connection dropped)
- Browser tab backgrounded (some browsers throttle background tabs)
- Provider rate limit hit mid-stream
- Wallet balance insufficient (stream terminates when funds depleted)
Fixes:
- Implement reconnection logic for network failures (see the sketch after this list)
- Keep tab active during streaming
- Check provider rate limits and add exponential backoff
- Monitor wallet balance before streaming requests
- Add error handlers for stream interruptions
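For example, a hedged retry sketch with exponential backoff, where `streamOnce` is assumed to be your own function performing one streaming attempt (e.g. the fetch example above):

```ts
// Retries a streaming call with exponential backoff between attempts.
async function streamWithRetry(
  streamOnce: () => Promise<void>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await streamOnce();
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```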
Cannot parse SSE data
Reasons:
- Incomplete chunks in buffer (line split mid-JSON)
- Provider-specific format differences (OpenAI vs Anthropic)
- Malformed JSON from provider (rare)
Fixes:
- Always buffer incomplete lines (see the fetch API example)
- Check the provider-specific `delta` structure
- Wrap `JSON.parse` in try/catch to skip bad chunks
- Log unparseable data for debugging
Usage headers missing from response
Check:
- Headers are only available AFTER the stream completes (not during)
- Accessing `response.headers` before `reader.read()` finishes
- Header names are case-sensitive: `x-lava-request-id` (lowercase)
Streaming works locally but not in production
Possible issues:
- Reverse proxy buffering responses (Nginx, Cloudflare)
- Edge functions timeout before stream completes
- CORS headers blocking stream in browser
- Compression middleware breaking SSE format
Fixes:
- Disable response buffering: Nginx `proxy_buffering off;`
- Use longer function timeouts for streaming routes
- Ensure CORS allows streaming: set `Access-Control-Allow-Origin`
- Disable compression for SSE routes