Inference.net offers 15 models through Lava's AI Gateway, supporting the Chat Completions endpoint. Authentication uses an `Authorization: Bearer` header with a forward token. Both managed API keys (issued by Lava) and BYOK (bring-your-own-key) mode are supported. See the Inference.net API docs for provider-specific parameters.
Quick Start
```javascript
const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=https%3A%2F%2Fapi.inference.net%2Fv1%2Fchat%2Fcompletions',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${forwardToken}`,
    },
    body: JSON.stringify({
      model: 'deepseek/deepseek-r1/fp-8',
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  }
);
```
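The `u` query parameter is simply the URL-encoded target endpoint. As a sketch, a small local helper (the `buildForwardUrl` name is ours, not part of any Lava SDK) can construct the forward URL for any provider endpoint:

```javascript
// Build a Lava forward URL for an arbitrary target endpoint.
// buildForwardUrl is a local helper for illustration, not a Lava API.
function buildForwardUrl(targetUrl) {
  // encodeURIComponent escapes ':' and '/' so the target survives as a
  // single query parameter value.
  return (
    'https://api.lavapayments.com/v1/forward?u=' + encodeURIComponent(targetUrl)
  );
}

console.log(buildForwardUrl('https://api.inference.net/v1/chat/completions'));
// → https://api.lavapayments.com/v1/forward?u=https%3A%2F%2Fapi.inference.net%2Fv1%2Fchat%2Fcompletions
```

This reproduces the exact URL used in the Quick Start above.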
Chat Completions
Target URL: https://api.inference.net/v1/chat/completions
| Property | Value |
|---|---|
| Content Type | `application/json` |
| Streaming | Yes (set `stream: true` in the request body) |
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| deepseek/deepseek-r1-0528/fp-8 | $0.50 | $3.00 |
| deepseek/deepseek-r1/fp-8 | $0.45 | $2.70 |
| deepseek/deepseek-v3-0324/fp-8 | $0.45 | $1.45 |
| meta-llama/llama-3.1-70b-instruct/fp-16 | $0.30 | $0.40 |
| meta-llama/llama-3.3-70b-instruct/fp-16 | $0.30 | $0.40 |
| google/gemma-3-27b-instruct/bf-16 | $0.30 | $0.40 |
| qwen/qwen2.5-7b-instruct/bf-16 | $0.20 | $0.20 |
| deepseek/r1-distill-llama-70b/fp-8 | $0.10 | $0.40 |
| qwen/qwen3-30b-a3b/fp8 | $0.08 | $0.29 |
| meta-llama/llama-3.2-11b-instruct/fp-16 | $0.06 | $0.06 |
| mistralai/mistral-nemo-12b-instruct/fp-8 | $0.04 | $0.10 |
| meta-llama/llama-3.1-8b-instruct/fp-8 | $0.03 | $0.03 |
| meta-llama/llama-3.1-8b-instruct/fp-16 | $0.02 | $0.03 |
| meta-llama/llama-3.2-3b-instruct/fp-16 | $0.02 | $0.02 |
| meta-llama/llama-3.2-1b-instruct/fp-16 | $0.01 | $0.01 |
Next Steps