Inference.net offers 15 models through Lava's AI Gateway, supporting the Chat Completions endpoint. Authentication uses an `Authorization: Bearer` header with a forward token. Both managed API keys (issued by Lava) and BYOK (bring-your-own-key) mode are supported. See the Inference.net API docs for provider-specific parameters.
Quick Start
```javascript
const response = await fetch(
  'https://api.lavapayments.com/v1/forward?u=https%3A%2F%2Fapi.inference.net%2Fv1%2Fchat%2Fcompletions',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${forwardToken}`,
    },
    body: JSON.stringify({
      model: 'deepseek/deepseek-r1/fp-8',
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  }
);
```
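The `u` query parameter is simply the URL-encoded target endpoint. As a sketch, a small local helper (the `buildForwardUrl` name is ours, not part of any Lava SDK) can construct the forward URL for any provider endpoint:

```javascript
// Build a Lava forward URL for an arbitrary target endpoint.
// buildForwardUrl is a local helper for illustration, not a Lava API.
function buildForwardUrl(targetUrl) {
  // encodeURIComponent escapes ':' and '/' so the target survives as a
  // single query parameter value.
  return (
    'https://api.lavapayments.com/v1/forward?u=' + encodeURIComponent(targetUrl)
  );
}

console.log(buildForwardUrl('https://api.inference.net/v1/chat/completions'));
// → https://api.lavapayments.com/v1/forward?u=https%3A%2F%2Fapi.inference.net%2Fv1%2Fchat%2Fcompletions
```

This reproduces the exact URL used in the Quick Start above.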
Chat Completions
Target URL: https://api.inference.net/v1/chat/completions
| Property | Value |
|---|---|
| Content Type | `application/json` |
| Streaming | Yes (set `stream: true` in the request body) |
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| deepseek/deepseek-r1-0528/fp-8 | $0.50 | $3.00 |
| deepseek/deepseek-r1/fp-8 | $0.45 | $2.70 |
| deepseek/deepseek-v3-0324/fp-8 | $0.45 | $1.45 |
| meta-llama/llama-3.1-70b-instruct/fp-16 | $0.30 | $0.40 |
| meta-llama/llama-3.3-70b-instruct/fp-16 | $0.30 | $0.40 |
| google/gemma-3-27b-instruct/bf-16 | $0.30 | $0.40 |
| qwen/qwen2.5-7b-instruct/bf-16 | $0.20 | $0.20 |
| deepseek/r1-distill-llama-70b/fp-8 | $0.10 | $0.40 |
| qwen/qwen3-30b-a3b/fp8 | $0.08 | $0.29 |
| meta-llama/llama-3.2-11b-instruct/fp-16 | $0.06 | $0.06 |
| mistralai/mistral-nemo-12b-instruct/fp-8 | $0.04 | $0.10 |
| meta-llama/llama-3.1-8b-instruct/fp-8 | $0.03 | $0.03 |
| meta-llama/llama-3.1-8b-instruct/fp-16 | $0.02 | $0.03 |
| meta-llama/llama-3.2-3b-instruct/fp-16 | $0.02 | $0.02 |
| meta-llama/llama-3.2-1b-instruct/fp-16 | $0.01 | $0.01 |
Next Steps