Authorization: Bearer. See the DeepInfra API docs for provider-specific parameters.
Supports both managed (Lava’s API keys) and unmanaged (bring your own credentials) mode.
Quick Start
Chat Completions
Target URL:https://api.deepinfra.com/v1/openai/chat/completions
| Content Type | application/json |
| Streaming | Yes (set stream: true in request body) |
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| deepseek-ai/DeepSeek-R1-Turbo | $1.00 | $3.00 |
| deepseek-ai/DeepSeek-R1-0528-Turbo | $1.00 | $3.00 |
| deepseek-ai/DeepSeek-V3-0324-Turbo | $1.00 | $3.00 |
| meta-llama/Meta-Llama-3.1-405B-Instruct | $0.80 | $0.80 |
| deepseek-ai/DeepSeek-R1 | $0.70 | $2.40 |
| deepseek-ai/DeepSeek-R1-0528 | $0.50 | $2.15 |
| deepseek-ai/DeepSeek-Prover-V2-671B | $0.50 | $2.18 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-Turbo | $0.50 | $0.50 |
| microsoft/WizardLM-2-8x22B | $0.48 | $0.48 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | $0.40 | $1.60 |
| deepseek-ai/DeepSeek-V3 | $0.38 | $0.89 |
| meta-llama/Meta-Llama-3-70B-Instruct | $0.30 | $0.40 |
| MiniMaxAI/MiniMax-M2.5 | $0.30 | $1.20 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | $0.27 | $0.27 |
| MiniMaxAI/MiniMax-M2.1 | $0.27 | $1.15 |
| MiniMaxAI/MiniMax-M2 | $0.27 | $1.15 |
| deepseek-ai/DeepSeek-V3-0324 | $0.25 | $0.88 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.23 | $2.30 |
| meta-llama/Llama-3.3-70B-Instruct | $0.23 | $0.40 |
| meta-llama/Meta-Llama-3.1-70B-Instruct | $0.23 | $0.40 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | $0.22 | $1.00 |
| deepseek-ai/DeepSeek-V3.1 | $0.21 | $0.79 |
| deepseek-ai/DeepSeek-V3.1-Terminus | $0.21 | $0.79 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | $0.20 | $0.60 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.15 | $0.60 |
| Qwen/Qwen3-235B-A22B | $0.13 | $0.60 |
| Qwen/Qwen2.5-72B-Instruct | $0.12 | $0.39 |
| nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | $0.10 | $0.40 |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | $0.10 | $0.28 |
| Qwen/Qwen3-32B | $0.10 | $0.30 |
| google/gemma-3-27b-it | $0.10 | $0.19 |
| Qwen/Qwen3-Next-80B-A3B-Instruct | $0.09 | $1.10 |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.08 | $0.30 |
| Qwen/Qwen3-30B-A3B | $0.08 | $0.29 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | $0.08 | $0.24 |
| Qwen/QwQ-32B | $0.075 | $0.15 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | $0.071 | $0.10 |
| microsoft/phi-4 | $0.07 | $0.14 |
| microsoft/phi-4-reasoning-plus | $0.07 | $0.35 |
| Gryphe/MythoMax-L2-13b | $0.065 | $0.065 |
| Qwen/Qwen3-14B | $0.06 | $0.24 |
| Qwen/Qwen2.5-Coder-32B-Instruct | $0.06 | $0.15 |
| meta-llama/Llama-Guard-4-12B | $0.05 | $0.05 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | $0.05 | $0.17 |
| google/gemma-3-12b-it | $0.05 | $0.10 |
| microsoft/Phi-4-multimodal-instruct | $0.05 | $0.10 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | $0.049 | $0.049 |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | $0.04 | $0.16 |
| Qwen/Qwen2.5-7B-Instruct | $0.04 | $0.10 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | $0.03 | $0.05 |
| meta-llama/Meta-Llama-3-8B-Instruct | $0.03 | $0.06 |
| mistralai/Mistral-7B-Instruct-v0.3 | $0.028 | $0.054 |
| google/gemma-3-4b-it | $0.02 | $0.04 |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | $0.016 | $0.021 |
| meta-llama/Llama-3.2-1B-Instruct | $0.005 | $0.01 |
| meta-llama/Llama-3.2-3B-Instruct | $0.003 | $0.006 |
Next Steps
All Providers
Browse all supported AI providers
Forward Proxy
Learn how to construct proxy URLs and authenticate requests