Overview
Fireworks AI provides fast, reliable inference for open-source and proprietary language models with a developer-first approach and competitive pricing.
Key Features:
- Sub-second latency with optimized inference stack
- 100+ models including Llama, Mixtral, and proprietary options
- Fully OpenAI-compatible API
- Fine-tuning and model deployment capabilities
Authentication
Fireworks AI uses Bearer token authentication in the OpenAI-compatible format.
Header: Authorization: Bearer <FIREWORKS_API_KEY>
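A minimal sketch of setting this header in Python; the environment variable name is illustrative:

```python
import os

# OpenAI-compatible Bearer auth for Fireworks AI.
# FIREWORKS_API_KEY is an illustrative environment variable name.
headers = {
    "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
    "Content-Type": "application/json",
}
```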
Popular Models (October 2025)
| Model | Context | Description | Use Case |
|---|---|---|---|
| accounts/fireworks/models/llama-v3p3-70b-instruct | 128K | Meta’s Llama 3.3 flagship | General reasoning, coding |
| accounts/fireworks/models/mixtral-8x7b-instruct | 32K | Mistral’s MoE model | Fast, balanced performance |
| accounts/fireworks/models/qwen2p5-72b-instruct | 128K | Alibaba’s Qwen 2.5 | Multilingual, math |
Quick Start Example
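A minimal sketch using the official `openai` Python SDK pointed at Fireworks' OpenAI-compatible base URL (`https://api.fireworks.ai` plus the `/inference/v1` paths listed under Available Endpoints); the environment variable name is illustrative:

```python
import os
from openai import OpenAI

# Point the OpenAI SDK at Fireworks' OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],  # illustrative env var name
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[
        {"role": "user", "content": "Explain mixture-of-experts in one paragraph."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, existing OpenAI SDK code typically needs only the `base_url` and `api_key` changed.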
Available Endpoints
Fireworks AI supports the following OpenAI-compatible endpoints (an example request follows the table):
| Endpoint | Method | Description |
|---|---|---|
| /inference/v1/chat/completions | POST | Text generation with conversation context |
| /inference/v1/models | GET | List available models |
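A sketch of listing models with `requests`, assuming the same `api.fireworks.ai` host as the quick start:

```python
import os
import requests

resp = requests.get(
    "https://api.fireworks.ai/inference/v1/models",
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()

# OpenAI list format: {"object": "list", "data": [{"id": "...", ...}, ...]}
for model in resp.json()["data"]:
    print(model["id"])
```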
Usage Tracking
Usage data is returned in the response body (OpenAI format):
Location: data.usage
Format: Standard OpenAI usage object
Lava Tracking: Automatically tracked via x-lava-request-id header
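Continuing from the quick start's `response` object, a sketch of reading the usage fields:

```python
# data.usage in the raw JSON; response.usage via the openai SDK.
usage = response.usage
print("prompt_tokens:", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
print("total_tokens:", usage.total_tokens)
```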
Features & Capabilities
Streaming: ✅ Supported via the standard OpenAI stream parameter
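A streaming sketch with the same client setup as the quick start (environment variable name illustrative):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],  # illustrative env var name
)

# stream=True returns an iterator of chunks in the standard OpenAI format.
stream = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)

for chunk in stream:
    # Guard: some chunks (e.g., the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```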
BYOK Support
Status: ✅ Supported (managed keys + BYOK)
BYOK Implementation:
- Append your Fireworks AI API key to the forward token: ${TOKEN}.${YOUR_FIREWORKS_KEY} (see the sketch after the setup steps)
- Lava tracks usage and billing while you maintain key control
- No additional Lava API key costs (metering-only mode available)
Getting Your API Key:
- Sign up at Fireworks AI Console
- Navigate to API Keys section
- Create a new API key
- Use in Lava forward token (4th segment)
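A sketch of the forward-token construction described above; the environment variable names and the notion of a Lava gateway URL are illustrative assumptions, not documented names:

```python
import os

# Append your Fireworks key to the Lava forward token (4th segment).
# Both environment variable names are illustrative.
forward_token = (
    f"{os.environ['LAVA_FORWARD_TOKEN']}.{os.environ['FIREWORKS_API_KEY']}"
)

headers = {
    "Authorization": f"Bearer {forward_token}",
    "Content-Type": "application/json",
}
# Send requests through your Lava gateway with these headers; Lava meters
# usage (x-lava-request-id) while the Fireworks key remains under your control.
```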
Best Practices
- Model Selection: Use Llama 3.3 for reasoning, Mixtral for speed/cost, Qwen for multilingual
- Model Names: Use full account paths (e.g., accounts/fireworks/models/llama-v3p3-70b-instruct)
- Temperature: 0.7 for creative tasks, 0.1-0.3 for factual outputs (example after this list)
- Context Management: Leverage 128K context models for long-form content
- Error Handling: Fireworks returns standard OpenAI error formats
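For example, reusing the `client` from the quick start, a low-temperature call for factual output (a sketch; the model choice follows the multilingual guidance above):

```python
# Factual task: keep temperature low (0.1-0.3) per the guidance above.
response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-72b-instruct",
    messages=[
        {"role": "user", "content": "Translate 'good morning' into Japanese and German."}
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```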
Performance Characteristics
Latency: Sub-second first-token latency for most models
Throughput: Optimized for high-concurrency workloads
Reliability: 99.9% uptime SLA for production deployments
Use Cases:
- Production chatbots
- Content generation pipelines
- Code assistance tools
- Multi-modal applications