Overview
Moonshot AI (Kimi) provides advanced language models with thinking capabilities, powered by a trillion-parameter Mixture-of-Experts (MoE) architecture and specializing in agentic reasoning and long-context processing.
Key Features:
- Thinking mode with multi-step reasoning capabilities
- 256K token context window for long documents
- Fully OpenAI-compatible API
- Tool calling support for agentic applications
- Competitive pricing with high performance
Authentication
Moonshot AI uses Bearer token authentication in the OpenAI-compatible format.
Header: `Authorization: Bearer YOUR_API_KEY`
Popular Models (October 2025)
| Model | Context | Description | Key Feature |
|---|---|---|---|
| kimi-k2-thinking | 128K-256K | Advanced reasoning model with thinking mode | Multi-step reasoning with reasoning_content |
| kimi-k2-turbo-preview | 128K-256K | Fast inference variant | Optimized for speed and efficiency |
Pricing (USD per 1M tokens):
| Model | Input (Cache Hit) | Input (Cache Miss) | Output |
|---|---|---|---|
| kimi-k2-thinking | $0.15 | $0.60 | $2.50 |
| kimi-k2-turbo-preview | $0.15 | $1.15 | $8.00 |
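A quick sanity check on the numbers above: the sketch below estimates the cost of a single request from the table's rates, assuming the prices are USD per 1M tokens and ignoring any Lava platform fees.

```python
# Prices copied from the table above (assumed to be USD per 1M tokens).
PRICING = {
    "kimi-k2-thinking":      {"input_hit": 0.15, "input_miss": 0.60, "output": 2.50},
    "kimi-k2-turbo-preview": {"input_hit": 0.15, "input_miss": 1.15, "output": 8.00},
}

def estimate_cost(model, input_tokens, output_tokens, cache_hit=False):
    """Rough per-request cost estimate from the published token rates."""
    p = PRICING[model]
    rate_in = p["input_hit"] if cache_hit else p["input_miss"]
    return (input_tokens * rate_in + output_tokens * p["output"]) / 1_000_000

# e.g. 100K input tokens (cache miss) + 20K output tokens on kimi-k2-thinking:
cost = estimate_cost("kimi-k2-thinking", 100_000, 20_000)
print(f"${cost:.2f}")  # $0.11
```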
Quick Start Example
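A minimal sketch of a chat-completion request, assuming the standard OpenAI chat-completions schema. The base URL and token below are placeholders, not real values; substitute your own from the Lava dashboard.

```python
import json

# Placeholders (assumptions) -- replace with your actual values.
LAVA_BASE_URL = "https://your-lava-proxy.example/v1"
FORWARD_TOKEN = "YOUR_LAVA_FORWARD_TOKEN"

# Standard OpenAI-compatible chat completion payload.
payload = {
    "model": "kimi-k2-thinking",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize MoE routing in two sentences."},
    ],
    "temperature": 1.0,  # recommended for thinking models (see Best Practices)
}

headers = {
    "Authorization": f"Bearer {FORWARD_TOKEN}",
    "Content-Type": "application/json",
}

# To actually send the request (requires the `requests` package and a valid token):
# response = requests.post(f"{LAVA_BASE_URL}/chat/completions", headers=headers, json=payload)
print(json.dumps(payload, indent=2))
```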
Available Endpoints
Moonshot AI supports standard OpenAI-compatible endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Text generation with conversation context |
| /v1/models | GET | List available models |
Additional Moonshot Endpoints: Moonshot’s API also offers /v1/files (file upload/parsing) and /v1/completions (standard text completion) endpoints. These are not currently routed through Lava’s proxy. For file-based Q&A or document analysis, refer to Moonshot’s file API documentation.
Usage Tracking
Usage data is returned in the response body (OpenAI format): `data.usage`
Format: Standard OpenAI usage object
Lava Tracking: Automatically tracked via x-lava-request-id header
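Reading the usage object is straightforward; the sketch below uses an illustrative response body in the standard OpenAI format (the token counts are made up):

```python
# Illustrative OpenAI-format response body; the values are made up.
response_body = {
    "id": "chatcmpl-abc123",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}}],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 45,
        "total_tokens": 165,
    },
}

# The usage object lives at data.usage in every non-streaming response.
usage = response_body["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```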
Features & Capabilities
Thinking Mode
The `kimi-k2-thinking` model provides multi-step reasoning with a dedicated `reasoning_content` field:
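A sketch of handling both fields in the assistant message; the message values below are illustrative, but the `reasoning_content`/`content` split matches the field names described above:

```python
# Illustrative assistant message from kimi-k2-thinking (values made up).
message = {
    "role": "assistant",
    "reasoning_content": "Step 1: restate the problem. Step 2: ...",
    "content": "The final answer is 42.",
}

# Extract both fields: the reasoning trace for inspection, the content for the user.
reasoning = message.get("reasoning_content", "")  # absent on non-thinking models
answer = message["content"]
print("REASONING:", reasoning)
print("ANSWER:", answer)
```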
Tool Calling
Moonshot AI supports OpenAI-compatible function calling for agentic applications. Note: `tool_choice: "required"` is not supported; use `"auto"` or `"none"`.
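A minimal OpenAI-style tool definition as a sketch; `get_weather` is a hypothetical function used only for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "kimi-k2-thinking",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # "required" is not supported; use "auto" or "none"
}
print(payload["tool_choice"])
```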
Streaming
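Streaming follows the OpenAI convention: set `"stream": true` in the request and consume server-sent-event chunks. The sketch below parses simulated SSE lines (no network call), assuming the standard `data: {...}` / `data: [DONE]` framing of OpenAI-compatible streams:

```python
import json

# Simulated SSE lines as an OpenAI-compatible streaming endpoint would emit them.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

pieces = []
for line in sse_lines:
    data = line[len("data: "):]
    if data == "[DONE]":          # sentinel that ends the stream
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:        # deltas may also carry role or tool calls
        pieces.append(delta["content"])

print("".join(pieces))  # Hello!
```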
Long Context (256K)
With a context window of up to 256K tokens, Kimi excels at:
- Long document analysis
- Extensive code review
- Multi-turn conversations
- Research paper summarization
BYOK Support
Status: ✅ Supported (managed keys + BYOK)
BYOK Implementation:
- Append your Moonshot API key to the forward token: `${TOKEN}.${YOUR_MOONSHOT_KEY}`
- Lava tracks usage and billing while you maintain key control
- No additional Lava API key costs (metering-only mode available)
To obtain a Moonshot API key:
1. Sign up at Moonshot AI Platform
2. Navigate to the API Keys section
3. Create a new API key
4. Use it in your Lava forward token (4th segment)
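The `${TOKEN}.${YOUR_MOONSHOT_KEY}` pattern above can be sketched as plain string assembly; the segment values below are placeholders, and the three-segment shape of the base token is an assumption inferred from the key landing in the 4th segment:

```python
# Placeholder values (assumptions) -- substitute your real credentials.
lava_token = "seg1.seg2.seg3"          # your Lava forward token (dot-separated)
moonshot_key = "sk-your-moonshot-key"  # your Moonshot API key

# BYOK: append the Moonshot key as the final segment of the forward token.
forward_token = f"{lava_token}.{moonshot_key}"
headers = {"Authorization": f"Bearer {forward_token}"}
print(headers["Authorization"])
```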
Best Practices
- Model Selection:
  - Use `kimi-k2-thinking` for complex reasoning tasks requiring step-by-step analysis
  - Use `kimi-k2-turbo-preview` for faster inference and general chat applications
- Thinking Mode Usage:
  - Set `temperature: 1.0` for thinking models to enable diverse reasoning paths
  - Extract both `reasoning_content` and `content` for full understanding
- Context Management:
  - Leverage the 256K context for long documents and extended conversations
  - Use conversation history effectively for multi-turn interactions
- Tool Calling:
  - Use `tool_choice: "auto"` for flexible function calling
  - Note that `"required"` is not supported (use `"auto"` instead)
- Temperature Settings:
  - Thinking models: 1.0 for diverse reasoning
  - Turbo models: 0.6-0.8 for balanced creativity and coherence