Kimi API - Kimi K2.5 API
Build apps with Kimi using a fast, OpenAI-style API. Send chat requests, connect your own tools, and power everything from support bots to research assistants. Choose K2 for reliable text and coding, or K2.5 for visual workflows like screenshot-to-code.
Chat + Tool Calling
Use a familiar chat-completions workflow, then add tools (functions) so Kimi can search, fetch data, run actions, and return structured outputs.
K2 vs K2.5 Models
Pick the right engine for your job: K2 for text-first agents and coding, K2.5 for multimodal tasks like UI screenshots, visual coding, and richer automation.
Comparison table (Kimi API vs other APIs)
| What you care about | Kimi API | OpenAI API | Anthropic API | Gemini API | Mistral API | Cohere API | Groq API | OpenRouter |
|---|---|---|---|---|---|---|---|---|
| API style / compatibility | OpenAI-SDK compatible across many endpoints; migration guide exists | Native /v1/chat/completions and newer Responses API | Messages API (/v1/messages) | generateContent / streaming variants | Chat endpoints + tool calling controls (tool_choice) | Chat endpoint + tool use | OpenAI-compatible endpoints (/openai/v1/chat/completions) | OpenAI Chat API normalized across many providers/models |
| Tool / function calling | Tool Use supported | Function calling + built-in tools in Responses | Tool use supported (Messages) | Function calling supported | Tool calling / function calling supported | Tool use patterns (parallel, multi-step) | Depends on model; OpenAI-compatible interface | Depends on underlying model; router doesn’t remove need for tool logic |
| Long context | Platform highlights up to 256K context | Model-dependent; documented per model | Model-dependent | Model-dependent; long-context docs exist | Model-dependent | Model-dependent | Model-dependent | Model-dependent (varies per routed model) |
| Multimodal (images/audio/video) | Model-dependent (K2.5 quickstart exists) | Supports text+image in Responses; more modalities via platform features | Vision supported (per Messages docs) | Explicit multimodal support (text, images, audio, etc.) | Has multimodal models + “agents” docs mention multimodal availability | Mostly text-focused; tool+citations strengths | Depends on model offered via Groq | Depends on model/provider you choose |
| Best “why choose it” | OpenAI-compatible feel + tool calling + long context | Deep platform features + broad docs + built-in tools | Strong tooling patterns + Messages-first API | Strong multimodal ecosystem + Google infra | Clean tool calling controls + growing agent features | Tool use + citations workflows are first-class | Very fast OpenAI-compatible inference focus | One endpoint, many models + routing/fallback |
Kimi API (Complete Developer Guide)
If you’ve used modern LLM APIs before, the Kimi API will feel familiar because it’s designed to be compatible with popular chat-completions style workflows while still offering some very “Kimi-specific” strengths: long context, agentic tool calling, and (with newer models) multimodal + visual coding workflows. The goal of this guide is to help you go from “I heard Kimi is good” to “I shipped something stable in production.”
Kimi is developed by Moonshot AI and offered through their Open Platform. The Open Platform provides an OpenAI-compatible base URL for API calls (plus region-specific alternatives), a model list endpoint, file endpoints, and documentation for tool calling and rate limiting.
What is the Kimi API?
The Kimi API is the set of endpoints you call to use Kimi models inside your own products: apps, bots, websites, internal tools, automation pipelines, and developer workflows. In practice, the most common flow looks like this:
1. You generate an API key in the Moonshot Open Platform console.
2. Your server sends a request to the chat completions endpoint using that key.
3. You optionally attach tools/functions, files, or other structured inputs so Kimi can do agent-style work.
If you already have code for the OpenAI Chat Completions pattern, it’s often straightforward to migrate by changing the base URL + model names and adjusting a few small differences.
Why developers choose Kimi API
People typically pick Kimi for one (or more) of these reasons:
1) Long-context workflows that feel practical
Kimi is known for handling very large inputs, which matters when you’re building:
- Document summarizers
- "Chat with docs" tools
- Research agents that hold lots of context
- Codebase assistants that need multiple files at once
Kimi’s K2 / K2.5 family is also frequently described with very large context windows in official and ecosystem documentation.
2) Tool calling designed for agentic tasks
Tool calling (sometimes called function calling) is a big deal if you want Kimi to:
- Call your search API
- Query a database
- Fetch web pages
- Run calculations
- Create multi-step plans and execute them
Kimi K2 documentation explicitly highlights tool calling capability and shows patterns that match the “tools array” style many developers already use.
3) Modern coding and “visual coding” direction
Kimi’s newer positioning (especially around K2.5) puts a spotlight on design-to-code and agent swarm ideas, where the model can coordinate multi-step work and handle rich office outputs.
How Kimi API is structured
Think of the Kimi API as three layers:
Layer A - Transport + Auth
- HTTPS requests
- JSON request/response
- Authorization via API key (Bearer token in the `Authorization` header)
Layer B - Core endpoints
- `/v1/chat/completions` (the main one)
- `/v1/models` (discover model IDs)
- `/v1/files` (upload and reference documents)
Layer C - Product behaviors
- Tool calling / function calling
- Long context + document workflows
- Multimodal support (model-dependent)
- Rate limits and quotas that shape production behavior
Base URL options (global, China, and overseas routing)
One of the easiest ways to avoid headaches is to use the right base URL for where your servers run.
Commonly documented options include:
- Global base URL: `https://api.moonshot.ai/v1`
- China base URL: `https://api.moonshot.cn/v1`
- Overseas suggestion (help/FAQ): some guidance recommends an alternate base URL for overseas calling (example: a Singapore route).
Practical tip: pick one base URL, set it in config (env var), and keep it consistent across your services. If you deploy multi-region, treat base URL as part of per-region config.
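A minimal sketch of config-driven setup, assuming the OpenAI Python SDK and illustrative env var names:

```python
import os
from openai import OpenAI

# Read the base URL from per-environment config so every service in a
# region agrees on one value. Env var names here are illustrative.
client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url=os.environ.get("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1"),
)
```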
Getting an API key (safe, production-friendly)
Step 1: Create an account + open the console
The Moonshot Open Platform console is where you manage billing, projects, and keys.
Step 2: Generate the key
Create a key, copy it once, and store it in a password manager or secrets vault. Many guides note keys often start with a prefix like `sk-`.
Step 3: Store it properly
Do:
- Store in server-side env vars (`MOONSHOT_API_KEY`)
- Use a secrets manager (AWS Secrets Manager, GCP Secret Manager, Vault, etc.)
- Rotate keys per environment (dev/staging/prod)
Don’t:
- Ship keys in frontend JavaScript
- Commit keys to Git
- Paste keys into public bug reports
Your first request (Chat Completions)
The Kimi platform documents a Chat Completions endpoint under:
- `POST https://api.moonshot.ai/v1/chat/completions`
cURL example
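A minimal request sketch (the body follows the OpenAI chat-completions shape; the model ID is illustrative, see the notes below):

```bash
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Say hello in one sentence."}
    ],
    "temperature": 0.6
  }'
```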
Notes:
- The model name above is an example. Use `/v1/models` to discover the exact IDs your account can call.
- Kimi K2's own documentation recommends a temperature like `0.6` for the instruct variant in many cases.
Discovering models
Why you should always call `/v1/models`
Model naming changes. New variants appear. Some models are region- or account-dependent. The clean solution is: fetch the model list at startup (or cache it daily) so your UI can display available models.
The Moonshot docs mention:
- `GET https://api.moonshot.ai/v1/models`
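A sketch of startup-time discovery, assuming the OpenAI Python SDK pointed at the Kimi base URL:

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.moonshot.ai/v1")

# Fetch the list once at startup (or on a daily cache refresh) instead of
# hardcoding model IDs that may change or vary by account and region.
available_models = [m.id for m in client.models.list().data]
print(available_models)
```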
How to pick models in a real product
Instead of picking “the best” model globally, pick by job:
- Fast chat / customer support: prefer an "instant" or "turbo" style variant if offered (lower latency, cheaper).
- Deep reasoning / complex planning: use a "thinking" variant if available.
- Design → code / screenshots / UI analysis: use a multimodal model (for Kimi, K2.5 is positioned strongly here).
- Agentic tool calling: choose a model that reliably emits tool calls and handles long multi-step runs.
OpenAI-compatible behavior
A lot of developers don’t want to learn “another completely new API shape.” Kimi leans into that reality.
Kimi K2’s GitHub docs state that Moonshot provides an OpenAI/Anthropic-compatible API via the Open Platform.
Moonshot’s “migrating from OpenAI” guide also lists endpoints compatible with OpenAI (including chat completions and file endpoints).
What this means in practice
If you already have something like:
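```python
import os
from openai import OpenAI

# Illustrative sketch: a typical OpenAI SDK client.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```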
You can often do something like:
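```python
import os
from openai import OpenAI

# Illustrative sketch: same SDK, pointed at Kimi by swapping key + base URL.
client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",
)
```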
And then keep using `chat.completions.create(...)` style calls, changing model IDs and handling any small behavioral differences.
Tool calling (function calling) with Kimi
Tool calling is how you turn an LLM from “text generator” into “doer.”
The basic pattern
1. You send a request with a list of tools (JSON schema).
2. The model chooses to call one (or many).
3. Your code executes the tool(s).
4. You append tool results to the conversation.
5. You call the model again until it returns a final answer.
Kimi K2 documentation shows an end-to-end tool calling loop with `tools` and `tool_choice="auto"`.
Example tool schema (weather tool)
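A sketch in the OpenAI-style tools format (the `get_weather` function, its parameters, and the field values are hypothetical):

```python
# Hypothetical weather tool described as a JSON-schema function definition.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Berlin'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city"],
            },
        },
    }
]
```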
Production tips for tool calling
- Validate tool arguments (never blindly trust LLM output).
- Add timeouts and retries for external calls.
- Return small tool outputs when possible (avoid flooding context).
- Keep a tool audit log (for debugging and safety review).
Files: uploading documents for “chat with docs”
Kimi’s platform docs include a files API.
This is useful when you want to:
- Attach PDFs
- Analyze reports
- Summarize contracts
- Extract tables
- Build doc-driven workflows
A reliable UX flow for files
1. Upload file → get a `file_id`
2. Store `file_id` in your DB
3. Reference it in later chat requests
4. Cache summaries to reduce repeated token spend
Design note: Don’t make the user re-upload the same file for every question. That’s slow and expensive.
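As a sketch of the upload half of that flow, assuming the OpenAI-compatible files interface (the `purpose` value follows patterns shown in Moonshot's file docs, and `save_file_id` is a hypothetical persistence helper; verify the exact calls against the current docs):

```python
from pathlib import Path

# Upload once, persist the id, and reuse it across questions.
file_obj = client.files.create(file=Path("report.pdf"), purpose="file-extract")
save_file_id(conversation_id, file_obj.id)  # hypothetical DB helper

# Pull the extracted text and include it as context in later chat requests.
doc_text = client.files.content(file_id=file_obj.id).text
```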
Rate limits and reliability planning
Rate limits aren’t a “billing detail.” They shape architecture.
Moonshot documentation describes multiple measurement styles such as:
- Concurrency
- RPM (requests per minute)
- TPM (tokens per minute)
- TPD (tokens per day)
There’s also documentation stating rate limits can depend on cumulative recharge / account level.
How to design around rate limits (without pain)
- Queue requests (don't burst blindly)
- Back off on HTTP 429 (exponential, with jitter; see the sketch after this list)
- Use streaming for better UX (if supported)
- Cache repeated prompts (especially system prompts and repeated instructions)
- Add a "fallback model" in your app for graceful degradation
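A sketch of the 429 handling, assuming the OpenAI Python SDK (which raises `RateLimitError` on HTTP 429):

```python
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(api_key="sk-...", base_url="https://api.moonshot.ai/v1")

def chat_with_backoff(model, messages, max_attempts=5):
    """Retry on 429 with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up; let the caller trigger a fallback model
            # 1s, 2s, 4s, ... plus up to 1s of jitter to avoid thundering herds.
            time.sleep(2 ** attempt + random.random())
```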
Cost control
Even if you don’t care about pennies, you should care about predictability.
The most effective cost levers
- Prompt discipline: keep system prompts tight and reusable.
- Summarize long history: don't send 50-message threads forever.
- Use retrieval: fetch only relevant doc chunks instead of whole documents.
- Cache: repeated questions + repeated context = wasted tokens.
- Right-size the model: don't use a heavy "thinking" model for short tasks.
Moonshot provides pricing documentation for chat inference and limits (and those pages can change), so it’s best to link your product’s cost logic to live model pricing rather than hardcoding numbers forever.
Practical use cases
1) Customer support copilot
- Model drafts responses
- Tools fetch order status, refunds, delivery ETA
- Human agent approves / edits
Why Kimi fits: tool calling + long context for policy docs.
2) Research assistant (internal)
- Users ask: "Summarize this report + give risks + cite pages"
- Files API stores PDFs
- Tools call search or your internal wiki
Why Kimi fits: long context + file workflows.
3) Developer helper (codebase Q&A)
- Tools read repo files
- Model explains architecture, writes patches, generates tests
- Optional agentic loops for multi-step fixes
Why Kimi fits: agentic tool calling patterns are explicitly supported and discussed in K2 materials.
4) Design-to-code pipeline
- User uploads screenshot or mock
- Model outputs HTML/CSS/JS scaffolding
- Follow-up prompts refine spacing, responsiveness, accessibility
K2.5 is positioned around “visual coding” and agentic productivity workflows.
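A sketch of the screenshot step using OpenAI-style image content parts; whether K2.5 accepts exactly this shape, and the model ID shown, are assumptions to verify against the multimodal docs:

```python
import base64

# Encode the uploaded mockup as a data URL (OpenAI-style image_url part).
with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5-example",  # illustrative ID; call /v1/models to confirm
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Generate HTML/CSS scaffolding for this mockup."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```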
Best practices checklist
Security
- API key never in browser code
- Keys stored in a secrets manager
- Separate keys for dev/staging/prod
- Rotate keys every 30–90 days
- Log redaction (never log Authorization headers)
Reliability
- Retry + exponential backoff
- Handle 429 and 5xx separately
- Timeouts on tool calls
- Circuit breaker on external APIs
- Fallback model strategy
Quality
- Short, consistent system prompt
- Constrain outputs (JSON schema / formatting rules)
- Add verification steps for high-stakes tasks
- Keep tool outputs small + structured
- Evaluate with a test set (not vibes)
Common pitfalls
Pitfall 1: “It worked in cURL, but fails in production”
Usually caused by:
- Wrong base URL for your region
- Proxy blocking
- Missing headers
- Timeouts on long requests
Use official base URLs and follow region guidance when calling from overseas.
Pitfall 2: Tool calling loops that never end
Fix it by:
- Cap tool-call steps (e.g., 5–12)
- Detect repeating tool calls with the same args
- Force a "final answer" after N loops
- Add "if the tool fails, ask the user for clarification" logic (a bounded-loop sketch follows this list)
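A sketch of such a bounded loop, assuming the OpenAI Python SDK message shapes (`run_tool` is a hypothetical dispatcher to your own functions):

```python
import json

def run_agent(client, model, messages, tools, max_steps=8):
    """Tool-calling loop with a step cap and repeated-call detection."""
    seen_calls = set()
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer: exit the loop
        messages.append(msg)
        for call in msg.tool_calls:
            key = (call.function.name, call.function.arguments)
            if key in seen_calls:
                # Same tool + same args again: nudge the model to answer.
                result = "Repeated call detected; answer with what you have."
            else:
                seen_calls.add(key)
                result = run_tool(call.function.name,  # hypothetical dispatcher
                                  json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "Step limit reached; please clarify or narrow the request."
```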
Pitfall 3: Sending entire history forever
Fix it by:
- Summarize the conversation after N turns
- Store structured state outside the model (DB)
- Only send the last few turns + a summary
Pitfall 4: Treating the model like a database
Models hallucinate. If it must be correct:
- Use tools to fetch truth
- Require citations
- Validate outputs
Kimi API Pricing Guide: K2 vs K2.5 Token Rates
Kimi API pricing is usage-based, meaning you pay for what you generate. Costs mainly come from tokens (the text you send in and the text the model outputs). Your bill is typically calculated as:
- Input tokens (prompt, system instructions, chat history, retrieved document chunks)
- Output tokens (the model's answer, code, or structured JSON)
Some Kimi models may also support prompt caching, where repeated/stable parts of your prompt (like your system prompt or tool definitions) can be billed at a lower “cache hit” input rate when reused. Tool features (such as built-in web search, if enabled) may have additional per-call fees on top of token usage.
| Model | Input (cache hit), per 1M tokens | Input (cache miss), per 1M tokens | Output, per 1M tokens |
|---|---|---|---|
| Kimi K2 | $0.15 | $0.60 | $2.50 |
| Kimi K2.5 | $0.10 | $0.60 | $3.00 |
Notes
- "Cache hit" means Kimi's automatic context caching applied to some repeated/stable prompt tokens; "cache miss" is normal input pricing.
- If you use built-in web search, there's an extra $0.005 per `$web_search` call (on top of tokens).
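As a quick sanity check on spend, here is a sketch using the rates from the table above (assumed to be USD per 1M tokens; in production, read rates from config tied to the live pricing pages rather than hardcoding them):

```python
# USD per 1M tokens, copied from the table above; keep in config, not code.
RATES = {
    "kimi-k2":   {"cache_hit": 0.15, "cache_miss": 0.60, "output": 2.50},
    "kimi-k2.5": {"cache_hit": 0.10, "cache_miss": 0.60, "output": 3.00},
}

def estimate_cost(model, hit_tokens, miss_tokens, output_tokens):
    r = RATES[model]
    return (hit_tokens * r["cache_hit"]
            + miss_tokens * r["cache_miss"]
            + output_tokens * r["output"]) / 1_000_000

# Example: 50k cached input + 20k fresh input + 5k output on K2
# = 0.0075 + 0.0120 + 0.0125 = $0.0320
print(f"${estimate_cost('kimi-k2', 50_000, 20_000, 5_000):.4f}")
```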
FAQ (Kimi API)
Is Kimi API OpenAI-compatible?
Kimi K2 documentation states that the Open Platform provides OpenAI/Anthropic-compatible API support, and Moonshot also provides a “migrating from OpenAI” guide listing compatible endpoints.
What is the main endpoint I call?
Chat completions is documented as:
`POST https://api.moonshot.ai/v1/chat/completions`
How do I list models?
The docs mention:
`GET https://api.moonshot.ai/v1/models`
Does Kimi support tool/function calling?
Yes. Kimi K2 documentation includes tool calling examples using a `tools` list and an "auto" tool choice flow.
Are there different base URLs for different regions?
Yes. Ecosystem docs mention a global base URL and a China base URL, and Moonshot FAQ guidance also references an overseas base URL option.
How are rate limits measured?
Moonshot documentation describes multiple dimensions like concurrency, RPM, TPM, and TPD.
Does the API support files?
Yes. Moonshot docs include a files API, and the migration docs list file endpoints as part of compatibility.
Which model should I use?
Use `/v1/models` to see what's available, then pick based on job: fast/cheap for short tasks, "thinking" for deeper reasoning, multimodal for visual coding and image-aware workflows (where K2.5 is positioned strongly).
Final thoughts: the “good” way to build on Kimi API
If you want the Kimi API to feel great in production, build it like a real platform integration, not a demo script:
- Config-driven base URL + model IDs
- Strong key hygiene
- Tool calling with guardrails
- File + long-context workflows that don't spam tokens
- Rate-limit-aware architecture
- A test set that measures quality