Kimi K2 is Moonshot AI's original trillion-parameter open-source frontier model โ a Mixture-of-Experts architecture with 32B active parameters, 128K context, and an agentic-first design built explicitly for coding, reasoning, and autonomous tool-use workflows. Released July 2025. Modified MIT License.
Kimi K2 is Moonshot AI's state-of-the-art Mixture-of-Experts language model, released in July 2025 under a Modified MIT License. With one trillion total parameters and 32 billion activated per token, it became the most capable open-source agentic model available at release โ distinguished not by scale alone, but by Moonshot's deliberate focus on autonomous, tool-using workflows over static benchmark performance.
What made K2's release historically significant was a rare combination: frontier benchmark performance across coding, reasoning, and knowledge tasks; deep native tool-use capability built through post-training on synthetic agentic scenarios; and full commercially usable open weights. Prior to K2, models at this capability level existed exclusively behind closed vendor APIs. K2 made frontier-class agentic intelligence genuinely accessible.
K2 ships in two variants: K2-Base - the raw pre-trained model for researchers and builders who want fine-tuning control; and K2-Instruct โ the post-trained model ready for conversational and agentic use out of the box. Both sets of weights are on HuggingFace in block-fp8 format.
Kimi K2's Mixture-of-Experts architecture is the engineering foundation that makes the model simultaneously powerful and practical to deploy. Rather than activating all one trillion parameters for every token โ which would make inference cost-prohibitive โ K2 uses sparse activation: only 32 billion parameters engage for any given token.
The architecture contains 384 specialized expert networks organized across 61 layers: one dense standardization layer at the base followed by 60 MoE layers for hierarchical processing. For each token, K2's routing mechanism selects exactly 8 specialist experts plus one shared expert. The shared expert ensures universal knowledge access; the 8 selected experts bring task-specific capabilities. This fine-grained routing โ 384 experts with 9 activated โ provides more specialization depth than coarser designs (e.g., 8 total experts with 2 activated).
The practical result: K2 delivers the knowledge capacity and representational depth of a trillion-parameter system at the inference cost of a ~32B dense model. API token costs are dramatically lower than the headline parameter count suggests, and self-hosted deployments need far less GPU memory than a truly dense 1T model would require. K2's MoE architecture is comparable in principle to DeepSeek-V3 but distinct in routing granularity, expert dimensionality, and the MuonClip optimizer that stabilizes training at this scale.
Training a 1T-parameter MoE model is a genuinely hard engineering challenge. MoE architectures at this scale are prone to attention logit explosion โ where values in the attention layers grow unbounded during training, causing catastrophic loss spikes and requiring expensive checkpoint rollbacks. Historically, training runs at this scale have required multiple interventions to recover from instability events.
Moonshot AI developed MuonClip specifically to solve this problem. MuonClip applies the Muon optimizer โ an improvement over standard Adam that applies the Nesterov momentum update in the spectral domain for better gradient conditioning โ at unprecedented trillion-parameter scale. The key innovation is a novel qk-clip technique: rather than using standard gradient clipping after the fact, MuonClip directly adjusts the query and key projection matrices in the attention mechanism to prevent logit explosion before it begins.
The outcome was a complete pre-training run of 1 trillion parameters on 15.5 trillion tokens with zero training instability. This is an unusual achievement โ most large MoE training runs encounter instability events requiring restart from checkpoint. Moonshot's ability to complete a clean 15.5T token run at this scale is a significant engineering validation of MuonClip's design, and it carries forward to K2.5 and K2.6 with the same stability characteristics.
Kimi K2 Thinking is the post-trained reasoning variant released in November 2025, four months after the original K2. It introduced a capability that had not been achieved reliably at production scale before: seamlessly interleaved chain-of-thought reasoning with native tool calls.
In practice, this means K2 Thinking can reason about a problem, determine that it needs external data or a code execution result, call the appropriate tool, receive the result, reason about the result, decide on a follow-up action, call another tool, and continue this reasoning-action chain for up to 200โ300 sequential steps without losing track of the overall task. Earlier reasoning models would either complete their full thinking trace before acting, or execute tool calls without integrated reasoning between them โ not both simultaneously.
K2 Thinking also introduced Quantization-Aware Training (QAT) at the post-training stage. By incorporating INT4 weight-only quantization to MoE components during training โ rather than applying it as post-hoc compression โ K2 Thinking achieves a ~2ร generation speed improvement over FP16 inference without accuracy degradation. All K2 Thinking benchmark results are measured under INT4 precision, making them directly comparable to production deployment characteristics.
reasoning_content in API response โ access full chain-of-thought trace for debuggingK2 Thinking API: platform.moonshot.ai, model: "kimi-k2-thinking". Recommended temperature=1.0. Access the reasoning trace via response.choices[0].message.reasoning_content. Tool-calling API follows OpenAI's function-calling schema exactly.
Kimi K2 was engineered specifically for agentic intelligence โ designed to perform reliably in the multi-step, tool-using, context-heavy workflows that real AI applications require, not just to score well on static benchmarks.
Post-trained extensively on synthetic agentic scenarios โ multi-step tool use, autonomous problem-solving, error recovery, and long-horizon task execution. K2 was trained to understand when and how to invoke tools in complex workflows, not just to understand tool schemas abstractly.
Built for autonomous workK2's API at platform.moonshot.ai is compatible with both the OpenAI and Anthropic SDK standards. Any integration using OpenAI's messages format can switch to K2 by changing two lines: base_url and model name. The Anthropic-compatible endpoint scales temperature by 0.6 for existing Claude integrations.
Zero refactoring to migrateBoth K2-Base and K2-Instruct model weights are available on HuggingFace under the Modified MIT License. Commercial use is permitted. The sole restriction applies only to deployments serving more than 100M monthly active users or generating more than $20M/month in revenue โ an irrelevant threshold for the vast majority of developers and companies.
Commercial use permittedK2 ships with a 128K token context โ approximately 96,000 words or 192 pages of text in a single session. Sufficient for large codebases, extensive document analysis, and multi-hour agentic sessions. The K2.5 and K2.6 successors extended this to 256K and 262K respectively for longer-horizon workflows.
~96,000 words per sessionAccess via kimi.com for zero-setup; platform.moonshot.ai API for production integration; HuggingFace weights for self-hosted deployment with vLLM, SGLang, KTransformers, or TensorRT-LLM; or Kimi Code CLI for terminal-native coding workflows. Every path uses the same underlying model.
API ยท Cloud ยท Self-hosted ยท CLIMoE sparse activation means 32B active parameters despite 1T total. Running K2's full benchmark evaluation suite costs approximately $0.27 โ versus $0.48 for GPT-5.2 and $1.14 for Claude Opus 4.5. K2 is estimated to be 5ร cheaper than leading closed-source models at comparable performance levels.
5ร cheaper than Claude Opus 4.5Released in July 2025, Kimi K2 immediately became one of the highest-performing open-source models on coding, agentic, and reasoning benchmarks. The results below reflect K2 Instruct performance at release, sourced from Moonshot AI's official technical report and independent evaluations.
| Benchmark | Category | K2 Score | Visual |
|---|---|---|---|
| SWE-bench Verified | Coding | 65.8% | |
| MMLU-Pro | Knowledge | 73.3% | |
| GPQA Diamond | Science | 71.0% | |
| MATH-500 | Maths | 90.6% | |
| HumanEval | Coding | 88.2% | |
| LiveCodeBench | Coding | 57.6% | |
| AIME 2025 | Maths | 66.2% | |
| tau2-bench (Airline) | Agentic | 80.0% |
Scores from Moonshot AI K2 technical report (July 2025). K2 Thinking results reported under INT4 precision. For comparison: K2.5 (Jan 2026) reached 76.8% SWE-bench; K2.6 (Apr 2026) reached 80.2%. See K2.6 page for current frontier numbers.
K2 (Jul 2025): SWE-bench 65.8%, 128K context, text only, no Agent Swarm. K2.5 (Jan 2026): 76.8% SWE-bench, 256K context, native vision (MoonViT), Agent Swarm 100 agents. K2.6 (Apr 2026): 80.2% SWE-bench, 262K context, Agent Swarm 300 agents / 4,000 steps. K2 remains a strong cost-efficient production model when K2.5/K2.6 capability isn't needed.
Four deployment paths cover every use case from no-setup exploration to fully self-hosted infrastructure.
Access K2 through kimi.com (web) or the Kimi App (iOS/Android). Instant mode for fast responses; Thinking mode for deep reasoning chains. No API key or infrastructure needed. Free tier available with basic quotas; paid plans unlock higher usage.
Free ยท No setupGet your key at platform.moonshot.ai. Use model="kimi-k2-instruct" or "kimi-k2-thinking" with any OpenAI SDK. Fully OpenAI and Anthropic-compatible. Token-based billing, separate from app membership.
Download K2-Base or K2-Instruct from HuggingFace (block-fp8 format). Deploy with vLLM, SGLang, KTransformers, or TensorRT-LLM. Run the Kimi Vendor Verifier before production traffic. Best for privacy-critical or regulated deployments.
Full infrastructure controlUse K2 as the engine for the open-source Kimi Code CLI terminal agent. Integrates with VS Code, Cursor, JetBrains, and Zed. Supports autonomous coding sessions, codebase navigation, and multi-file editing workflows natively.
Terminal ยท VS Code ยท CursorK2's combination of agentic capability, native tool use, open weights, and cost efficiency makes it a strong foundation for autonomous systems that require deep reasoning and reliable multi-step execution. These are the highest-impact application patterns.
K2's instruction-following and logical reasoning capabilities underpin autonomous programming agents. Debugging, refactoring, and multi-step development workflows with up to 300 sequential tool calls in K2 Thinking mode.
K2 Thinking's 300-step tool-call support makes it viable for long research tasks: multi-source web search, data synthesis, competitive analysis, and structured report generation with source validation.
Optimized for rigorous attention to detail โ contract review, patent drafting, legal document analysis, and annotation workflows requiring strict adherence to terminology and logical structure.
AlphaEngine built a FinGPT Agent on K2 Thinking supporting 300+ tool calls โ macroeconomic analysis, research report processing, supply chain breakdown, and automated financial report generation.
Generate complete applications from a description: backend logic, API design, database schema, and frontend โ held simultaneously within K2's 128K context window for consistent cross-layer implementation.
160K vocabulary with coverage of 50+ languages makes K2 effective for cross-language summarization, translation pipelines, and global content workflows. Particularly strong for CJK and English contexts.
Tencent CodeBuddy and Genspark run production coding agents on K2. The OpenAI-compatible API makes K2 a drop-in replacement in existing developer tool infrastructure without refactoring.
Kimi K2's reasoning capabilities extend to specialized scientific domains. DP Technology and XtalPi deploy K2.5 (built on K2's base) for chemical literature understanding and drug discovery workflows.
Between July 2025 and April 2026, Moonshot AI shipped five major model updates in the K2 lineage. Each release targeted a specific capability frontier while preserving the same underlying trillion-parameter MoE architecture and MuonClip-stabilized training approach. This cadence โ a major update roughly every two months โ is faster than any closed-source frontier lab over the same period.
1T MoE, 32B active, 128K context. MuonClip optimizer for zero-instability training on 15.5T tokens. K2-Base + K2-Instruct variants. Modified MIT License. First open-source model at this capability tier, setting the trillion-parameter open-source baseline for agentic AI.
An interim instruct update targeting coding performance improvements. SWE-bench accuracy improved versus the July checkpoint. Genspark deploys this specific version ("Kimi K2 (0905)") as the engine for their autonomous agent platform โ a production endorsement of the 0905 checkpoint's reliability.
Post-trained reasoning variant. Interleaves chain-of-thought with native tool calls, supporting 200โ300 sequential steps. Native INT4 quantization via QAT for 2ร speed improvement over FP16. Tencent CodeBuddy integrates K2 Thinking as its core model for developer workflows. Recommended temperature: 1.0.
Continual pretraining on 15T mixed visual + text tokens. MoonViT 400M encoder for native multimodal input (images + video). Context extended to 256K. Agent Swarm v1: 100 parallel sub-agents, 1,500 tool calls, 4.5ร speedup on research tasks. SWE-bench 76.8%. BrowseComp (Swarm): 78.4%.
Context extended to 262K. Agent Swarm v2: 300 sub-agents, 4,000 coordinated steps โ 3ร capacity increase from K2.5. Claw Groups for cross-model collaboration. Document-to-Skill conversion. SWE-bench 80.2%. BrowseComp Swarm 86.3%. DeepSearchQA F1 92.5%. Beta label removed โ GA release.
At its July 2025 release, K2's most significant differentiator was delivering frontier-quality benchmarks with open, commercially usable weights. No other model in the same performance tier offered all three elements simultaneously.
| Model | Kimi K2 Moonshot AI ยท Jul 2025 |
GPT-4.1 OpenAI |
Claude Opus 4 Anthropic |
Llama 3.1 405B Meta |
DeepSeek-V3 DeepSeek |
|---|---|---|---|---|---|
| Architecture & Access | |||||
| Open weights | โ Modified MIT | โ | โ | โ Llama license | โ |
| Total parameters | 1T (MoE) | ~200B est. | ~200B est. | 405B dense | 671B (MoE) |
| Active parameters | 32B | Dense | Dense | 405B | 37B |
| Context window | 128K | 128K | 200K | 128K | 128K |
| Key Benchmarks (contemporary comparisons) | |||||
| SWE-bench Verified | 65.8% | ~50% | ~72% | ~30% | ~49% |
| MMLU-Pro | 73.3% | ~72% | ~74% | ~73% | ~75% |
| GPQA Diamond | 71.0% | ~60% | ~73% | ~51% | ~60% |
| tau2-bench (agentic) | 80.0% | ~65% | ~70% | ~55% | ~60% |
| Cost | |||||
| API input $/1M tokens | ~$0.60 | $2.00+ | $3.00+ | ~$0.80 | $0.27 |
| Commercial use | โ Free (most) | API only | API only | License terms | โ |
| Self-hostable | โ vLLM, SGLang | โ | โ | โ | โ |
Benchmarks at K2 release (July 2025). Competitor comparisons are contemporary estimates from available evaluations. Verify current numbers on official model pages. For April 2026 frontier comparisons, see the K2.6 guide.
K2 Instruct (temperature=0.6) is tuned for fluid conversation and reliable agentic execution. K2 Thinking (temperature=1.0) activates extended reasoning โ meaningfully more accurate on complex multi-constraint problems, but slower. Match the model variant to the task. Defaulting to Thinking for everything wastes time and cost.
When using K2 Thinking, always log response.choices[0].message.reasoning_content alongside the main content. The full chain-of-thought trace reveals exactly where the model's logic diverged when outputs are incorrect โ invaluable for debugging agentic workflow failures that aren't obvious from the final response alone.
K2's greatest strength emerges in multi-step execution. Design your system to let K2 plan, execute a tool call, observe the result, and decide what to do next โ rather than cramming all instructions into one prompt. K2 supports 200โ300 sequential tool calls. Use this budget for genuinely complex, long-horizon tasks.
If migrating from Claude, use K2's Anthropic-compatible endpoint at platform.moonshot.ai. Note: the Anthropic API applies temperature scaling โ real_temperature = request_temperature ร 0.6 โ for compatibility with existing Claude integrations. Validate your specific task distribution before switching production traffic.
For self-hosted K2 deployments, run the official Kimi Vendor Verifier before routing production traffic. Quantization-related output drifts are subtle and will not surface during casual testing. The Vendor Verifier systematically checks output characteristics against Moonshot's reference implementation across representative tasks.
K2 remains a solid cost-efficient model for text-only workflows. For new projects: if you need native image/video understanding, use K2.5 (256K context, MoonViT encoder). If you need Agent Swarm or the best possible long-horizon coding reliability, use K2.6 (262K, 300 agents, GA stability). Switching between K2 variants is a one-line model string change in your API calls.
K2 does not process images or video. Native multimodal capability was the headline addition in K2.5 (January 2026) via the MoonViT 400M encoder. For any workflow involving screenshots, diagrams, visual design, chart analysis, or video understanding, use K2.5 or K2.6 instead.
Text-only modelK2's 128K context is sufficient for many workflows but half the 256K of K2.5 and slightly over half the 262K of K2.6. For very large codebases, full 200-page document sets, or sustained multi-hour agent sessions, the longer context of K2.5/K2.6 provides a material workflow advantage.
128K vs 256K in K2.5/K2.6Despite MoE's efficiency advantage (32B active parameters), the full K2 model still requires enterprise-grade GPU infrastructure โ A100-class at minimum for practical throughput. Block-fp8 format reduces storage requirements, but consumer GPU deployments cannot achieve usable inference speeds at production scale.
A100-class GPU minimumThe Modified MIT License restricts deployments serving more than 100M monthly active users or generating more than $20M/month in revenue โ those must credit "Kimi K2" visibly in the UI. For the overwhelming majority of teams, this threshold is irrelevant. Enterprise legal review is recommended for large-scale commercial deployments.
Review license for large scaleHave more questions? Visit our contact page or the official GitHub repository.
Free to try at kimi.com. Open weights on HuggingFace. API access at platform.moonshot.ai. The full K2 lineage โ from K2 to K2.6 - is available today.