Meet Kimi K2.5 โ Moonshot AI's most powerful open-source model, combining native multimodal intelligence with a 100-agent parallel swarm and frontier-level coding. Built for developers, creators, and teams who need AI that sees, thinks, codes, and acts.
Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. Released on January 27, 2026, it represents a fundamental upgrade โ not just in benchmark scores, but in what the model can perceive and do.
Unlike models that add vision as a plugin, K2.5 was designed from the ground up as a multimodal system. Its 400-million-parameter MoonViT encoder processes images and video frames directly into the transformer's embedding space, meaning the same Mixture-of-Experts layers reason over visual tokens exactly as they do text. The result is genuine visual understanding, not just image description.
K2.5 sits on the same trillion-parameter MoE backbone as K2, with 32 billion parameters active per token - a 61-layer topology beginning with a single dense standardization layer followed by 60 MoE layers for hierarchical processing. What changed is the execution layer: native vision, a 100-agent parallel swarm, and a 256K context window that holds a full codebase plus test output in one session.
Kimi K2.5 is accessible through four distinct operational modes on kimi.com and the Kimi App, each tuned for a different type of work. Choosing the right mode is the single most important factor in getting maximum value from K2.5.
To activate Instant mode via API, pass {'chat_template_kwargs': {"thinking": False}} in extra_body. Thinking mode is the default. Use temperature=0.6 for Instant and temperature=1.0 for Thinking. Both modes use top_p=0.95.
Most vision-capable LLMs bolt a vision encoder alongside the text model. K2.5 is fundamentally different. Its MoonViT encoder โ a 400-million-parameter vision transformer โ processes images and video frames and projects them directly into the model's embedding space as visual tokens. The MoE expert layers then process these visual tokens exactly as they do text tokens, enabling genuine cross-modal reasoning rather than surface-level image description.
K2.5 supports images in PNG, JPEG, WebP, and GIF formats, plus video input (experimental feature on official API). On visual benchmarks, K2.5 scores 78.5% on MMMU-Pro, 84.2% on MathVision, and 86.6% on VideoMMU โ positioning it among the leading open-source multimodal models available in early 2026.
This enables real-world workflows that go beyond text: K2.5 can reconstruct a website from a video walkthrough, generate production code from a UI screenshot, debug visual outputs by reasoning over what it actually sees in an image, analyze charts and diagrams embedded in documents, and complete vision-grounded agentic tasks requiring spatial and visual reasoning across multiple images in sequence.
K2.5 scores 76.8% on SWE-bench Verified โ the gold standard benchmark for autonomous software engineering on real-world GitHub issues. On SWE-bench Multilingual it scores 73.0%, and on LiveCodeBench v6 it reaches 85%, confirming generalization across languages and problem types beyond just popular repos.
The breakthrough is in frontend development. K2.5 can take a simple text prompt and generate complete, production-ready frontend interfaces โ dynamic layouts, scroll-triggered animations, responsive breakpoints, and interactive components โ in a single Agent mode session. Beyond text prompts, K2.5 excels at coding with vision: by reasoning over screenshots, wireframes, and video recordings, it implements UIs from mockups or reconstructs entire websites from walkthrough videos with remarkable fidelity.
For full-stack work, K2.5 maintains context across backend logic, API design, database schemas, and frontend implementation simultaneously within its 256K context window. When debugging, it traces errors across multiple files and suggests fixes that account for the entire system architecture โ not just the line where the error appears.
K2.5's Agent Swarm is technically distinct from most "multi-agent" implementations. There are no predefined sub-agents and no manually-configured workflow templates. When given a complex task, K2.5 automatically decomposes it into parallel subtasks, dynamically instantiates domain-specific sub-agents appropriate to each subtask, executes them in parallel, and synthesizes the results โ all without requiring any human orchestration setup.
The system can spawn up to 100 parallel sub-agents executing up to 1,500 tool calls simultaneously. For large-scale search and research tasks, this delivers a 4.5ร reduction in execution time versus single-agent sequential execution. On BrowseComp โ which tests agentic web research accuracy โ K2.5 improves from 60.6% in single-agent mode to 78.4% with Agent Swarm enabled. WideSearch benchmark coverage rises from 72.7% to 79%.
The swarm is entirely self-orchestrating. Write a natural-language task description and K2.5 handles the decomposition, delegation, parallel execution, and synthesis. No orchestration infrastructure, no workflow configuration, no sub-agent definitions required from the user's side โ this is the key architectural difference from other multi-agent frameworks.
Agent Swarm is in beta. Approximately 12% of tool calls can fail in agentic loops - implement monitoring and retry logic for production use. Not appropriate for latency-critical scenarios due to orchestration coordination overhead.
Everything you need to leverage visual agentic intelligence for coding, research, design, and autonomous multi-step workflows at production scale.
Pre-trained on 15T mixed visual and text tokens. MoonViT 400M encoder projects images and video directly into transformer embedding space. Visual and text tokens processed by the same MoE expert layers โ not separate pipelines bolted together post-training.
PNG ยท JPEG ยท WebP ยท GIF ยท Video~192,000 words per session โ roughly 384 pages of single-spaced text. Handles a full codebase, research corpus, or legal document stack without chunking workarounds. Directly resolves the primary complaint about the original Kimi K2 model's shorter context.
~384 pages in one sessionK2.5 shares K2 Thinking's interleaved reasoning architecture. It can reason, call a tool, observe the result, reason further, and call another tool in a seamless chain โ handling 200โ300 sequential tool calls without losing track of the overall goal context.
200โ300 sequential tool callsBoth K2.5-Base and K2.5-Instruct weights are publicly available on HuggingFace. The modified MIT license permits commercial use, fine-tuning, and redistribution. Native int4 quantization (~630GB storage) supported for efficient self-hosted inference on enterprise hardware.
Commercial use permittedK2.5's API is compatible with both OpenAI and Anthropic SDK standards. Works natively with openai, langchain, anthropic, and LlamaIndex. Access at platform.moonshot.ai โ change base_url and model string in any existing integration, no refactoring required.
Zero refactoring requiredSparse MoE activation means 32B active parameters despite 1T total parameters. API evaluation suite costs ~$0.27 versus $0.48โ$1.14 for comparable proprietary models. At $0.60/M input and $3.00/M output, K2.5 is 16โ25ร cheaper than Western frontier alternatives.
16โ25ร cheaper than alternativesK2.5 scores 47 on the Artificial Analysis Intelligence Index โ well above average among open-weight models of similar size (median: 28). Key benchmark scores across reasoning, coding, vision, and agentic categories are shown below. All scores sourced from Moonshot AI's technical blog, HuggingFace model page, ArtificialAnalysis, and independent reviews.
| Benchmark | Category | Score | Visual |
|---|---|---|---|
| AIME 2025 | Maths | 96.1% | |
| HMMT 2025 (Feb) | Maths | 95.4% | |
| MMLU-Pro | Knowledge | 87.1% | |
| VideoMMU | Video Vision | 86.6% | |
| LiveCodeBench v6 | Coding | 85% | |
| MathVision | Visual Math | 84.2% | |
| BrowseComp (Swarm) | Agentic | 78.4% | |
| MMMU-Pro | Vision | 78.5% | |
| SWE-bench Verified | Coding | 76.8% | |
| DeepSearchQA | Agentic | 77.1% | |
| SWE-bench Multilingual | Coding | 73.0% | |
| HLE โ with tools | Reasoning | 51.8% | |
| HLE โ text only | Reasoning | 31.5% |
HLE, AIME, HMMT, GPQA-Diamond evaluated with 96k token completion budget. AIME/HMMT averaged over 32 runs (avg@32). Sources: Moonshot AI technical blog, HuggingFace model page, ArtificialAnalysis, Clarifai guide, chatlyai.app review, Medium technical review.
K2.5 is accessible through four access paths depending on your technical needs and deployment context. Each suits a different type of workflow and team.
Access K2.5 instantly at kimi.com or via the Kimi App on iOS or Android. The web interface provides all four modes: Instant, Thinking, Agent, and Agent Swarm (Beta). Ideal for writers, researchers, students, and daily users who want K2.5's capabilities without infrastructure overhead. Free tier available; paid plans start at $19/mo for higher quotas and Agent mode.
K2.5 works best with Kimi Code CLI as its native agent framework. Install at kimi.com/code โ it integrates K2.5 into terminal and IDE workflows (VS Code supported), enabling autonomous coding sessions, codebase navigation, and multi-file refactoring with K2.5's full vision and reasoning capabilities active throughout extended sessions. 3ร quota included with membership.
Get your API key at platform.moonshot.ai. Use base_url="https://platform.moonshot.ai/v1" with the OpenAI or Anthropic SDK. Instant mode: pass {"thinking": False} in extra_body with temperature=0.6. Thinking mode is the default at temperature=1.0. Both modes use top_p=0.95 for optimal outputs. API billing is separate from membership.
Download K2.5-Base or K2.5-Instruct from HuggingFace under the modified MIT license. INT4 quantization requires ~630GB storage. Deploy using vLLM, SGLang, KTransformers, or TensorRT-LLM (minimum transformers version 4.57.1). Run the Kimi Vendor Verifier to confirm correct deployment before production traffic. Note: video input is experimental and only supported on the official Kimi API โ not third-party deployments.
Access K2.5 through membership plans for tool quotas and priority features in the Kimi app, or via the API for token-based billing when building products. Membership and API billing are completely separate โ subscribing to a plan does not include API token credits.
K2.5 API: $0.60 per 1M input tokens ยท $3.00 per 1M output tokens on Kimi's official API. Chain-of-thought outputs in Thinking mode are verbose โ set max_tokens explicitly and monitor usage carefully in production. Open weights on HuggingFace are free to download under the modified MIT license.
K2.5 is the strongest open-source multimodal model on coding and agentic benchmarks as of early 2026, at 16โ25ร lower API cost than proprietary frontier alternatives. Here is a direct comparison across architecture, benchmarks, agent capabilities, and pricing.
| Capability | Kimi K2.5 Moonshot AI |
GPT-5.2 OpenAI |
Claude Opus 4.5 Anthropic |
Gemini 3 Pro Google |
DeepSeek-V3 DeepSeek |
|---|---|---|---|---|---|
| Architecture | |||||
| Open weights | โ MIT | โ | โ | โ | โ |
| Context window | 256K | 128K | 200K | 1M | 128K |
| Native vision | โ MoonViT | โ | โ | โ | โ |
| Video input | โ Experimental | Limited | โ | โ | โ |
| Benchmarks | |||||
| SWE-bench Verified | 76.8% | ~74% | 80.9% | ~50% | ~49% |
| AIME 2025 | 96.1% | 100% | ~80% | ~85% | ~70% |
| MMMU-Pro (vision) | 78.5% | ~75% | ~72% | ~80% | N/A |
| BrowseComp (Swarm) | 78.4% | ~65% | ~58% | ~60% | ~60% |
| Agentic Capabilities | |||||
| Agent swarm system | 100 agents | โ | โ | โ | โ |
| Parallel tool calls | 1,500+ | Limited | Limited | Limited | Limited |
| Cost (API) | |||||
| Input $/1M tokens | $0.60 | $2.50+ | $3.00 | $1.25 | $0.27 |
| Output $/1M tokens | $3.00 | $10.00+ | $15.00 | $5.00 | $1.10 |
Scores as of early 2026 from public leaderboards and model documentation. Always verify current benchmarks and pricing on official provider pages before production decisions.
K2.5's four modes and native multimodal architecture cover a wide range of real-world workflows. Below are the highest-impact applications by category.
Generate production-ready animated interfaces โ scroll effects, dynamic layouts, responsive breakpoints โ from a text prompt or UI screenshot in a single Agent session.
Upload a wireframe, mockup, or Figma export. K2.5 generates working HTML/CSS/JS that implements the design with visual fidelity and interactive components.
Record a walkthrough of an existing site. K2.5 reasons over the video and reconstructs the interface as production code โ even for complex animated UIs.
Agent Swarm: 100 sub-agents search, synthesize, and cross-validate hundreds of sources in parallel. 4.5ร faster than sequential research workflows.
96.1% AIME 2025. Thinking mode for competition-level maths, physics, formal proofs, and scientific analysis with full chain-of-thought reasoning traces.
Reason over charts, spreadsheets, and visual data. Generate formula-driven analysis, pivot tables, and data visualizations from descriptions or uploaded images.
Screenshot your error UI or console output. K2.5 reasons over the visual context, traces the likely cause across files, and suggests targeted fixes.
Agent Swarm for batch document processing, competitive intelligence, financial analysis, and engineering tasks โ no orchestration setup or configuration required.
These practices will help you get the best results from K2.5 across all four modes and use cases, from daily use to production pipelines.
max_tokens limit appropriate to your task, and monitor token usage carefully in high-volume deployments to avoid cost surprises.An honest assessment of where K2.5 falls short is as important as understanding its strengths. These are real limitations, not disclaimers.
At 38.7 tokens/second on Kimi's API, K2.5 is notably slower than the median for comparable open-weight models (57.2 t/s). Thinking mode is especially verbose and slow. Consider Instant mode for latency-sensitive applications, but even Instant is slower than some alternatives.
38.7 t/s vs 57.2 medianDespite strong open-source performance (76.8% SWE-bench), K2.5 trails Claude Opus 4.5 (80.9%) and significantly trails Qwen3-Max (88.3%). For the most demanding autonomous software engineering tasks, closed models currently lead by a meaningful margin.
-11.5 pts vs Qwen3-MaxINT4 quantized weights require ~630GB storage and multiple enterprise-grade GPUs (A100-class minimum). Self-hosting is impractical without dedicated AI infrastructure. Consumer GPU deployments cannot achieve viable production inference throughput.
~630GB INT4 storage minimumApproximately 12% of tool calls fail in agentic loops. This is manageable for exploratory work but requires explicit retry logic, error handling, and monitoring for production pipelines. Not appropriate for latency-critical or high-reliability-requirement workflows without mitigation.
~12% tool call failure rateVideo chat is experimental and currently only supported on the official Kimi API. Third-party deployments via vLLM or SGLang do not support video input yet. This limits video-to-code and video reasoning use cases to Kimi's own infrastructure for now.
Official Kimi API onlyMoonshot AI's primary base is in China. Training data is undisclosed and potential biases unknown. The most detailed technical resources and community knowledge are in Chinese. English developer documentation exists but is thinner than for OpenAI or Anthropic equivalents.
Training data undisclosedHave another question? Contact us through our official channels.
Embrace the future of autonomous problem-solving with cutting-edge open-source agentic AI from Moonshot. Start free - no credit card required.