NEW 🚀 Kimi K2.6 Now Available - The next evolution with 300-Agent Swarm & 262K context
OPEN SOURCE · VISUAL AGENTIC INTELLIGENCE · JAN 2026

Hello, Kimi K2.5

Meet Kimi K2.5, Moonshot AI's most powerful open-source model, combining native multimodal intelligence with a 100-agent parallel swarm and frontier-level coding. Built for developers, creators, and teams who need AI that sees, thinks, codes, and acts.

1T Parameters
32B Active / Token
256K Context
100 Max Sub-Agents
15T Training Tokens
01 - WHAT IS KIMI K2.5

The Most Powerful Open-Source Model from Moonshot AI

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. Released on January 27, 2026, it represents a fundamental upgrade, not just in benchmark scores but in what the model can perceive and do.

Unlike models that add vision as a plugin, K2.5 was designed from the ground up as a multimodal system. Its 400-million-parameter MoonViT encoder processes images and video frames directly into the transformer's embedding space, meaning the same Mixture-of-Experts layers reason over visual tokens exactly as they do text. The result is genuine visual understanding, not just image description.

K2.5 sits on the same trillion-parameter MoE backbone as K2, with 32 billion parameters active per token - a 61-layer topology beginning with a single dense standardization layer followed by 60 MoE layers for hierarchical processing. What changed is the execution layer: native vision, a 100-agent parallel swarm, and a 256K context window that holds a full codebase plus test output in one session.

  • Native multimodal - trained on 15T visual + text tokens from the ground up
  • MoonViT 400M encoder - vision pipeline that's architecturally integrated, not bolt-on
  • 100-agent swarm - parallel sub-agents, 1,500+ tool calls, 4.5× speedup
  • 256K context window - ~192,000 words or 384 pages in one session
  • 76.8% SWE-bench Verified - top-tier open-source coding performance
  • Open weights, modified MIT license - modify, redistribute, use commercially
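
As a rough check on the 256K figure in the list above, the quoted ~192,000 words and 384 pages imply about 0.75 words per token and 500 words per page. Both ratios are assumptions inferred from the numbers on this page, not tokenizer measurements, and real text will vary with language and content.

```python
# Back-of-envelope conversion from context tokens to words and pages.
# WORDS_PER_TOKEN and WORDS_PER_PAGE are assumptions inferred from the
# figures quoted on this page, not properties of the K2.5 tokenizer.
CONTEXT_TOKENS = 256_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500


def context_capacity(tokens: int = CONTEXT_TOKENS) -> tuple[int, int]:
    """Return (approx_words, approx_pages) for a given token budget."""
    words = round(tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


if __name__ == "__main__":
    words, pages = context_capacity()
    print(f"~{words:,} words, ~{pages} pages")
```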
KIMI K2.5 · JAN 2026
Visual Agentic Intelligence
Native multimodal + 100-agent swarm. The most capable open-source model for coding, vision, and agentic workflows as of early 2026.
76.8% SWE-bench · 96.1% AIME 2025 · 4.5× Swarm speed
02 - FOUR INTELLIGENT MODES

K2.5 Instant · Thinking · Agent · Agent Swarm

Kimi K2.5 is accessible through four distinct operational modes on kimi.com and the Kimi App, each tuned for a different type of work. Choosing the right mode is the single most important factor in getting maximum value from K2.5.

⚡ K2.5 Instant
Quick Response
Low-latency conversation for speed. Fast answers, short drafts, Q&A, translations, and document summaries. Temperature 0.6, top_p 0.95. The default for everyday interactions where you want results in seconds, not minutes.
🧠 K2.5 Thinking
Deep Reasoning
Extended chain-of-thought with a 96k token completion budget. Temperature 1.0 for best results. Best for graduate-level maths, science, complex logic, system design, and multi-constraint research where accuracy matters more than speed.
🤖 K2.5 Agent
Autonomous Tasks
Multi-step tool use: web search, code execution, file manipulation, and API calls. Handles 200–300 sequential tool calls without losing context. Automates research workflows, data extraction, report generation, and full-stack coding tasks end-to-end.
🐝 K2.5 Agent Swarm
Beta · Parallel
Decomposes tasks into parallel subtasks executed by up to 100 specialized sub-agents running 1,500+ tool calls simultaneously. 4.5× faster than single-agent. BrowseComp accuracy rises from 60.6% to 78.4% with Swarm active.
API mode selection

To activate Instant mode via the API, pass {"chat_template_kwargs": {"thinking": False}} in extra_body. Thinking mode is the default. Use temperature=0.6 for Instant and temperature=1.0 for Thinking. Both modes use top_p=0.95.
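
The settings above can be collected into a small helper. This is a sketch only: `request_kwargs` is a hypothetical function name, and the `extra_body` payload simply mirrors the note above, so check it against the current API reference before relying on it.

```python
from typing import Any

# Sampling settings per mode, as stated in the note above.
MODE_SETTINGS = {
    "instant": {"temperature": 0.6, "top_p": 0.95},
    "thinking": {"temperature": 1.0, "top_p": 0.95},
}


def request_kwargs(mode: str) -> dict[str, Any]:
    """Build keyword arguments for a chat-completions call in the given mode.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if mode not in MODE_SETTINGS:
        raise ValueError(f"unknown mode: {mode!r}")
    kwargs: dict[str, Any] = dict(MODE_SETTINGS[mode])
    if mode == "instant":
        # Thinking is the default, so only Instant needs the override.
        kwargs["extra_body"] = {"chat_template_kwargs": {"thinking": False}}
    return kwargs
```

With the OpenAI SDK, the result could be splatted into a call such as `client.chat.completions.create(model="kimi-k2.5", messages=messages, **request_kwargs("instant"))`.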

03 - NATIVE MULTIMODAL

Visual AI That Actually Understands Images

Most vision-capable LLMs bolt a vision encoder onto the text model. K2.5 is fundamentally different. Its MoonViT encoder, a 400-million-parameter vision transformer, processes images and video frames and projects them directly into the model's embedding space as visual tokens. The MoE expert layers then process these visual tokens exactly as they do text tokens, enabling genuine cross-modal reasoning rather than surface-level image description.

K2.5 supports images in PNG, JPEG, WebP, and GIF formats, plus video input (an experimental feature on the official API). On visual benchmarks, K2.5 scores 78.5% on MMMU-Pro, 84.2% on MathVision, and 86.6% on VideoMMMU, positioning it among the leading open-source multimodal models available in early 2026.

This enables real-world workflows that go beyond text: K2.5 can reconstruct a website from a video walkthrough, generate production code from a UI screenshot, debug visual outputs by reasoning over what it actually sees in an image, analyze charts and diagrams embedded in documents, and complete vision-grounded agentic tasks requiring spatial and visual reasoning across multiple images in sequence.

  • 78.5% MMMU-Pro - multimodal understanding benchmark
  • 84.2% MathVision - visual mathematics reasoning
  • 86.6% VideoMMMU - video understanding and reasoning
  • Design-to-code - upload a UI mockup, receive production HTML/CSS/JS
  • Video-to-website - reconstruct a full interface from a screen recording
  • Visual debugging - reason over screenshots to diagnose root causes
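
To illustrate the design-to-code workflow, the sketch below packs a local image into an OpenAI-style chat message using a base64 data URL. The `image_url` content part follows the OpenAI chat-completions convention; whether a given K2.5 deployment accepts exactly this shape is an assumption, and `image_message` is a hypothetical helper name.

```python
import base64
from pathlib import Path

# Image formats listed on this page, keyed by file suffix.
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_message(path: str, prompt: str) -> dict:
    """Build a user message that pairs an image file with a text prompt."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"unsupported image type: {suffix!r}")
    mime = MIME_BY_SUFFIX[suffix]
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            {"type": "text", "text": prompt},
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a chat-completions request, for example with a prompt such as "Implement this mockup as HTML and CSS".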
👁 NATIVE MULTIMODAL
See to Understand. Build.
MoonViT 400M encoder. Images and video are projected directly into the transformer embedding space, so the same MoE layers reason over visual and text tokens identically.
78.5% MMMU-Pro · 84.2% MathVision · 86.6% VideoMMMU
04 - CODING WITH VISION

The Strongest Open-Source Coding Model

K2.5 scores 76.8% on SWE-bench Verified, the gold standard benchmark for autonomous software engineering on real-world GitHub issues. On SWE-bench Multilingual it scores 73.0%, and on LiveCodeBench v6 it reaches 85%, confirming generalization across languages and problem types beyond just popular repos.

The breakthrough is in frontend development. K2.5 can take a simple text prompt and generate complete, production-ready frontend interfaces (dynamic layouts, scroll-triggered animations, responsive breakpoints, and interactive components) in a single Agent mode session. Beyond text prompts, K2.5 excels at coding with vision: by reasoning over screenshots, wireframes, and video recordings, it implements UIs from mockups or reconstructs entire websites from walkthrough videos with remarkable fidelity.

For full-stack work, K2.5 maintains context across backend logic, API design, database schemas, and frontend implementation simultaneously within its 256K context window. When debugging, it traces errors across multiple files and suggests fixes that account for the entire system architecture, not just the line where the error appears.

  • 76.8% SWE-bench Verified - top open-source software engineering benchmark
  • 85% LiveCodeBench v6 - real-world programming task evaluation
  • Full-stack from one prompt - backend + API + schema + frontend in one session
  • Scroll animations and interactive UI - generated from a description
  • Video-to-website - reconstructs a site from a screen recording
  • Cross-file debugging - traces errors across the entire codebase structure
💻 CODING WITH VISION
Prompt. See. Ship.
Frontend generation, full-stack scaffolding, and visual debugging: K2.5 is the strongest open-source coding model for real-world tasks as of early 2026.
76.8% SWE-bench · 85% LiveCode v6 · 73.0% SWE Multilang
05 - K2.5 AGENT SWARM

100 Sub-Agents. 1,500+ Tool Calls. 4.5× Faster.

K2.5's Agent Swarm is technically distinct from most "multi-agent" implementations. There are no predefined sub-agents and no manually-configured workflow templates. When given a complex task, K2.5 automatically decomposes it into parallel subtasks, dynamically instantiates domain-specific sub-agents appropriate to each subtask, executes them in parallel, and synthesizes the results, all without requiring any human orchestration setup.

The system can spawn up to 100 parallel sub-agents executing up to 1,500 tool calls simultaneously. For large-scale search and research tasks, this delivers a 4.5× reduction in execution time versus single-agent sequential execution. On BrowseComp, which tests agentic web research accuracy, K2.5 improves from 60.6% in single-agent mode to 78.4% with Agent Swarm enabled. WideSearch benchmark coverage rises from 72.7% to 79%.

The swarm is entirely self-orchestrating. Write a natural-language task description and K2.5 handles the decomposition, delegation, parallel execution, and synthesis. No orchestration infrastructure, workflow configuration, or sub-agent definitions are required from the user's side. This is the key architectural difference from other multi-agent frameworks.

  • Up to 100 parallel sub-agents - dynamically created per task, no presets
  • 1,500+ simultaneous tool calls - web search, code, APIs running in parallel
  • 4.5× faster execution - on complex research and search tasks vs single-agent
  • BrowseComp: 60.6% → 78.4% - measurable accuracy uplift from parallelism
  • WideSearch: 72.7% → 79% - broader source coverage on research tasks
  • Zero workflow configuration - K2.5 orchestrates everything automatically
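
The decompose, run in parallel, then synthesize pattern described above can be sketched with `asyncio`. This illustrates the general pattern only and is not Moonshot's implementation: `run_subtask` is a stand-in for a sub-agent, and `MAX_SUB_AGENTS` simply mirrors the 100-agent ceiling quoted on this page.

```python
import asyncio

MAX_SUB_AGENTS = 100  # ceiling quoted on this page; used here as a concurrency cap


async def run_subtask(subtask: str) -> str:
    """Stand-in for one sub-agent; a real one would call tools or a model."""
    await asyncio.sleep(0)  # yield control, simulating I/O
    return f"result for {subtask}"


async def fan_out(subtasks: list[str]) -> list[str]:
    """Run subtasks concurrently, capped at MAX_SUB_AGENTS, preserving order."""
    gate = asyncio.Semaphore(MAX_SUB_AGENTS)

    async def guarded(subtask: str) -> str:
        async with gate:
            return await run_subtask(subtask)

    return await asyncio.gather(*(guarded(s) for s in subtasks))


def synthesize(results: list[str]) -> str:
    """Trivial synthesis step: join partial results into one report."""
    return "\n".join(results)


if __name__ == "__main__":
    parts = asyncio.run(fan_out([f"source {i}" for i in range(3)]))
    print(synthesize(parts))
```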
Beta feature note

Agent Swarm is in beta. Approximately 12% of tool calls can fail in agentic loops - implement monitoring and retry logic for production use. Not appropriate for latency-critical scenarios due to orchestration coordination overhead.
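
A minimal retry wrapper is one way to act on that advice. The sketch assumes failures are independent and uses the ~12% figure from the note above; `call_with_retry` and its parameters are hypothetical names, and a production version would also need logging and error classification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(tool: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.5) -> T:
    """Call `tool`, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return tool()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def residual_failure_rate(per_call: float = 0.12, attempts: int = 3) -> float:
    """Chance that every attempt fails, assuming independent failures."""
    return per_call ** attempts
```

Under these assumptions, three attempts would cut the chance of an unrecovered failure from 12% to roughly 0.17%, at the cost of extra latency on the retried calls.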

🐝 AGENT SWARM · BETA
100 Agents. Zero Config.
No predefined workflows. K2.5 automatically decomposes, delegates, and executes across 100 parallel sub-agents, then synthesizes the results into a single coherent output.
100 Sub-agents · 1,500+ Tool calls · 4.5× Faster
06 - KEY FEATURES

Key Features of Kimi K2.5

Everything you need to leverage visual agentic intelligence for coding, research, design, and autonomous multi-step workflows at production scale.

// 01

Native Multimodal Architecture

Pre-trained on 15T mixed visual and text tokens. MoonViT 400M encoder projects images and video directly into transformer embedding space. Visual and text tokens are processed by the same MoE expert layers, not by separate pipelines bolted together post-training.

PNG · JPEG · WebP · GIF · Video
// 02

256K Context Window

~192,000 words per session, roughly 384 pages of single-spaced text. Handles a full codebase, research corpus, or legal document stack without chunking workarounds. Directly resolves the primary complaint about the original Kimi K2 model's shorter context.

~384 pages in one session
// 03

Interleaved Thinking + Tool Use

K2.5 shares K2 Thinking's interleaved reasoning architecture. It can reason, call a tool, observe the result, reason further, and call another tool in a seamless chain, handling 200–300 sequential tool calls without losing track of the overall goal.

200–300 sequential tool calls
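
The reason, call, observe cycle can be sketched as a plain loop. Everything here is illustrative: `model_step` is a hypothetical callable standing in for a K2.5 request, the message and tool-call dictionaries are simplified rather than the real API schema, and `MAX_TOOL_CALLS` reflects the upper end of the range quoted above.

```python
from typing import Callable

MAX_TOOL_CALLS = 300  # upper end of the range quoted on this page


def run_agent(model_step: Callable[[list[dict]], dict],
              tools: dict[str, Callable[[str], str]],
              task: str) -> str:
    """Alternate model steps and tool calls until the model gives an answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        step = model_step(messages)            # reason over the history so far
        if "final" in step:
            return step["final"]               # model decided it is done
        name, arg = step["tool"], step["input"]
        observation = tools[name](arg)         # call the requested tool
        messages.append({"role": "tool", "name": name, "content": observation})
    raise RuntimeError("tool-call budget exhausted")
```

The explicit budget is a design choice for the sketch: it guarantees the loop terminates even if the stand-in model never produces a final answer.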
// 04

Open Weights - Modified MIT

Both K2.5-Base and K2.5-Instruct weights are publicly available on HuggingFace. The modified MIT license permits commercial use, fine-tuning, and redistribution. Native int4 quantization (~630GB storage) supported for efficient self-hosted inference on enterprise hardware.

Commercial use permitted
// 05

OpenAI & Anthropic-Compatible API

K2.5's API is compatible with both OpenAI and Anthropic SDK standards. Works natively with openai, langchain, anthropic, and LlamaIndex. Access at platform.moonshot.ai: change base_url and model string in any existing integration, no refactoring required.

Zero refactoring required
// 06

Cost-Efficient at Frontier Scale

Sparse MoE activation means 32B active parameters despite 1T total parameters. API evaluation suite costs ~$0.27 versus $0.48–$1.14 for comparable proprietary models. At $0.60/M input and $3.00/M output, K2.5 is 16–25× cheaper than Western frontier alternatives.

16–25× cheaper than alternatives
07 - BENCHMARKS

Kimi K2.5 Benchmark Results

K2.5 scores 47 on the Artificial Analysis Intelligence Index, well above average among open-weight models of similar size (median: 28). Key benchmark scores across reasoning, coding, vision, and agentic categories are shown below. All scores sourced from Moonshot AI's technical blog, HuggingFace model page, ArtificialAnalysis, and independent reviews.

| Benchmark | Category | Score |
|---|---|---|
| AIME 2025 | Maths | 96.1% |
| HMMT 2025 (Feb) | Maths | 95.4% |
| MMLU-Pro | Knowledge | 87.1% |
| VideoMMMU | Video Vision | 86.6% |
| LiveCodeBench v6 | Coding | 85% |
| MathVision | Visual Math | 84.2% |
| BrowseComp (Swarm) | Agentic | 78.4% |
| MMMU-Pro | Vision | 78.5% |
| SWE-bench Verified | Coding | 76.8% |
| DeepSearchQA | Agentic | 77.1% |
| SWE-bench Multilingual | Coding | 73.0% |
| HLE (with tools) | Reasoning | 51.8% |
| HLE (text only) | Reasoning | 31.5% |

HLE, AIME, HMMT, GPQA-Diamond evaluated with 96k token completion budget. AIME/HMMT averaged over 32 runs (avg@32). Sources: Moonshot AI technical blog, HuggingFace model page, ArtificialAnalysis, Clarifai guide, chatlyai.app review, Medium technical review.
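
For readers unfamiliar with avg@32, it is the mean accuracy over 32 independent runs of the same benchmark. A minimal sketch follows, using made-up run data purely for illustration; `avg_at_k` is a hypothetical name, not part of any evaluation harness.

```python
def avg_at_k(runs: list[list[bool]]) -> float:
    """Mean accuracy across k runs; each run is a list of per-problem results."""
    if not runs:
        raise ValueError("need at least one run")
    per_run = [sum(run) / len(run) for run in runs]
    return sum(per_run) / len(per_run)


if __name__ == "__main__":
    # Illustrative data only: 2 runs over 4 problems, not real results.
    example = [[True, True, False, True], [True, True, True, True]]
    print(f"avg@{len(example)} = {avg_at_k(example):.3f}")
```

Averaging over many runs reduces the noise that sampling at temperature 1.0 introduces into any single run's score.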

08 - HOW TO USE KIMI K2.5

How to Use Kimi K2.5

K2.5 is accessible through four access paths depending on your technical needs and deployment context. Each suits a different type of workflow and team.

1. Try on Kimi.com - No Setup Required

Access K2.5 instantly at kimi.com or via the Kimi App on iOS or Android. The web interface provides all four modes: Instant, Thinking, Agent, and Agent Swarm (Beta). Ideal for writers, researchers, students, and daily users who want K2.5's capabilities without infrastructure overhead. Free tier available; paid plans start at $19/mo for higher quotas and Agent mode.

2. Use with Kimi Code CLI

K2.5 works best with Kimi Code CLI as its native agent framework. Install at kimi.com/code. It integrates K2.5 into terminal and IDE workflows (VS Code supported), enabling autonomous coding sessions, codebase navigation, and multi-file refactoring with K2.5's full vision and reasoning capabilities active throughout extended sessions. 3× quota included with membership.

3. Developer API - OpenAI & Anthropic Compatible

Get your API key at platform.moonshot.ai. Use base_url="https://api.moonshot.ai/v1" with the OpenAI or Anthropic SDK. Instant mode: pass {"chat_template_kwargs": {"thinking": False}} in extra_body with temperature=0.6. Thinking mode is the default at temperature=1.0. Both modes use top_p=0.95. API billing is separate from membership.

4. Self-Host Open Weights on Your Infrastructure

Download K2.5-Base or K2.5-Instruct from HuggingFace under the modified MIT license. INT4 quantization requires ~630GB storage. Deploy using vLLM, SGLang, KTransformers, or TensorRT-LLM (minimum transformers version 4.57.1). Run the Kimi Vendor Verifier to confirm correct deployment before production traffic. Note: video input is experimental and only supported on the official Kimi API, not on third-party deployments.
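
A rough sizing check helps when planning hardware for self-hosting. The sketch assumes 1T parameters at exactly 4 bits each and 80 GB of memory per accelerator (an assumption, typical of A100/H100-class cards). The gap between the resulting ~500 GB and the ~630 GB quoted above presumably comes from components not stored at 4 bits, though the source does not say.

```python
import math

TOTAL_PARAMS = 1_000_000_000_000  # 1T, as quoted on this page
QUOTED_INT4_GB = 630              # storage figure quoted on this page
GPU_MEMORY_GB = 80                # assumption: 80 GB per accelerator


def int4_weight_gb(params: int = TOTAL_PARAMS) -> float:
    """Lower bound on weight size if every parameter took exactly 4 bits."""
    return params * 4 / 8 / 1e9


def min_gpus_for_weights(storage_gb: float = QUOTED_INT4_GB,
                         gpu_gb: float = GPU_MEMORY_GB) -> int:
    """GPUs needed just to hold the weights, ignoring KV cache and activations."""
    return math.ceil(storage_gb / gpu_gb)
```

With these inputs the weights alone need at least 8 such GPUs, before accounting for KV cache at long context lengths.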

Python · OpenAI SDK · Instant + Thinking Modes

```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

# Instant mode: fast, low-latency responses
instant = client.chat.completions.create(
    model="kimi-k2.5",
    temperature=0.6,
    top_p=0.95,
    extra_body={"chat_template_kwargs": {"thinking": False}},
    messages=[{"role": "user", "content": "Summarize this codebase..."}],
)

# Thinking mode: deep reasoning (default)
thinking = client.chat.completions.create(
    model="kimi-k2.5",
    temperature=1.0,
    top_p=0.95,
    messages=[{"role": "user", "content": "Solve this algorithm..."}],
)

print(instant.choices[0].message.content)
```
09 - PRICING

Kimi AI Pricing

Access K2.5 through membership plans for tool quotas and priority features in the Kimi app, or via the API for token-based billing when building products. Membership and API billing are completely separate: subscribing to a plan does not include API token credits.

Adagio - Free Tier
$0/mo · Always free
  • ✓ K2.5 Instant mode
  • ✓ 6 agent uses / mo
  • ✓ 200 Pro Data requests
  • ✗ Agent Swarm
  • ✗ Kimi Code credits
Moderato - Advanced Flow
$19/mo · Billed monthly
  • All K2.5 modes
  • ✓ 60 agent credits / mo
  • ✓ Deep Research
  • ✓ Kimi Code 1× credits
  • ✓ Websites Deploy
  • ✗ Agent Swarm
Allegretto - Pro Choice
$39/mo · Billed monthly
  • Moderato + Agent Swarm
  • ✓ 150 agent credits / mo
  • ✓ Kimi Code 5× credits
  • ✓ Kimi Claw cloud + Android
  • ★ Agent Swarm: 50 uses · 4 agents
  • ✓ 5,000 Pro Data requests
Vivace - Ultimate Boost
$199/mo · Billed monthly
  • Allegretto + Max Swarm
  • ✓ 720 agent credits / mo
  • ✓ Kimi Code 30× credits
  • ★ Swarm: 240 uses · 8 agents
  • ✓ 24,000 Pro Data requests
API pricing - always separate from membership

K2.5 API: $0.60 per 1M input tokens · $3.00 per 1M output tokens on Kimi's official API. Chain-of-thought outputs in Thinking mode are verbose, so set max_tokens explicitly and monitor usage carefully in production. Open weights on HuggingFace are free to download under the modified MIT license.
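
The per-token rates above translate into request costs as follows. This is a sketch using only the two prices quoted on this page; `estimate_cost` is a hypothetical helper, and how reasoning tokens are billed is not stated here, so counting them as output tokens is an assumption.

```python
INPUT_USD_PER_M = 0.60   # per 1M input tokens, as quoted above
OUTPUT_USD_PER_M = 3.00  # per 1M output tokens, as quoted above


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for one request or a batch of requests."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
```

For example, a request with 200,000 input tokens and 20,000 output tokens comes to about $0.18 at these rates, with a third of that coming from the much smaller output.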

10 - COMPARE

Kimi K2.5 vs Other AI Tools and Models

K2.5 is the strongest open-source multimodal model on coding and agentic benchmarks as of early 2026, at 16–25× lower API cost than proprietary frontier alternatives. Here is a direct comparison across architecture, benchmarks, agent capabilities, and pricing.

| Capability | Kimi K2.5 (Moonshot AI) | GPT-5.2 (OpenAI) | Claude Opus 4.5 (Anthropic) | Gemini 3 Pro (Google) | DeepSeek-V3 (DeepSeek) |
|---|---|---|---|---|---|
| Architecture | | | | | |
| Open weights | ✓ MIT | ✗ | ✗ | ✗ | ✓ |
| Context window | 256K | 128K | 200K | 1M | 128K |
| Native vision | ✓ MoonViT | ✓ | ✓ | ✓ | ✗ |
| Video input | ✓ Experimental | Limited | ✗ | ✓ | ✗ |
| Benchmarks | | | | | |
| SWE-bench Verified | 76.8% | ~74% | 80.9% | ~50% | ~49% |
| AIME 2025 | 96.1% | 100% | ~80% | ~85% | ~70% |
| MMMU-Pro (vision) | 78.5% | ~75% | ~72% | ~80% | N/A |
| BrowseComp (Swarm) | 78.4% | ~65% | ~58% | ~60% | ~60% |
| Agentic Capabilities | | | | | |
| Agent swarm system | 100 agents | None | None | None | None |
| Parallel tool calls | 1,500+ | Limited | Limited | Limited | Limited |
| Cost (API) | | | | | |
| Input $/1M tokens | $0.60 | $2.50+ | $3.00 | $1.25 | $0.27 |
| Output $/1M tokens | $3.00 | $10.00+ | $15.00 | $5.00 | $1.10 |

Scores as of early 2026 from public leaderboards and model documentation. Always verify current benchmarks and pricing on official provider pages before production decisions.

11 - USE CASES

Where Can You Use Kimi K2.5

K2.5's four modes and native multimodal architecture cover a wide range of real-world workflows. Below are the highest-impact applications by category.

💻 Frontend Development

Generate production-ready animated interfaces (scroll effects, dynamic layouts, responsive breakpoints) from a text prompt or UI screenshot in a single Agent session.

🎨 Design-to-Code

Upload a wireframe, mockup, or Figma export. K2.5 generates working HTML/CSS/JS that implements the design with visual fidelity and interactive components.

📹 Video-to-Website

Record a walkthrough of an existing site. K2.5 reasons over the video and reconstructs the interface as production code, even for complex animated UIs.

🔎 Large-Scale Research

Agent Swarm: 100 sub-agents search, synthesize, and cross-validate hundreds of sources in parallel. 4.5× faster than sequential research workflows.

🧮 Math & Science

96.1% AIME 2025. Thinking mode for competition-level maths, physics, formal proofs, and scientific analysis with full chain-of-thought reasoning traces.

📊 Data Analysis

Reason over charts, spreadsheets, and visual data. Generate formula-driven analysis, pivot tables, and data visualizations from descriptions or uploaded images.

🐛 Visual Debugging

Screenshot your error UI or console output. K2.5 reasons over the visual context, traces the likely cause across files, and suggests targeted fixes.

🏢 Enterprise Automation

Agent Swarm for batch document processing, competitive intelligence, financial analysis, and engineering tasks, with no orchestration setup or configuration required.

12 - TIPS & BEST PRACTICES

Getting the Most from Kimi K2.5

These practices will help you get the best results from K2.5 across all four modes and use cases, from daily use to production pipelines.

13 - LIMITATIONS

Limitations of Kimi K2.5

An honest assessment of where K2.5 falls short is as important as understanding its strengths. These are real limitations, not disclaimers.

// 01

Inference Speed

At 38.7 tokens/second on Kimi's API, K2.5 is notably slower than the median for comparable open-weight models (57.2 t/s). Thinking mode is especially verbose and slow. Consider Instant mode for latency-sensitive applications, but even Instant is slower than some alternatives.

38.7 t/s vs 57.2 median
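
To make the throughput figures concrete, the sketch below converts an output length into generation time. It uses only the two rates quoted above and ignores queueing and time-to-first-token, so treat the results as rough lower bounds; `generation_seconds` is a hypothetical helper name.

```python
API_TOKENS_PER_SEC = 38.7     # throughput quoted above
MEDIAN_TOKENS_PER_SEC = 57.2  # median quoted above


def generation_seconds(output_tokens: int,
                       tokens_per_sec: float = API_TOKENS_PER_SEC) -> float:
    """Time to stream `output_tokens`, ignoring queueing and time-to-first-token."""
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    budget = 96_000  # approximate Thinking-mode completion budget
    minutes = generation_seconds(budget) / 60
    print(f"{budget:,} tokens at {API_TOKENS_PER_SEC} t/s: ~{minutes:.0f} min")
```

At the quoted rate, a response that used the full 96k-token Thinking budget would take about 41 minutes to generate, which is why capping max_tokens matters for interactive use.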
// 02

Proprietary Coding Gap

Despite strong open-source performance (76.8% SWE-bench), K2.5 trails Claude Opus 4.5 (80.9%) and significantly trails Qwen3-Max (88.3%). For the most demanding autonomous software engineering tasks, closed models currently lead by a meaningful margin.

-11.5 pts vs Qwen3-Max
// 03

Heavy Self-Hosting Requirements

INT4 quantized weights require ~630GB storage and multiple enterprise-grade GPUs (A100-class minimum). Self-hosting is impractical without dedicated AI infrastructure. Consumer GPU deployments cannot achieve viable production inference throughput.

~630GB INT4 storage minimum
// 04

Agentic Tool Call Failure Rate

Approximately 12% of tool calls fail in agentic loops. This is manageable for exploratory work but requires explicit retry logic, error handling, and monitoring for production pipelines. Not appropriate for latency-critical or high-reliability workflows without mitigation.

~12% tool call failure rate
// 05

Video Input is Experimental

Video chat is experimental and currently only supported on the official Kimi API. Third-party deployments via vLLM or SGLang do not support video input yet. This limits video-to-code and video reasoning use cases to Kimi's own infrastructure for now.

Official Kimi API only
// 06

English Documentation Lags

Moonshot AI's primary base is in China. Training data is undisclosed and potential biases unknown. The most detailed technical resources and community knowledge are in Chinese. English developer documentation exists but is thinner than for OpenAI or Anthropic equivalents.

Training data undisclosed
14 - FAQ

Frequently Asked Questions About Kimi K2.5

Have another question? Contact us through our official channels.

Kimi K2.5 & K2.6 Are Here to Redefine Intelligence

Embrace the future of autonomous problem-solving with cutting-edge open-source agentic AI from Moonshot. Start free - no credit card required.