Four intelligent modes. One open-source model. 1 trillion parameters, a 262K context window, and up to 300 parallel agents, built for coding, research, design, and autonomous workflows.
The fastest way to interact with K2.6. Instant mode is optimized for quick back-and-forth — concise answers, rapid Q&A, short drafts, and real-time conversation without waiting for deep deliberation.
Sub-second first-token latency. Optimized for fluid, natural back-and-forth dialogue across topics.
Quickly generate emails, captions, replies, and short-form content without agent overhead.
Rapid answers to definitions, calculations, translations, and factual lookups.
Even in Instant mode, K2.6 maintains a massive context window for document-in-context tasks.
Deep, chain-of-thought reasoning for complex questions. Thinking mode allocates extended compute to reason step-by-step — ideal for maths, logic, research synthesis, and multi-constraint problems.
Top-tier performance on graduate-level science questions requiring multi-step expert reasoning.
Solves competition-level maths, formal proofs, and algorithmic challenges with visible reasoning traces.
Humanity's Last Exam: one of the world's hardest benchmarks. K2.6 Thinking scores near the frontier with tool use.
Reason through system design trade-offs, protocol comparisons, and multi-constraint engineering choices.
Multi-step autonomous execution. Agent mode lets K2.6 plan, use tools, and produce structured outputs across research, slides, websites, documents, and spreadsheets — all from a single prompt.
Autonomously searches, synthesizes, and structures research reports from live web data. Supports multiple concurrent research tasks.
Prompt → Awwwards-level interface with animations, auth, database ops. Deploys live. Full-stack from one instruction.
Creates structured presentations, Word documents, and formula-driven spreadsheets with summaries and formatted outputs.
Chains web search, code execution, file manipulation, and API calls into complete multi-step workflows.
Massively parallel multi-agent execution. K2.6 dynamically spawns up to 300 specialized sub-agents that work in parallel — slashing execution time on large-scale tasks by up to 4.5×.
Deploy 300 sub-agents to search, scrape, synthesize, and cross-validate information in parallel — hours of research in minutes.
Parallelize research, outlining, section drafting, fact-checking, and editing into a coordinated swarm output.
Run hundreds of similar tasks (data enrichment, classification, summarization) simultaneously via coordinated sub-agents.
Proven: K2.6 autonomously overhauled an 8-year-old financial engine over 13 hours — +185% throughput, 4,000+ coordinated steps.
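The fan-out pattern behind swarm execution can be sketched in plain Python. This is an illustrative sketch only: `sub_agent` is a placeholder for a real sub-agent invocation, not part of Kimi's API.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> str:
    # Placeholder for one sub-agent's work (search, scrape, classify, ...).
    return f"result:{task}"

def run_swarm(tasks: list[str], max_agents: int = 300) -> list[str]:
    # Fan tasks out across up to `max_agents` concurrent workers
    # and collect results in input order.
    with ThreadPoolExecutor(max_workers=min(max_agents, len(tasks))) as pool:
        return list(pool.map(sub_agent, tasks))
```

The orchestration that makes this useful — task decomposition, cross-validation, result merging — is what K2.6's swarm system handles automatically.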
Open kimi.com — no setup required. Free tier available. Sign up and start with Instant mode immediately. Upgrade to unlock Agent and Swarm.
Select from Instant (fast Q&A), Thinking (deep reasoning), Agent (research + create), or Agent Swarm (parallel execution). Match mode to task complexity.
Integrate via the OpenAI-compatible API (model: "kimi-k2.6") or download open weights from HuggingFace for self-hosted deployments.
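A minimal request sketch using only the standard library, assuming the usual OpenAI-compatible chat-completions shape; the endpoint URL here is an assumption, so confirm the actual base URL in Moonshot's platform docs before use.

```python
import json
import os
import urllib.request

API_URL = "https://api.moonshot.cn/v1/chat/completions"  # assumed endpoint

def build_chat_request(prompt: str, model: str = "kimi-k2.6") -> urllib.request.Request:
    # Build an OpenAI-style chat completion request; send it with
    # urllib.request.urlopen(req) once MOONSHOT_API_KEY is set.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('MOONSHOT_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client also works by pointing its `base_url` at Moonshot's API.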
Moderato ($19/mo) unlocks Deep Research and Kimi Code. Allegretto ($39/mo) adds Agent Swarm. Vivace ($199/mo) gives 30× Kimi Code and 240 swarm uses per month.
Sustains 13-hour autonomous coding sessions. Proven in production: overhauled an 8-year-old financial engine across 12 strategies and 1,000+ tool calls — +185% throughput.
Transforms a single prompt into Awwwards-level interfaces with scroll animations, auth, and database layers. No designer required — full-stack from one instruction.
Searches and synthesizes across hundreds of live web sources. Supports multiple concurrent research tasks with structured report output and professional data access.
Accepts image and video input natively. Design-to-code from UI screenshots. Generates coherent visual assets, slides, and diagrams alongside text and code output.
Processes entire codebases, research corpora, or legal document stacks in a single session. Auto-compression summarizes history when the window approaches capacity.
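One way such auto-compression can work, sketched with placeholder `count_tokens` and `summarize` callables — the real mechanism is internal to K2.6 and may differ:

```python
CONTEXT_WINDOW = 262_144              # the 262K-token window
COMPRESS_AT = int(0.9 * CONTEXT_WINDOW)

def compress_history(turns, count_tokens, summarize):
    # When total tokens approach the window, fold the oldest half of
    # the turns into a single summary turn and keep the recent half.
    total = sum(count_tokens(t) for t in turns)
    if total <= COMPRESS_AT:
        return list(turns)
    half = len(turns) // 2
    return [summarize(turns[:half])] + list(turns[half:])
```

The practical effect is that long sessions degrade gracefully instead of failing when the window fills.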
Weights available on HuggingFace under a modified MIT license. Self-host with vLLM, SGLang, KTransformers, or TensorRT-LLM. Commercial use permitted.
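A self-hosted deployment with vLLM's OpenAI-compatible server might look like the following; the HuggingFace repo name and the parallelism settings are illustrative assumptions to adapt to the actual release and your hardware.

```shell
# Serve the open weights via vLLM's OpenAI-compatible server.
# Repo name and tensor-parallel degree are illustrative only.
pip install vllm
vllm serve moonshotai/Kimi-K2.6 \
    --tensor-parallel-size 8 \
    --max-model-len 262144
```

SGLang, KTransformers, and TensorRT-LLM offer analogous serving paths with their own flags.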
Fast answers to questions, email drafts, quick summaries, translation, and real-time brainstorming — with K2.6's intelligence at instant speed.
Solve complex STEM problems, synthesize conflicting literature, reason through multi-constraint research questions, and generate formal proofs or methodology critiques.
From UI screenshot to deployed website with auth, database, and animations. Kimi Code integration gives developers up to 30× their code request quota.
Generate financial dashboards, pivot tables, and KPI decks from raw CSVs. Access World Bank, academic, and economic datasets via Professional Data integration.
Deploy hundreds of parallel agents for batch content creation, competitive research at scale, report generation, or automated data enrichment pipelines.
Multi-agent workflows for legal document processing, due diligence, financial analysis, and long-horizon engineering tasks — all running autonomously over hours.
| Capability | Kimi K2.6 (Moonshot AI) | GPT-4o (OpenAI) | Claude Sonnet 4 (Anthropic) | Gemini 1.5 Pro (Google) | DeepSeek-V3 (DeepSeek) |
|---|---|---|---|---|---|
| Architecture | |||||
| Parameters (total) | 1 Trillion | ~200B est. | ~200B est. | ~1T est. | 685B |
| Architecture | MoE | Dense | Dense | MoE | MoE |
| Context window | 262K tokens | 128K | 200K | 1M | 128K |
| Open weights | ✓ Modified MIT | ✗ | ✗ | ✗ | ✓ |
| Benchmarks | |||||
| SWE-bench Verified | 72% | ~55% | ~72% | ~50% | ~49% |
| GPQA Diamond | 78% | ~54% | ~65% | ~60% | ~60% |
| HLE (w/ tools) | 55.5% | ~37% | ~36% | ~30% | ~35% |
| Agentic Capabilities | |||||
| Agent / tool use | ✓ | ✓ | ✓ | ✓ | ✓ |
| Multi-agent swarm | 300 agents | ✗ | ✗ | ✗ | ✗ |
| Native image input | ✓ | ✓ | ✓ | ✓ | ✗ |
| Design-to-code | ✓ | Partial | Partial | Partial | ✗ |
| Pricing | |||||
| App subscription | $19–199/mo | $20/mo | $20/mo | $20/mo | $0 (web) |
| API input ($/1M) | ~$0.55 | $2.50 | $3.00 | $1.25 | $0.27 |
| API output ($/1M) | ~$2.65 | $10.00 | $15.00 | $5.00 | $1.10 |
Benchmark figures are the best published results as of Apr 2026. Competitor estimates sourced from public leaderboards. Always verify current numbers on official provider pages.
Agent Swarm (up to 300 sub-agents) is still in beta. Availability varies by region and plan tier. Complex swarm tasks may produce inconsistent coordination at very high agent counts.
Available from Allegretto ($39/mo) upward.
Pricing, plan names, and feature availability differ between mainland China (RMB pricing) and international users (USD pricing). Not all features are available in all regions simultaneously.
Verify at kimi.com for your region.
App membership unlocks tool quotas but does not include API token credits. Developers building with K2.6 via the API are billed separately per token, regardless of membership tier.
API billing: ~$0.55/1M in · ~$2.65/1M out.
Running the 1T-parameter open weights locally demands enterprise-grade GPU infrastructure. The 32B-active MoE architecture helps, but full deployment is not practical on consumer hardware.
Use vLLM / SGLang / KTransformers on A100+ GPUs.
Keeping long conversation histories, running deep research loops, or requesting 3,000+ word outputs significantly increases API token spend. Output tokens are priced at roughly 5× input tokens.
Summarize context, cap output length, use templates.
Moonshot AI's primary documentation and community resources are in Chinese. English developer documentation exists but may lag behind Chinese-language releases for new features.
platform.moonshot.cn/docs has an English API reference.
K2.6 scores 72% on SWE-bench Verified, 78% on GPQA Diamond, and 55.5% on HLE with tools — performance that rivals closed-source frontier models at a fraction of the API cost (~$0.55/1M input vs $2.50–$3.00 for GPT-4o/Claude).
Unlike GPT-4o or Claude, K2.6 weights are publicly available under a modified MIT license. Self-host, fine-tune, or integrate without vendor lock-in. Deploy on-premise for full data sovereignty.
No other model offers a built-in, production-tested multi-agent swarm system with up to 300 parallel sub-agents. 4,000+ coordinated execution steps. 4.5× faster than single-agent approaches on complex tasks.
Mixture-of-Experts architecture activates only 32B of 1T parameters per token. You get trillion-parameter model quality with the compute cost of a 32B model — exceptional inference efficiency at scale.
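The efficiency claim follows from simple arithmetic: decode compute scales with active parameters, not total parameters. A back-of-the-envelope check, using the common ~2 FLOPs-per-parameter-per-token rule of thumb:

```python
TOTAL_PARAMS = 1_000e9   # 1T parameters in the full MoE
ACTIVE_PARAMS = 32e9     # ~32B routed to each token

# Rough compute per generated token, in FLOPs.
flops_moe = 2 * ACTIVE_PARAMS
flops_dense_1t = 2 * TOTAL_PARAMS

speedup = flops_dense_1t / flops_moe
print(f"~{speedup:.0f}x fewer FLOPs per token than a dense 1T model")
```

In practice, inference cost also depends on memory bandwidth (all 1T weights must still be resident) and expert-routing overhead, so treat the ratio as an upper bound.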
K2.6 isn't retrofitted for agents — it was designed for them from training. 262K context, native tool use, auto-context compression, and the Kimi tool ecosystem (Docs, Slides, Sheets, Websites, Claw) are first-class capabilities.
At ~$0.55/1M input and ~$2.65/1M output, K2.6 via API is significantly cheaper than GPT-4o ($2.50/$10) and Claude Sonnet ($3/$15) for equivalent-quality output on coding and reasoning tasks.
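A worked per-request cost comparison using the approximate rates above (verify current pricing on each provider's page before relying on these figures):

```python
RATES = {  # USD per 1M tokens: (input, output), approximate figures
    "kimi-k2.6": (0.55, 2.65),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost of a single request at the listed per-million-token rates.
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# Example: a 10K-token prompt with a 2K-token response.
for model in RATES:
    print(model, round(request_cost(model, 10_000, 2_000), 4))
```

At this request size the K2.6 call costs roughly a quarter of the GPT-4o call, and the gap widens on output-heavy workloads.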
One K2.6 subscription covers deep research, slide creation, website deployment, spreadsheet generation, code authoring, and browser-based agent automation — replacing multiple single-purpose SaaS tools.
K2.6 incorporates MuonClip optimizer for training stability, Multi-head Latent Attention (MLA) for KV cache efficiency, and automatic context compression — proprietary innovations from Moonshot AI's research team.
Four modes. One model. Start free — scale to enterprise. No commitment required.