MOONSHOT AI · APRIL 2026

KIMI
K2.6

Four intelligent modes. One open-source model. 1 trillion parameters, 262K context, and up to 300 parallel agents — built for coding, research, design, and autonomous workflows.

kimi-k2.6 · terminal
$ kimi --model kimi-k2.6 --mode agent
Initializing K2.6 · 32B active params · 262K ctx
Loading agent toolkit: web, code, docs, slides…
 
user: Overhaul the matching engine for +100% throughput
k2.6: Spawning 8 sub-agents: profiler, code-gen ×2,
test-runner, optimizer, benchmarker…
[Swarm] 4,000 steps · 1,024 tool calls · 13 hrs
✓ Throughput: 0.43 → 1.24 MT/s (+185%)
 
$
1T Parameters
32B Active / Token
262K Context
300 Max Agents
4 Modes
01

Four Modes

K2.6 INSTANT

The fastest way to interact with K2.6. Instant mode is optimized for quick back-and-forth — short answers, rapid Q&A, short drafts, and real-time conversation without waiting for deep deliberation.

Fast responses · Low latency · Short drafts · Q&A
💬
Real-time conversation

Sub-second first-token latency. Optimized for fluid, natural back-and-forth dialogue across topics.

✍️
Short drafting

Quickly generate emails, captions, replies, and short-form content without agent overhead.

🔍
Factual Q&A

Rapid answers to definitions, calculations, translations, and factual lookups.

🌐
128K+ context

Even in Instant mode, K2.6 maintains a massive context window for document-in-context tasks.

INSTANT MODE kimi-k2.6
U
What's the difference between MoE and dense transformer models?
K
In a dense transformer, every parameter activates for every token. In a Mixture-of-Experts (MoE) model, only a subset of "expert" layers activates per token — so you get a huge total parameter count with only a fraction of the compute cost per inference. K2.6 uses MoE: 1T total params, 32B active per token.
U
And MLA attention?
K
Multi-head Latent Attention compresses the key-value cache into low-rank latent vectors, dramatically cutting memory at long contexts — critical for K2.6's 262K token window.
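The memory savings described above can be illustrated with rough arithmetic. The sketch below compares a standard multi-head KV cache against an MLA-style compressed latent cache at the full 262K window. All model dimensions (layer count, hidden size, latent rank) are illustrative assumptions, not published K2.6 hyperparameters.

```python
# Illustrative KV-cache arithmetic: standard attention vs. MLA-style
# latent compression. The dimensions below are assumed for the example --
# they are NOT published Kimi K2.6 hyperparameters.

def kv_cache_bytes(tokens, layers, width_per_token, dtype_bytes=2):
    """Bytes needed to cache `width_per_token` values per token, per layer (fp16)."""
    return tokens * layers * width_per_token * dtype_bytes

TOKENS = 262_144          # 262K context window
LAYERS = 60               # assumed layer count
HIDDEN = 7168             # assumed hidden size (K and V each cache HIDDEN values)
LATENT = 512              # assumed MLA latent rank (one joint KV latent per token)

standard = kv_cache_bytes(TOKENS, LAYERS, 2 * HIDDEN)  # full K + V cache
mla      = kv_cache_bytes(TOKENS, LAYERS, LATENT)      # compressed latent cache

print(f"standard KV cache: {standard / 2**30:.1f} GiB")  # 420.0 GiB
print(f"MLA latent cache:  {mla / 2**30:.1f} GiB")       # 15.0 GiB
print(f"reduction:         {standard / mla:.0f}x")       # 28x
```

Under these assumed dimensions the full-context cache shrinks by well over an order of magnitude, which is why low-rank KV compression matters at 262K tokens.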
K2.6 THINKING

Deep, chain-of-thought reasoning for complex questions. Thinking mode allocates extended compute to reason step-by-step — ideal for maths, logic, research synthesis, and multi-constraint problems.

Extended reasoning · STEM problems · Logic chains · 55.5% HLE
🔬
GPQA Diamond — 78%

Top-tier performance on graduate-level science questions requiring multi-step expert reasoning.

📐
Math & logic

Solves competition-level maths, formal proofs, and algorithmic challenges with visible reasoning traces.

📊
HLE — 55.5% (with tools)

Humanity's Last Exam: one of the world's hardest benchmarks. K2.6 Thinking scores near the frontier with tool use.

🏗️
Architectural decisions

Reason through system design trade-offs, protocol comparisons, and multi-constraint engineering choices.

THINKING MODE kimi-k2.6
U
Why is the halting problem undecidable? Prove it.
K
〈thinking〉Assume a decider H exists…

Proof by contradiction (Turing, 1936):
Assume a machine H(P, i) decides whether P halts on input i. Construct D(P): run H(P, P) — if H says "halts", loop forever; if "doesn't halt", halt. Then D(D) contradicts H's answer either way. ∴ H cannot exist.
K2.6 AGENT

Multi-step autonomous execution. Agent mode lets K2.6 plan, use tools, and produce structured outputs across research, slides, websites, documents, and spreadsheets — all from a single prompt.

Research · Slides · Websites · Docs & Sheets
🔎
Deep Research

Autonomously searches, synthesizes, and structures research reports from live web data. Supports multiple concurrent research tasks.

🎨
Websites & UI generation

Prompt → Awwwards-level interface with animations, auth, database ops. Deploys live. Full-stack from one instruction.

📊
Slides, Docs, Sheets

Creates structured presentations, Word documents, and formula-driven spreadsheets with summaries and formatted outputs.

⚙️
Tool orchestration

Chains web search, code execution, file manipulation, and API calls into complete multi-step workflows.

AGENT MODE kimi-k2.6
U
Create a 12-slide pitch deck on AI in healthcare for a Series A investor audience.
K
→ Research Querying market data, funding rounds, clinical outcomes…
→ Structure Outline: Problem · Market · Solution · Traction · Team · Ask
→ Slides Generating 12 slides with charts and speaker notes…
✓ Deck ready — 12 slides, exportable as PPTX
AGENT SWARM BETA

Massively parallel multi-agent execution. K2.6 dynamically spawns up to 300 specialized sub-agents that work in parallel — slashing execution time on large-scale tasks by up to 4.5×.

300 sub-agents · 4,000+ steps · Batch tasks · Long writing
🔥
Massive search

Deploy 300 sub-agents to search, scrape, synthesize, and cross-validate information in parallel — hours of research in minutes.

📝
Long-form writing at scale

Parallelize research, outlining, section drafting, fact-checking, and editing into a coordinated swarm output.

⚙️
Batch task processing

Run hundreds of similar tasks (data enrichment, classification, summarization) simultaneously via coordinated sub-agents.

🏎️
4.5× faster execution

Proven: K2.6 autonomously overhauled an 8-year-old financial engine over 13 hours — +185% throughput, 4,000+ coordinated steps.

AGENT SWARM · BETA kimi-k2.6
[COORDINATOR] Task: overhaul exchange-core engine
[SPAWN] 8 sub-agents: profiler, flame-graph, code-gen ×2…
[PROFILER] Bottleneck: thread topology (4ME+2RE)
[OPTIMIZER] Strategy 1 failed (+2%). Trying strategy 2…
[CODE-GEN A] Reconfiguring topology → 2ME+1RE
[TEST] 1,024 tool calls · 0 regressions
[BENCH] 0.43 → 0.91 MT/s (+112%)
[OPTIMIZER] Strategy 9: memory pool pre-alloc
✓ Final: 1.24 MT/s (+185%) · 4,000 steps · 13 hrs
02

How to Use Kimi K2.6

01

Access kimi.com

Open kimi.com — no setup required. Free tier available. Sign up and start with Instant mode immediately. Upgrade to unlock Agent and Swarm.

02

Choose your mode

Select from Instant (fast Q&A), Thinking (deep reasoning), Agent (research + create), or Agent Swarm (parallel execution). Match mode to task complexity.

03

Use API or deploy open weights

Integrate via the OpenAI-compatible API (model: "kimi-k2.6") or download open weights from HuggingFace for self-hosted deployments.
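Because the API is OpenAI-compatible, integration amounts to a standard Chat Completions request with `"model": "kimi-k2.6"`. A minimal stdlib-only sketch is below; the base URL is an assumption — verify the correct endpoint for your region in the official API docs. The request is built but not sent, so no key or network is needed to run it.

```python
# Minimal sketch of an OpenAI-compatible Chat Completions request using
# only the stdlib. The base URL is an assumption -- check the official
# API docs for the correct endpoint for your region.
import json
import urllib.request

BASE_URL = "https://api.moonshot.ai/v1"   # assumed; verify in the API docs

def build_chat_request(api_key, messages, model="kimi-k2.6", temperature=0.6):
    """Build (but do not send) a Chat Completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    api_key="YOUR_API_KEY",
    messages=[{"role": "user", "content": "Explain MoE routing in one line."}],
)
# To send: urllib.request.urlopen(req) -- or point the official OpenAI SDK
# at BASE_URL, since the request/response shapes are OpenAI-compatible.
```

The same payload works through any OpenAI-compatible client library by overriding its base URL.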

04

Scale with membership

Moderato ($19/mo) unlocks Deep Research and Kimi Code. Allegretto ($39/mo) adds Agent Swarm. Vivace ($199/mo) gives 30× Kimi Code and 240 swarm uses per month.

03

Key Features

💻

Long-Horizon Coding

Sustains 13-hour autonomous coding sessions. Proven in production: overhauled an 8-year-old financial engine across 12 strategies and 1,000+ tool calls — +185% throughput.

🎨

Coding-Driven Design

Transforms a single prompt into Awwwards-level interfaces with scroll animations, auth, and database layers. No designer required — full-stack from one instruction.

🔎

Deep Research

Searches and synthesizes across hundreds of live web sources. Supports multiple concurrent research tasks with structured report output and professional data access.

👁

Native Multimodal

Accepts image and video input natively. Design-to-code from UI screenshots. Generates coherent visual assets, slides, and diagrams alongside text and code output.

🧮

262K Context Window

Processes entire codebases, research corpora, or legal document stacks in a single session. Auto-compression summarizes history when the window approaches capacity.
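The idea behind auto-compression can be sketched in a few lines: when the history nears a token budget, the oldest turns are collapsed into a summary message. Everything below — the budget fraction, the crude token counter, the summarizer — is illustrative, not Moonshot's actual mechanism.

```python
# Toy sketch of context auto-compression: when conversation history nears
# a token budget, collapse the oldest turns into one summary message.
# The budget fraction, token counter, and summarizer are all illustrative
# stand-ins -- this is not Moonshot's actual compression algorithm.

BUDGET = 262_144          # context window, in tokens
COMPRESS_AT = 0.8         # start compressing at 80% full

def count_tokens(msg):
    return len(msg["content"].split())   # crude stand-in for a real tokenizer

def summarize(msgs):
    text = " ".join(m["content"] for m in msgs)
    return {"role": "system", "content": "Summary: " + text[:200]}

def compress(history):
    """Replace the oldest turns with a summary once the budget nears capacity."""
    total = sum(count_tokens(m) for m in history)
    if total < BUDGET * COMPRESS_AT or len(history) <= 2:
        return history
    old, recent = history[:-2], history[-2:]   # keep the two newest turns verbatim
    return [summarize(old)] + recent
```

A real implementation would use the model's own tokenizer and summarizer, but the control flow — measure, threshold, summarize, splice — is the same shape.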

🔓

Open Source (MIT)

Weights available on HuggingFace under a modified MIT license. Self-host with vLLM, SGLang, KTransformers, or TensorRT-LLM. Commercial use permitted.

04

Use Cases

Instant Mode

Daily AI Assistant

Fast answers to questions, email drafts, quick summaries, translation, and real-time brainstorming — with K2.6's intelligence at instant speed.

🔬
Thinking Mode

Research & Academic Work

Solve complex STEM problems, synthesize conflicting literature, reason through multi-constraint research questions, and generate formal proofs or methodology critiques.

🌐
Agent Mode

Full-Stack Development

From UI screenshot to deployed website with auth, database, and animations. Kimi Code integration gives developers up to 30× their code request quota.

📊
Agent Mode

Business Intelligence

Generate financial dashboards, pivot tables, and KPI decks from raw CSVs. Access World Bank, academic, and economic datasets via Professional Data integration.

🐝
Agent Swarm

Large-Scale Content

Deploy hundreds of parallel agents for batch content creation, competitive research at scale, report generation, or automated data enrichment pipelines.

🏢
Agent Swarm

Enterprise Automation

Multi-agent workflows for legal document processing, due diligence, financial analysis, and long-horizon engineering tasks — all running autonomously over hours.

05

Pricing

Adagio
Free Tier
$0/mo
Basic access · limited quotas
Get Started Free
  Includes:
  • Kimi K2.6 Instant mode
  • 6 agent uses / mo
  • 200 Professional Data req
  • Agent Swarm
  • Kimi Code
Moderato
Advanced Flow
$19/mo
Billed monthly
Get Moderato
  Includes:
  • All K2.6 modes (Instant/Thinking/Agent)
  • 60 agent credits / mo
  • Deep Research
  • Kimi Code 1× credits
  • Websites Deploy
  • 2,000 Pro Data req
  • Agent Swarm
Vivace
Ultimate Boost
$199/mo
Billed monthly
Get Vivace
  Everything in Allegretto, plus:
  • 720 agent credits / mo
  • Kimi Code 30× credits
  • Agent Swarm Max
  • 240 swarm uses / mo
  • 8 concurrent sub-agents
  • 24,000 Pro Data req
API billing is separate from membership — pay per token: ~$0.55/1M input · ~$2.65/1M output. OpenAI-compatible. API Docs →  ·  Open weights free on HuggingFace (MIT)
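At those rates, API spend is easy to estimate. A quick sketch, with the ~$0.55 / ~$2.65 per-million-token figures above hard-coded (rates change — verify before budgeting):

```python
# Back-of-the-envelope API cost estimator using the per-million-token
# rates quoted above. Rates change; verify current pricing before budgeting.
INPUT_PER_M = 0.55    # USD per 1M input tokens
OUTPUT_PER_M = 2.65   # USD per 1M output tokens

def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost for one K2.6 API workload."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a context-heavy session -- 200K tokens in, 8K tokens out.
cost = estimate_cost(200_000, 8_000)
print(f"${cost:.4f}")   # input volume dominates context-heavy runs
```

Note how input volume, not output, dominates long-context workloads even though output tokens cost ~5× more per token.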
06

Compare Models

Capability · Kimi K2.6 (Moonshot AI) · GPT-4o (OpenAI) · Claude Sonnet 4 (Anthropic) · Gemini 1.5 Pro (Google) · DeepSeek-V3 (DeepSeek)

Architecture
Parameters (total): 1 Trillion · ~200B est. · ~200B est. · ~1T est. · 685B
Architecture: MoE · Dense · Dense · MoE · MoE
Context window: 262K tokens · 128K · 200K · 1M · 128K
Open weights: ✓ MIT

Benchmarks
SWE-bench Verified: 72% · ~55% · ~72% · ~50% · ~49%
GPQA Diamond: 78% · ~54% · ~65% · ~60% · ~60%
HLE (w/ tools): 55.5% · ~37% · ~36% · ~30% · ~35%

Agentic Capabilities
Agent / tool use
Multi-agent swarm: 300 agents
Native image input
Design-to-code: Partial · Partial · Partial

Pricing
App subscription: $19–199/mo · $20/mo · $20/mo · $20/mo · $0 (web)
API input ($/1M): ~$0.55 · $2.50 · $3.00 · $1.25 · $0.27
API output ($/1M): ~$2.65 · $10.00 · $15.00 · $5.00 · $1.10

Benchmark figures reflect the best published results as of April 2026. Competitor estimates are sourced from public leaderboards. Always verify current numbers on official provider pages.

07

Limitations

🌐

Agent Swarm is Beta

Agent Swarm (up to 300 sub-agents) is still in beta. Availability varies by region and plan tier. Complex swarm tasks may produce inconsistent coordination at very high agent counts.

Available from Allegretto ($39/mo) upward
🔤

Mainland China vs International pricing

Pricing, plan names, and feature availability differ between mainland China (RMB pricing) and international users (USD pricing). Not all features are available in all regions simultaneously.

Verify at kimi.com for your region
💾

Membership ≠ API access

App membership unlocks tool quotas but does not include API token credits. Developers building with K2.6 via API are billed separately per token regardless of membership tier.

API billing: ~$0.55/1M in · ~$2.65/1M out
🖥️

Self-hosting requires powerful hardware

Running the 1T-parameter open weights locally demands enterprise-grade GPU infrastructure. The 32B-active MoE architecture helps, but full deployment is not practical on consumer hardware.

Use vLLM / SGLang / KTransformers on A100+ GPUs

Long tasks incur higher token costs

Keeping long conversation histories, running deep research loops, or requesting 3,000+ word outputs significantly increases API token spend. Output tokens cost roughly 5× as much as input tokens.

Summarize context, cap output length, use templates
🧭

English community docs lag Chinese docs

Moonshot AI's primary documentation and community resources are in Chinese. English developer documentation exists but may lag behind Chinese-language releases for new features.

platform.moonshot.cn/docs has English API reference
08

Advantages

🏆

Frontier benchmarks at open-source pricing

K2.6 scores 72% on SWE-bench Verified, 78% GPQA Diamond, and 55.5% HLE with tools — performance that rivals closed-source frontier models at a fraction of the API cost (~$0.55/1M input vs $2.50–$3.00 for GPT-4o/Claude).

🔓

Truly open — weights + MIT license

Unlike GPT-4o or Claude, K2.6 weights are publicly available under a modified MIT license. Self-host, fine-tune, or integrate without vendor lock-in. Deploy on-premise for full data sovereignty.

🐝

Only model with native 300-agent swarm

No other model offers a built-in, production-tested multi-agent swarm system at 300 parallel sub-agents. 4,000+ coordinated execution steps. 4.5× faster than single-agent approaches on complex tasks.

📐

MoE efficiency — 1T params, 32B active

Mixture-of-Experts architecture activates only 32B of 1T parameters per token. You get trillion-parameter model quality with the compute cost of a 32B model — exceptional inference efficiency at scale.
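The "only 32B of 1T active" property comes from top-k expert routing: a gate scores every expert for each token, and only the top-scoring few actually run. A toy sketch of that routing step — the expert count and k below are illustrative, not K2.6's real configuration:

```python
# Toy top-k MoE routing: score every expert, run only the top k.
# NUM_EXPERTS and TOP_K are illustrative, not K2.6's actual config.
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # real MoE layers use many more experts
TOP_K = 2

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    topk = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:k]
    probs = softmax([gate_logits[i] for i in topk])  # renormalize over top-k only
    return list(zip(topk, probs))

logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = route(logits)
# Only TOP_K of NUM_EXPERTS experts execute for this token; the layer
# output is the weighted sum of just those experts' outputs.
```

Since compute scales with the experts that run, not the experts that exist, per-token cost tracks the active parameter count rather than the total.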

🎯

Purpose-built for agentic workflows

K2.6 isn't retrofitted for agents — it was trained for them from the ground up. 262K context, native tool use, auto-context compression, and the Kimi tool ecosystem (Docs, Slides, Sheets, Websites, Claw) are first-class capabilities.

💰

Competitive API pricing for builders

At ~$0.55/1M input and ~$2.65/1M output, K2.6 via API is significantly cheaper than GPT-4o ($2.50/$10) and Claude Sonnet ($3/$15) for equivalent-quality output on coding and reasoning tasks.

🌐

All-in-one workspace, not just a chatbot

One K2.6 subscription covers deep research, slide creation, website deployment, spreadsheet generation, code authoring, and browser-based agent automation — replacing multiple single-purpose SaaS tools.

🔬

Cutting-edge architecture research

K2.6 incorporates MuonClip optimizer for training stability, Multi-head Latent Attention (MLA) for KV cache efficiency, and automatic context compression — proprietary innovations from Moonshot AI's research team.

THINK BIGGER
WITH K2.6

Four modes. One model. Start free — scale to enterprise. No commitment required.