Kimi K2.7 Code - Open-Source 1T Agentic Coding Model

Four Signature Capabilities

K2.7 Code is a single coding-first model with forced thinking - there is no "instant" mode. Its strengths cluster into four pillars.

LONG-HORIZON CODING

K2.7 Code plans, edits, runs tools, and debugs across many steps to finish whole engineering tasks - not single answers. Over K2.6 it posts +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite, while spending ~30% fewer reasoning tokens.

Multi-file refactors 10+ languages Repo-scale 256K End-to-end tasks

🛠️

Production tech stack

Trained on backend, infra, performance, systems, security, frontend and ML/data tasks across 10+ mainstream languages.

🧩

Multi-file coherence

Holds an established architecture pattern consistently across a session - no drift between modules.

🐢

Rust · Go · Python and more

Reliable generalization across languages and scenarios: frontend, DevOps, performance optimization.

⚡

6× High-Speed Mode (soon)

Moonshot has announced a high-speed serving mode to address throughput on long agentic runs.

CODING kimi-k2.7-code

Refactor the auth module to use the repository pattern and add tests.

→ explore read 12 files · mapped service ↔ controller
→ plan introduce UserRepository + interface
→ coder patched 6 files · 18 unit tests added
✓ build green · 0 regressions

FORCED & PRESERVED THINKING

K2.7 Code always runs with reasoning on - thinking and preserve_thinking are forced and can't be disabled. Crucially, it carries full reasoning across turns, so it remembers why it made a decision, not just what it said. Recommended sampling: temperature 1.0, top-p 0.95.

Preserve Thinking Interleaved reasoning Less overthinking −30% tokens

🧵

Reasoning across turns

Pass back reasoning_content and the model keeps its prior chain-of-thought - ideal for multi-step coding.

🔁

Interleaved tool calls

Reasons between tool calls in a single turn, the same design as K2 Thinking - fewer round-trips.

📉

Efficiency gain

~30% lower reasoning-token usage vs K2.6 - less overthinking, lower cost on long runs.

🚫

No instant mode

There's no non-thinking path - every call reasons first, which is why token efficiency matters.

THINKING kimi-k2.7-code

Give me three random numbers.

〈reasoning〉I'll list five: 473, 921, 235, 215, 222 - return first three.〉
473, 921, 235

What were the other two you had in mind?

215 and 222 - preserved from my earlier reasoning.

AGENTIC MCP TOOL USE

This is K2.7 Code's standout. On MCP Mark Verified it scores 81.1 - beating Claude Opus 4.8 (76.4) - across real servers like Notion, GitHub, Postgres, Filesystem and Playwright. It also posts 76.0 on MCP Atlas. Think CI checks, ticket updates, and file edits in one loop.

MCP protocol GitHub · Postgres Playwright 81.1 MCP Mark

🔧

Multi-step tool calling

Native interleaved thinking + tool calls - chains reads, edits, and shell commands without per-step round-trips.

🤖

Kimi Code CLI agent

Pairs best with the open-source Kimi Code CLI - coder, explore, and plan sub-agents in isolated contexts.

🔗

OpenAI / Anthropic compatible

Drop-in with existing agent frameworks - Claude Code, Cline, Roo Code, OpenCode, Aider.

🛡️

Budget-safe runs

Set a daily spending cap and limit tool_choice to keep autonomous loops from over-spending.

TOOL USE · MCP kimi-k2.7-code

[MCP] connect: github · postgres · filesystem

[github] open PR #482 → 3 failing checks

〈reasoning〉 migration drift in users table

[postgres] diff schema → add index idx_users_email

[fs] write migration 0042_add_index.sql

[github] push · checks re-run → ✓ passing

✓ MCP Mark Verified 81.1 · Atlas 76.0

NATIVE MULTIMODAL

Unusual for a coding-first model: K2.7 Code accepts text, image, and video through a 400M-parameter MoonViT vision encoder. Drop in a screenshot of a UI bug, or hand it a recorded repro and a stack trace in one prompt. Video chat is an experimental feature on the official API.

Image input Video input MoonViT 400M Screenshot-to-fix

🖼️

Design-to-code

Send a mockup or screenshot; get a faithful implementation plus the fix for what's broken.

🎬

Video clip analysis

A watch_video_clip tool lets the model inspect specific seconds of a recorded repro.

📎

One-prompt context

Documentation, screenshots, and a recording can all share a single request.

🧾

png · jpeg · webp · gif

Standard image formats supported; video chat currently via the official Kimi API.

MULTIMODAL kimi-k2.7-code

[screenshot.png] The header overlaps the nav on mobile - fix it.

I see the issue: the sticky header lacks a z-index and the nav has no top offset below 580px. Patch: add z-index:300 to .nav and a padding-top on the drawer.

How to Use Kimi K2.7 Code API

Try Kimi Code

Open kimi.com/code - the terminal-first agent K2.7 Code is tuned for. Sign in and start a coding session in minutes, no local setup needed.

Get an API key

Create a key at platform.moonshot.ai (keys start with sk-). The API is OpenAI- and Anthropic-compatible. Use model kimi-k2.7-code.

Call with thinking on

Thinking and preserve-thinking are forced - read message.reasoning as well as content. Recommended temperature 1.0, top_p 0.95. Pass images or video for multimodal tasks.

Or self-host the weights

Download from HuggingFace and serve with vllm serve "moonshotai/Kimi-K2.7-Code", SGLang, or KTransformers. Then point Kimi Code CLI at your local endpoint.

Kimi K2.7 Code Benchmarks

Moonshot's published Kimi K2 7 benchmark table, comparing K2.7 Code against K2.6, GPT-5.5 and Claude Opus 4.8. Important: every figure here is first-party - as of mid-June 2026 there are no independent third-party numbers on standard public suites yet, so read them as vendor-reported and directional.

Benchmark	K2.7 Code Moonshot	Kimi K2.6 Moonshot	GPT-5.5 OpenAI	Opus 4.8 Anthropic
Coding
Kimi Code Bench v2	62.0	50.9	69.0	67.4
Program Bench	53.6	48.3	69.1	63.8
MLS Bench Lite	35.1	26.7	35.5	42.8
Agentic
Kimi Claw 24/7 Bench	46.9	42.9	52.8	50.4
MCP Atlas	76.0	69.4	79.4	81.3
MCP Mark Verified	81.1	72.8	92.9	76.4

Tested via Kimi Code CLI (thinking, temp 1.0, top-p 0.95, 262,144 ctx); GPT-5.5 in Codex xhigh, Opus 4.8 in Claude Code xhigh. Gains over K2.6: +21.8% Kimi Code Bench v2, +11.0% Program Bench, +31.5% MLS Bench Lite. Full methodology on the HuggingFace model card.

Download, Size & Requirements

For Kimi k2 7 code download, the weights live on HuggingFace as moonshotai/Kimi-K2.7-Code (mirror on ModelScope) under a Modified MIT license, with community GGUF quants for Ollama, LM Studio, Jan and llama.cpp.

MODEL SIZE

Architecture	Mixture-of-Experts
Total parameters	1T (~1.06T)
Active parameters	32B / token
Layers (incl. dense)	61
Experts · selected	384 · 8 (+1 shared)
Attention	MLA · 64 heads
Vocabulary	160K
Context length	256K (262,144)
Vision encoder	MoonViT · 400M
On-disk weights	~1.1T params · BF16

REQUIREMENTS

Engines	vLLM · SGLang · KTransformers
transformers	>=4.57.1, <5.0.0
FP16 inference VRAM	~2,308 GB (multi-node)
INT8 inference VRAM	~1,150 GB
INT4 inference VRAM	~577 GB (≈8× A100 80GB)
Quantization	Native INT4 (QAT)
API endpoint	platform.moonshot.ai
Model ID	kimi-k2.7-code

Rule of thumb: combined RAM + VRAM ≈ quantization file size. For most teams the API is cheaper than a purpose-built local rig.

Download on HuggingFace Kimi Code CLI on GitHub

Kimi K2.7 Code Pricing

Two ways to pay: per-token API access, or a Kimi Code membership. Kimi K2.7 Code pricing on the API is roughly 5× cheaper on input and 4–6× cheaper on output than the closed flagships it's benchmarked against.

Moderato

Entry Plan

$19/mo

Kimi Code access included

Get Moderato

Includes
✓ K2.7 Code in Kimi Code
✓ Base Kimi Code credits
✓ Forced thinking + tool use
✓ Multimodal input

Allegretto

Daily Driver

$39/mo

Key Features

💻

Long-Horizon Coding

Plans, edits, runs tools, and debugs across many steps for end-to-end task completion - built for multi-file refactors and complex software engineering, not single-shot answers.

🔌

Best-in-class MCP Tool Use

Scores 81.1 on MCP Mark Verified - ahead of Claude Opus 4.8 (76.4) - across Notion, GitHub, Postgres, Filesystem, and Playwright environments.

📉

30% Fewer Reasoning Tokens

Less overthinking than K2.6 means lower cost on long autonomous runs, where output tokens dominate the bill - efficiency that compounds with low API prices.

👁

Native Multimodal

Accepts text, image, and video via a 400M MoonViT vision encoder. Screenshot-to-fix and recorded-repro analysis in a single prompt.

🧵

Preserve Thinking

Forced reasoning persists across turns, so the model remembers why it made a decision - keeping architecture consistent over long coding sessions.

🔓

Open Weights (Modified MIT)

1T-parameter weights on HuggingFace and ModelScope. Self-host with vLLM, SGLang, or KTransformers; native INT4 keeps memory in check. Commercial use permitted.

Use Cases

🧩

Coding

Multi-File Refactors

Restructure a module, propagate a pattern across controllers and services, and add tests - coherently, across a 256K-token repo-scale context.

🔧

Tool Use

CI & DevOps Loops

Run CI checks, update tickets, edit files, and fix failing migrations in one agentic loop via MCP servers like GitHub and Postgres.

🖼️

Multimodal

Screenshot-to-Fix

Hand the model a UI screenshot or a recorded repro video plus a stack trace, and get a diagnosis and a patch in a single prompt.

🐢

Coding

Systems & Performance

Rust, Go, and Python systems work, infrastructure, and performance optimization - the production-stack tasks K2.7 was trained on.

🏢

Tool Use

On-Prem / Regulated Teams

Self-host the open weights for data-residency constraints, then drive them with Kimi Code CLI pointed at your local endpoint.

💸

Efficiency

Cost-Sensitive Agents

Long autonomous sessions where output tokens dominate the bill benefit twice: fewer reasoning tokens and cheaper per-token pricing.

Compare Models

Capability	Kimi K2.7 Code Moonshot AI	Kimi K2.6 Moonshot AI	GPT-5.5 OpenAI	Claude Opus 4.8 Anthropic
Architecture
Total parameters	1 Trillion	1 Trillion	Undisclosed	Undisclosed
Active / token	32B (MoE)	32B (MoE)	-	-
Context window	256K	256K	Large	1M
Open weights	✓ Modified MIT	✓	✗	✗
Multimodal (img/video)	✓	✓	✓	✓
Benchmarks (first-party)
Kimi Code Bench v2	62.0	50.9	69.0	67.4
MCP Mark Verified	81.1	72.8	92.9	76.4
MCP Atlas	76.0	69.4	79.4	81.3
Pricing (API, per 1M)
Input	$0.95	~$0.95	~$5.00	~$5.00
Output	$4.00	~$4.00	~$15.00	~$25.00
Cached input	$0.19	-	-	-

Benchmarks are Moonshot's own first-party results (June 12, 2026) and not independently verified. Competitor prices are approximate. Always confirm current numbers on official provider pages.

Limitations

📊

First-party benchmarks only

All published numbers come from Moonshot's own suites (Kimi Code Bench v2, MCP Atlas, etc.). No independent SWE-bench Verified, Terminal-Bench, or LiveCodeBench results exist yet.

Treat scores as vendor-reported · test on your own repo

🚫

No non-thinking mode

Thinking and preserve-thinking are forced on and can't be disabled. There's no cheap, no-reasoning path for trivial calls, so every request reasons first.

Lean on the 30% token-efficiency gain to offset

🖥️

Heavy to self-host

Even at INT4 it needs roughly 577GB of VRAM (≈8× A100 80GB); FP16 needs a multi-node cluster. Not practical on consumer hardware.

For most teams the API is cheaper than a local rig

🧱

Code variant only

There's no general-purpose "Kimi K2.7" or Instruct sibling at launch - it's tuned for engineering, not broad chat.

Use a general model for non-coding chat workloads

📏

256K context ceiling

Large and repo-friendly, but it trails the 1M-token windows of the Claude flagships for the very largest single-prompt corpora.

Chunk or summarize for >256K-token inputs

⏱️

Speed (for now)

Some users report earlier Kimi models felt slow. The promised 6× High-Speed Mode wasn't live at launch.

Watch for High-Speed Mode rollout

Advantages

🏆

Beats Opus 4.8 on tool use

On MCP Mark Verified, K2.7 Code scores 81.1 vs Opus 4.8's 76.4 - an open-source model out-pointing a closed flagship on agentic tool invocation.

📉

Token efficiency that compounds

~30% fewer reasoning tokens than K2.6, layered on top of already-low API prices - a real win for long autonomous coding runs.

🔓

Truly open weights (Modified MIT)

Unlike GPT-5.5 or Opus 4.8, K2.7 Code weights are public. Self-host, fine-tune, integrate - the most permissive frontier-adjacent coding stack.

📐

MoE efficiency: 1T total, 32B active

Trillion-parameter quality with the compute footprint of a 32B model - 384 experts, 8 selected per token, native INT4 quantization.

💰

Aggressive pricing for builders

At $0.95/1M input and $4.00/1M output, it's roughly 5× cheaper on input and 4–6× cheaper on output than Opus 4.8 and GPT-5.5.

🤖

Full-stack platform play

Model + open-source Kimi Code CLI + subscription tiers from $19/mo - the same model-plus-plan strategy as Claude Code, from the open side.

The Complete Guide

Everything developers are searching for about Moonshot AI's newest open-weight coding model, explained in depth - the code, the numbers, the hardware, and the community verdict.

Overview

What Is Kimi 2.7 Code?

Kimi 2.7 code - officially Kimi K2.7 Code - is Moonshot AI's newest open-weight large language model, released on June 12, 2026. It is not a general-purpose chatbot. It is a coding-focused, agentic model built directly on top of Kimi K2.6, and it is engineered for one job above all others: long-horizon software engineering. Where a conventional assistant answers a single question, K2.7 Code plans a task, edits files, runs shell commands, calls tools, inspects the results, and debugs its own work across many steps - the way a junior engineer works through a ticket rather than firing off a one-line reply.

Two claims define the release. The first is capability: across Moonshot's coding and agentic suites, K2.7 Code clearly outperforms its predecessor. The second is efficiency: it reaches that higher performance while spending roughly 30% fewer "thinking" tokens than K2.6. In a world where autonomous coding runs can burn millions of output tokens, a 30% reduction in reasoning overhead is not a footnote - it is a direct line item on the bill.

Under the hood it is a Mixture-of-Experts (MoE) design with 1 trillion total parameters but only 32 billion active per token, so it computes far less than a dense trillion-parameter model would on each forward pass. It is natively multimodal - accepting text, image, and even video input - which is unusual for a coding-first model. It runs exclusively in forced thinking mode (there is no "instant" path), and it ships under a permissive Modified MIT license. One naming caveat worth knowing: at launch there is no general-purpose "Kimi K2.7" or "Instruct" sibling - the Code variant is the whole story for now, tuned for engineering rather than broad chat.

Crucially, the model never ships alone. It is paired with Kimi Code, Moonshot's open-source terminal coding agent - the same model-plus-CLI strategy Anthropic runs with Claude Code. That pairing is the real headline: Moonshot is shipping a full coding platform, not just a set of weights.

Open Source

Kimi K2 7 Code on GitHub

Anyone researching Kimi k2 7 code github quickly discovers that the release has two open-source halves, and the GitHub side is just as important as the weights. The model is only half the story; the other half is Kimi Code CLI, Moonshot's terminal-first coding agent and a direct competitor to Claude Code and Gemini CLI. The CLI has been rewritten in TypeScript and is distributed via npm, and it runs autonomous, multi-step workflows entirely from the terminal: reading and editing code, executing shell commands, searching files, and fetching web pages.

What makes the agent more than a thin wrapper is its sub-agent architecture. Kimi Code CLI ships with built-in coder, explore, and plan sub-agents that each run in isolated contexts, so a large task can be decomposed - explore the codebase, plan the change, then write it - without the context bleed that plagues single-thread agents. Both the code repository and the model weights are released under the Modified MIT license, which is what keeps the self-host path open for commercial teams.

The community moved fast around the GitHub ecosystem. Within days of launch, integration guides and configuration repositories appeared for wiring K2.7 Code into the tools developers already use:

Claude Code - point it at the Moonshot endpoint and swap the model.
Cline and Roo Code - popular VS Code agent extensions.
OpenCode - start it and switch model with /models.
Aider - for quick edits, git-aware workflows, and model comparisons.

For local serving, the repositories document the standard one-liners: vllm serve "moonshotai/Kimi-K2.7-Code" or the SGLang launch server, both exposing an OpenAI-compatible endpoint. A practical tip surfaced repeatedly across these projects: configure a project daily spending budget in the Kimi Platform dashboard, and set tool_choice to auto or none to keep autonomous agent loops from quietly consuming excess credits. If you run K2.7 Code locally, you can even point Kimi Code CLI at your own instance - kimi config set api.base-url http://localhost:8000/v1 - for the full agent experience at zero per-token cost.

It's worth understanding why the CLI matters so much for this particular model. Generic wrappers pass conversation history between turns - they give the model what was said. Kimi Code CLI is built to expose K2.7 Code's preserve-thinking and native multi-step tool calling, so it gives the model what it was reasoning about. In practice that means a request like "find every API endpoint without rate limiting and add it" can resolve in a single model turn - explore, plan, patch, test - rather than six separate exchanges, each with its own round-trip. For complex tasks, that native multi-step execution translates into several times fewer API calls than a turn-by-turn tool like Aider would make, which is both faster and cheaper. The repeated community recommendation is to use Kimi Code CLI for complex, multi-file work where reasoning coherence matters, and to reach for a lighter tool only for quick one-off edits.

Performance

Kimi K2 7 Code Benchmarks

The Kimi k2 7 code benchmarks are the reason the release got so much attention. Moonshot's announcement leads with three headline deltas over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite, alongside the roughly 30% reduction in reasoning-token usage. In raw scores, K2.7 Code lifts Kimi Code Bench v2 from 50.9 to 62.0, Program Bench from 48.3 to 53.6, and MLS Bench Lite from 26.7 to 35.1 - the last of which essentially catches up to GPT-5.5 in a single generation.

The single most striking result is on tool use. On MCP Mark Verified, which measures whether a model correctly invokes tools across real server environments, K2.7 Code scores 81.1 - ahead of Claude Opus 4.8's 76.4. For agentic, tool-driven workflows - CI checks, ticket updates, and file edits in one loop - an open-source model out-pointing a closed flagship is the genuine highlight of the launch.

It helps to understand what each benchmark actually measures:

Kimi Code Bench v2 - Moonshot's in-house suite of realistic software-engineering tasks across 10+ languages and a full production stack: backend, infrastructure, performance, systems, security, frontend, and ML/data.
Program Bench - 200 tasks that ask an agent to recreate a program's behavior from only a compiled binary and its documentation, judged against 248,000+ fuzz-generated tests, with no source or internet access.
MLS-Bench-Lite - the 30-task official subset of MLS-Bench, testing whether AI can invent generalizable ML methods, with a 5-hour exploration budget per task.
Kimi Claw 24/7 Bench - an in-house long-horizon agentic test across 17 professional scenarios and 610 evaluation points.
MCP Atlas and MCPMark-Verified - realistic tool-use suites spanning Notion, GitHub, Filesystem, Postgres, and Playwright.

One honest qualifier belongs on every benchmark conversation: all of these are Moonshot's own suites, run under its own harness (Kimi Code CLI in thinking mode at temperature 1.0, top-p 0.95, 262,144-token context, with GPT-5.5 in Codex xhigh and Opus 4.8 in Claude Code xhigh). They are best read as vendor-reported and directional, not independently confirmed.

The benchmark that arguably matters most for production isn't in the accuracy table at all - it's the token-efficiency result. K2.7 Code's roughly 30% reduction in reasoning-token usage versus K2.6 is the figure that quietly governs cost on real workloads, because for agentic coding the output (reasoning) tokens dominate the bill. Concretely: a long autonomous run that previously consumed around 2 million reasoning tokens now uses closer to 1.4 million, and critically, that efficiency gain does not come at the expense of quality - the accuracy scores went up at the same time. Stack the ~30% fewer tokens on top of an output price of $4.00 per million, and a heavy agentic session gets cheaper twice over: fewer tokens billed, and each token already priced well below the closed flagships. That compounding is the real economic story behind the headline accuracy deltas.

Independent Context

Kimi K2 7 Benchmark - Reading the Numbers

Because Moonshot's figures are first-party, the most useful thing a Kimi K2 7 benchmark guide can do is explain how to interpret them and what to watch for next. As of mid-June 2026, there are no independent third-party numbers for K2.7 Code on the standard public suites - no SWE-bench Verified, SWE-bench Pro, Terminal-Bench, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro. That absence does not mean the model is weak; it means the public evidence is still arriving.

So how should you weigh the claims? First, treat the deltas over K2.6 as the most trustworthy signal - they were measured under identical conditions on the same harness, so the relative improvement is meaningful even if the absolute numbers later move on neutral suites. Second, give extra weight to the MCP tool-use results, because those map directly to the agentic work the model is actually sold for. Third, discount cross-vendor comparisons slightly: different models ran in different harnesses (Codex xhigh, Claude Code xhigh), and harness choice shifts results.

The fastest-moving independent signal right now is not a leaderboard at all - it is community testing. Within 48 hours of launch, creators published hands-on video reviews running K2.7 Code locally, clustered, and in the cloud against Claude, and developers shared head-to-head impressions on forums. Until neutral suites publish, that practical testing is the best proxy for real-world behavior.

The smart move isn't to trust or dismiss the vendor table - it's to run K2.7 Code on your own repository, with your own tasks. That's the only benchmark that pays your bills.

Watch for SWE-bench Verified, Terminal-Bench, and LiveCodeBench results from third parties over the coming weeks. When those land, you'll know whether K2.7 Code's first-party story holds up under neutral conditions - and they're the figures any serious adoption decision should ultimately wait for.

Architecture

Kimi K2 7 Code Size & Architecture

When people search Kimi k2 7 code size, they usually mean two different things: the parameter count and the on-disk footprint. Both matter, and they're not the same number. On parameters, K2.7 Code is a 1-trillion-parameter Mixture-of-Experts model that activates only 32 billion parameters per token. That sparsity is the entire trick: you get the representational capacity of a trillion-parameter model while paying roughly the compute of a 32B model on each forward pass.

The full architecture, straight from the official model card, looks like this:

61 layers total, including a single dense layer.
384 experts, with 8 selected per token plus 1 shared expert.
Attention hidden dimension of 7,168; MoE hidden dimension of 2,048 per expert.
64 attention heads using MLA (Multi-head Latent Attention).
SwiGLU activation; 160K vocabulary.
256K context length (262,144 tokens).
A MoonViT vision encoder adding 400M parameters for image and video input.

On disk, the published weights register at roughly 1.1T parameters in mixed precision - the HuggingFace files page lists BF16 alongside F32 and I32 tensors. The detail that makes K2.7 Code far lighter to run than a naive trillion-parameter model is its quantization design: the MoE weights are stored in native INT4 (via quantization-aware training) while attention stays in BF16. This isn't lossy post-hoc compression bolted on afterward; the model was trained with INT4 in mind, which is why high-quality INT4 inference loses almost nothing and why the practical memory bill is a fraction of the FP16 figure. In short: the parameter count is enormous, but the size that actually determines whether you can serve it is the quantized footprint - and that's where the next section comes in.

Distribution

Kimi K2 7 Code on Hugging Face

The canonical home for Kimi k2 7 code huggingface is the repository moonshotai/Kimi-K2.7-Code, with a mirror on ModelScope. The model card is the single source of truth for the architecture, the full benchmark table, the deployment notes, and the chat template - if you only read one external page about K2.7 Code, make it that one. The repo is tagged as image-text-to-text, ships as safetensors, and is released under the Modified MIT license.

HuggingFace also lays out every practical way to actually run the thing. With Transformers, you can load it directly - AutoModel.from_pretrained("moonshotai/Kimi-K2.7-Code", trust_remote_code=True) - or use the high-level image-text-to-text pipeline. For serving, the card documents vLLM (vllm serve "moonshotai/Kimi-K2.7-Code"), SGLang, and a one-line Docker Model Runner path (docker model run hf.co/moonshotai/Kimi-K2.7-Code).

For developers who don't have data-center hardware, the most important HuggingFace detail is the community quantizations. The model page links GGUF builds for llama.cpp, Ollama, LM Studio, and Jan - the realistic route to running K2.7 Code on more modest setups. The page also exposes hosted options: Inference Providers (such as Novita), a HuggingChat entry, and several community Spaces demoing the model. If you'd rather not download anything, those let you try it in the browser first.

A quick orientation for the repo's tabs: Model card for documentation, Files and versions for the raw safetensors and config, and Community for discussions and issues. Note the transformers version requirement - >=4.57.1, <5.0.0 - which trips up people who try to load the weights on an older install. K2.7 Code shares its architecture with K2.5/ K2.6, so existing deployment recipes for those models port over almost directly.

Hardware & Software

Kimi K2 7 Code Requirements

Here is where enthusiasm meets reality. The honest answer to Kimi k2 7 code requirements is that this is a demanding model to self-host, and for most developers the API will be the right call. Let's separate software from hardware.

Software is the easy part. K2.7 Code runs on three recommended inference engines: vLLM, SGLang, and KTransformers (Moonshot's own engine, with native INT4 support and the tightest integration with the K2 architecture). You need transformers >=4.57.1, <5.0.0. The API is OpenAI- and Anthropic-compatible, so most existing tooling works with nothing more than a base-URL and model-ID swap.

Hardware is where the trillion-parameter scale shows up. Because memory is the binding constraint, the precision you choose dictates the rig you need. Independent estimates for peak inference VRAM (weights, activations, and KV cache, with a little production headroom) come out roughly as:

FP16 (full precision): ~2,300 GB of VRAM - beyond a single 8-GPU node, so a multi-node cluster.
INT8 (Q8): ~1,150 GB - multiple high-end GPUs.
INT4 (Q4): ~577 GB - roughly 8× A100 80GB, rentable in the cloud from around $6/hour.

A useful rule of thumb carried over from the K2 line: your combined RAM + VRAM should roughly equal the quantization file size. With aggressive dynamic 2-bit GGUF quants, some people get the model running on around 350 GB of combined memory - a high-RAM DDR5 CPU build or a multi-GPU rig plus system RAM - but expect single-digit tokens per second and reduced quality at that extreme.

The pragmatic takeaway: self-hosting only makes economic sense if you push tens of millions of tokens a month, or your data genuinely cannot leave your machines. A purpose-built local rig for K2.7 Code costs more than a used car; a $0.95-per-million-token API call does not. If you do go local, native INT4 plus KTransformers is the sweet spot - and cloud GPU rentals are the sensible way to test before committing to hardware.

Community Verdict

Kimi K2 7 Code on Reddit & Developer Forums

The community reaction has been loud and, predictably, mixed. Scanning the Kimi k2 7 code reddit threads and other developer forums in the days after launch, a handful of themes recur consistently:

Praise for open-sourcing a near-frontier coding model. Many developers see K2.7 Code as the strongest open-source coding agent you can actually run yourself, and the MCP Mark Verified result against Opus 4.8 drew particular attention.
The efficiency angle landed. Teams running long autonomous sessions care more about the ~30% reduction in reasoning tokens than about a benchmark point or two - it shows up directly in their monthly spend.
Skepticism about speed. Some users found earlier Kimi models sluggish; the promised 6× High-Speed Mode is Moonshot's answer, but it wasn't live at launch, so the complaint stands for now.
Caution on the benchmarks. The more experienced commenters flag that the numbers are all first-party and explicitly want to see independent SWE-bench-style results before fully buying the comparisons.

A common workflow heuristic also emerged quickly across these discussions: use K2.7 Code for easy and standard-difficulty coding, and reserve the most expensive closed models for the genuinely hard problems. One widely-shared summary put it as using Kimi 2.7 for easy coding tasks, a closed flagship for standard work, and the very top-tier closed models only for ultra-hard coding.

Nearly frontier, open source, and a fraction of the price - that's the line that keeps coming up. The asterisk that follows it, just as reliably, is "show me the independent numbers."

In other words, the verdict is real enthusiasm tempered by a healthy "prove it on neutral ground" reflex - which is exactly the right posture for a brand-new model whose published numbers are all vendor-run. If you're evaluating it yourself, the forum consensus is unanimous on one point: spend an afternoon testing it on your own codebase before drawing conclusions from anyone else's benchmark, including Moonshot's.

The Bottom Line

Should You Use Kimi K2.7 Code?

Pulling every thread together: Kimi K2.7 Code is the most interesting open-source coding release of mid-2026. It is fast to adopt thanks to an OpenAI- and Anthropic-compatible API and the model ID kimi-k2.7-code; it is genuinely strong at agentic tool use, beating Claude Opus 4.8 on MCP Mark Verified; it is meaningfully cheaper than the closed flagships at $0.95 input and $4.00 output per million tokens; and it is backed by real open weights under a Modified MIT license that you can download and self-host. The asterisks are equally real: the benchmarks are first-party for now, the context window tops out at 256K, the hardware demands for local serving are steep, reasoning is always on with no cheaper non-thinking path, and raw speed is still catching up to the promised High-Speed Mode.

For a coding- and tool-heavy team, the calculus is simple. If you want near-frontier agentic performance at a fraction of closed-model cost, run K2.7 Code through the API or a Kimi Code membership starting at $19/month. If you have hard data-residency constraints and the hardware budget, the open weights let you keep the whole pipeline in-house. And if you only ever need the single best answer on the hardest problems regardless of price, you'll still reach for the closed flagships some of the time - which is precisely how most practitioners are already describing their day-to-day workflow. The one move that beats all the reading: grab a key, point Kimi Code CLI at it, throw a real multi-file task at it, and let your own repository be the benchmark.

Watch It In Action

Independent creators put Kimi 2.7 code through real coding tests within 48 hours of launch - local, clustered, in the cloud, and head-to-head against the closed flagships.

Kimi K2.7 Code - In-Depth Review (Local · Cloud · vs Claude)

Benchmarks, local deploy & head-to-head

K2.7 Code Is HERE - Best Open Coding Model Yet?

First look · technical breakdown · multimodal test

I Tested Kimi-K2.7-Code With 20 Prompts

Leaderboard stress test · is it worth the price?

FAQ

Is Kimi K2.7 Code free?

The weights are open under a Modified MIT license, so you can self-host at no licence cost. The hosted API is paid per token ($0.95 input / $4.00 output per 1M), and Kimi Code memberships start at $19/month.

What's the API model ID?

kimi-k2.7-code, served through OpenAI- and Anthropic-compatible endpoints at platform.moonshot.ai. Keys start with sk-.

How big is the model?

1 trillion total parameters (~1.06T), 32B active per token, 61 layers, 384 experts (8 selected + 1 shared), 256K context, with a 400M MoonViT vision encoder.

Can it really run locally?

Yes, but it's demanding - roughly 577GB of VRAM at INT4, and a multi-node cluster at FP16. Most developers are better served by the API unless they're at very high volume or have hard data-residency needs.

Does it support images and video?

Yes. K2.7 Code is natively multimodal via MoonViT; image input uses png, jpeg, webp, and gif, while video chat is an experimental feature on the official Kimi API.

Where do I download it?

HuggingFace (moonshotai/Kimi-K2.7-Code) and ModelScope, with community GGUF quantizations for Ollama, LM Studio, Jan, and llama.cpp.

Is there a non-thinking mode?

No. Thinking and preserve-thinking are forced on and cannot be disabled - every call reasons first.

What's the community saying on Reddit?

Reaction has been positive but measured: praise for open-sourcing a near-frontier coding model and for the MCP tool-use result, interest in the token-efficiency gain, and a recurring "show me independent benchmarks" caution since all current numbers are first-party.

KIMIK2.7 CODE