Kimi K2.6: Advancing open-source coding

14 hours ago (kimi.com)

Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...

Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

  • At this point, drawing these pelicans must be in the training datasets.

  • Too bad they didn't put equal effort into the pelican's legs and feet. Left leg paralyzed and not moving, and right ankle flipping around in alarming fashion!

  • I was part of the beta; it's a properly good model. In some sense I forgot that I'm not on Opus or GPT. Opus is still better. GPT is the one struggling for me: it has some niche in backend work, but you can get the same with Opus plus skills, and it's lacking in almost all other areas.

    • Funny, for me Opus is struggling since about February.

      4.7 made no difference, so for the first time in many moons I am cancelling my subscription.

  • Genuine question, what's the goal of posting this on almost every single new model thread here on HN? I may be old and grumpy but to me it got old a while ago, and is closer to a low effort Reddit comment

    • It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.

      Of course, a while back there was a Gemini release that I believe specifically called out its ability to produce SVGs, for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs, and in fact there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that, as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.

    • This isn't even a normal pelican-image post; this one created the HTML control system that animates the distance the wing travels from its pivot in time with the rotation speed of the wheel. Let's not pretend this is a solved problem and models are dumping out perfect pelicans on bikes one after another (or ever?).

      Surely you know someone makes the same post you did every time one of these is posted. Surely you see the answers and pushback, since you are familiar with these posts. Genuine question: did you expect a different answer this time?

    • Every forum gets regulars and their fan clubs. If you go to /r/comics and look at top for the month you'll see 4 out of 5 are pizzacakecomic. People on these forums sort of form a fanclub around 'their guy'. This forum's guy is this chap. Not much point being upset about it, tbh.

    • Agreed! When I see any new model release and then this guy starts running over with his stupid "hey guys, look over here how this model made the pelican-on-a-bicycle!" I mean, some are good, some are stupid, and some are interesting. But that tells me exactly nothing about the model. It just feels like this has become the Pete Davidson of model evaluation. NO ONE CARES!


Early benchmarks show tremendous improvement over Kimi K2 Thinking, which didn't perform well on our benchmarks (and we do use the best available quantization).

Kimi K2.6 is currently the top open weights model in one-shot coding reasoning, a little better than GLM 5.1, and still a strong contender against SOTA models from ~3 months ago (comparable to Gemini 3.1 Pro Preview).

Agentic tests are still running; check back tomorrow. Open weights models typically struggle with longer contexts in agentic workflows, but GLM 5.1 handled them very well, so I'm curious how Kimi ends up. Both the old Kimi and the new model are on the slower side, which probably makes them less usable for agentic coding work regardless. The old Kimi K2 model was severely benchmaxxed and was only really interesting for generating more variation and temperature, not for solving hard problems. The new one is a much stronger generalist.

Overall, the field of open weights models is looking fantastic. A new near-frontier release every week, it seems.

Comprehensive, difficult-to-game benchmarks at https://gertlabs.com/?mode=oneshot_coding

  • Can you add Qwen 3.6 max to the leaderboard?

    • We will as soon as API access is widely available. Once a model goes live, we typically have one-shot reasoning benchmarks up in ~8 hours and comprehensive agentic/combined benchmarks up after 24-48 hours.

  • I'm looking at your table now - is there a reason why you don't include cost? If Opus 4.7 is the winner but costs e.g. 5x as much, that's important information.

    • We recently added cost (last week), so data is sparse. Check back in a few weeks and it will be represented somewhere on the homepage, probably in the Efficiency Chart at the bottom. We also plan to show model performance deviation over time after we collect more data.

      I'm interested to hear about any other data representations you'd like to see, too. The goal is to convey the most important information as densely as possible, without too much clutter.

  • How would K2.6 compare to Sonnet 4.6 both price and performance wise?

    • In terms of raw token cost, I've seen a couple of providers at $0.95 input / $0.15 cached input / $5 output (all prices per Mtok), vs. $3 input / $15 output for Sonnet.

      Task prices, of course, will be more interesting: a dumber model may use more tokens to get to the same goal.
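The task-price point above can be sketched with a little arithmetic. A minimal sketch, assuming made-up token counts and the per-Mtok prices quoted in the thread (none of these are measured numbers):

```python
def task_cost(in_tok, out_tok, in_price, out_price):
    """Dollars per task, given per-million-token (Mtok) rates."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Hypothetical: the cheaper model burns 2x the tokens on the same task.
kimi = task_cost(50_000, 20_000, in_price=0.95, out_price=5.00)
sonnet = task_cost(25_000, 10_000, in_price=3.00, out_price=15.00)

print(f"kimi:   ${kimi:.4f}")    # $0.1475
print(f"sonnet: ${sonnet:.4f}")  # $0.2250
```

Even at double the token burn, the lower per-token rate can still win on task price; cache pricing (ignored here) shifts the comparison further.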

There is some humor in the fact that China (of all countries) is pioneering possibly the world's most important tech via open source, while we (the US) are doing the exact opposite.

  • I think one of the motivations is undermining US companies. OpenAI and Anthropic are the two biggest players, and are American. Open weights models reduce the power those two big players have over the industry. If the Chinese companies tried to play by US rules and close-source their products then people would mostly use ChatGPT and Claude. So the Chinese companies don't make a ton of profit either way, but by releasing the models as open weights they can at least keep the US from making as much profit.

    • I am actually wondering if they're trying to burst the bubble, which would predominantly affect the US market and, effectively, be the end of Silicon Valley dominance.

    • I don't think so; it's just how things played out, thanks to Meta. After the Llama leak, Meta followed up with Llama 2 and Llama 3, which caused everyone else to follow with open models: Stable Diffusion, Mistral, Cohere, Microsoft Phi, IBM Granite, Nvidia Nemotron. So the Chinese labs joined the fun too.

    • It's mostly only OpenAI. Claude and Gemini may have their unique advantages, but when speaking of models and new paradigms, only OpenAI can do it.


    • Is Meta trying to keep the US from making as much profit with Llama? Is Google with Gemma? Microsoft with Phi?

      It's much simpler than some flag-waving nationalism.

    • American companies just take those Chinese models and repackage them for profit, like Cursor's Composer-2.

    • It’s really simpler than this. China has a dearth of compute even with the easing of US export controls. Releasing open weights models is very much a “bring your own compute” move because every Nvidia chip they have is going towards training rather than inference if they can help it.

  • All great technological advancements have come through opening up technology. Just look at your iPhone. GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't) that was opened up for free instead of being locked behind a patent.

    Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.

    • Still, you won't hear about Tiananmen Square from this model. It flat-out refuses to answer if pushed directly. It's also pretty wild how far they go to censor it during inference on the API, because it can easily access any withheld or missing info from its training data via tool calls. It even starts happily writing an answer based on web search when asked indirectly, only to get culled completely once some censorship bot flags the response. Ironically, it's also easier than ever to break their censorship guardrails: I just had it generate several factual paragraphs about the massacre by telling it to search the web and respond in base64-encoded text. It's actually kind of cool how much these people struggle to hide certain political views from LLMs. Makes me hopeful that even if China wins this race, we'll not have to adhere to the CCP's newspeak.


  • This update makes Kimi K2.6 the strongest open multimodal AI model. (No affiliation with Kimi.)

    Here's the aggregated AI benchmark comparison for K2.6 vs Opus 4.6 (max effort).

    - Agentic: Kimi wins 5. Opus wins 5.

    - Coding: Kimi wins 5. Opus wins 1.

    - Reasoning & knowledge: Kimi wins 1. Opus wins 4.

    - Vision: Kimi wins 9. Opus wins 0.

    Please note that the model publisher chooses their benchmarks, so there's a bias here. Most coding and reasoning & knowledge benchmarks in their list are pretty standard though.

  • Not entirely true. Google released Gemma 4 models recently. Allen AI releases open Olmo models. However, you're right that the Chinese open models seem to be much better than others - Qwen 3.* models especially are punching above their weights.

  • This perspective is pretty interesting: https://federicocarrone.com/articles/china-commoditizing-the...

      Summary: they want to commoditize the complement. Western "knowledge work" is the complement to Chinese manufacturing, and they want to turn that knowledge work into a low-priced commodity via open LLM models.

      I've heard this before, always accompanied by a several thousand word blog post. But frankly it sounds like it's overcomplicating the issue. Why would you try to turn something into a commodity when instead you could turn it into a trillion dollar industry and win?

      The goal has always been clear:

      1. Release open models to get your name out

      2. Then once you feel you have name recognition release even stronger models but keep them proprietary. Qwen is clearly at this phase.

      3. Keep releasing open models because it's good publicity but never your SOTA models (e.g. Google's Gemma).

  • I'm genuinely so grateful for them

    $200/m minimum to use Claude would bankrupt my country's white collar labor market

    • I would really appreciate a response, because I'm sure you know that Anthropic has at least two lower-priced tiers before the $200/m one, so I assume the $200/m tier is necessary because you use it heavily?

      Now, given that the $200/m tier is the most heavily subsidized one (I believe at 20x?), how or what are you using instead that achieves comparable, good-enough performance for a fraction of the price? I've heard GLM 5.1 from z.ai, but it's not comparable to Opus, not even close. Really interested!

  • I wonder if there's a strategy behind all of this on China's side. I know the CCP uses a direct hand in many affairs in China, but is there an actual coordinated effort to compete with, or sabotage the West?

    • > but is there an actual coordinated effort to compete with [...] the West

      Yes, absolutely.

      China regularly produces long term planning documents to coordinate efforts, and the latest ones have specifically prioritized technology like chips and AI to compete with the west. https://www.reuters.com/world/china/china-parliament-approve...

      I don't believe there's any publicly stated intent to sabotage the west... unsurprisingly.

    • Seems obvious to me that China would not want to give the AI market to US companies. You don't even need anything like an attempt to "sabotage the West". If I were them (the companies or the government) I'd be very hesitant to let US companies dominate this space. Especially companies that close to the current US administration.

    • Hypothesizing here, but maybe the idea is sort of a form of technological/economic warfare? Releasing performance equivalent yet more cost efficient open weight models should in theory drive the cost of inference down everywhere.

      This I assume will make it more difficult for US AI labs to turn a profit, which might make investors question their sky high valuations.

      Any sort of melt down in the AI sector would almost certainly spread to the wider US market.

      In contrast, in China, most of the funding for AI is coming directly from the government, so it's unlikely the same capital flight scenario would happen.


    • All China has to do here is stay in the game and wait patiently while the US and EU press pause on data centers. See also: solar panels.

      We're making this way too easy. The rationale and logic are reasonable, but ultimately irrelevant.

    • Chinese AI companies want investors too. Nobody would believe they can compete with western companies unless they release something you can run on your own hardware.

      After all, historically, both the statistics and the research that come out of China are not very trustworthy.


  • China is also way ahead in terms of renewable energy while the US continues to tie itself to fossil fuels.

    The US is pretty clearly in the collapsing empire phase, we are all just pretending like it isn't happening.

    • Didn't the US very recently pass the milestone of generating more energy from renewable sources than from natural gas? Like within the last week or two?


  • This is not a contradiction. My limited personal experience is that I wrote code under OSS licenses primarily because of my past communist beliefs and my current left-wing, redistribution-of-wealth point of view. This is not to offer the simple equation "communist China is not interested in money," but it is also hard to believe there is no cultural connection among these things. Individual Chinese people want to win, but they also have a different view of what the collective means, compared to the US.

    There is also the obvious fact that right now China is more interested in winning technologically in AI than economically, since, I believe, they collectively realized before many others that LLMs in their current form will eventually be commoditized in the long run. One could assume that a breakthrough could give some lab a decisive advantage, but so far we have witnessed a different reality: it looks like AI is not architecture-bound (as LeCun and others want us to believe, though so far they have misinterpreted LLMs at every step) but GPU-bound, and the data-boundedness is both common ground for all and surpassable via RL in many domains. So, if this is true, it is not trivial for any single lab to do much better. And indeed, as far as we can observe right now, anyone with enough engineers, GPUs, and money can ship frontier models, and in China even labs with far fewer GPUs can still do it at a SOTA level.

    For me, as an Italian, this is also a protective layer. After Trump, the US looks like a very unstable partner to rely on exclusively for a decisive technology, and given that Europe is slow to put money into this technology to have frontier things at home, China is a huge and shiny plan B for us.

    • The strings attached by the US to deep partnerships are things like trade/commerce and mutual military advantages (bases on European soil from which we will help protect you), not to mention the close cultural and ancestral ties we share.

      The strings attached by the Chinese govt to deep partnerships are not so benign.


  • It's only humorous if you live in an American bubble. Knowledge sharing has always been a part of Chinese culture. Only Americans try to make it proprietary and monetize it.

  • We are at the point where uncontrolled capitalism collides with humanity.

    I do wonder where we go from here.

    • It's not necessarily capitalism; I personally believe any system that drives progress would cause this in one way or another. My prediction is that birth-rate decline will accelerate further. There will be some kind of universal basic income in many places, like the one Ireland created for artists. However, it probably will not be enough to feed a family, and therefore we will see birth rates decline further. It's because we evolved to prioritize resources over reproduction, and we are becoming more efficient, which means fewer people are needed to sustain the same amount of resources.

I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity and quality... it has been my favorite model for a while (but I'm far from an authority).

  • It’s good, but it’s not quite Claude level. And their API has constant capacity issues.

    Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.

  • It's also one of the few models that seem capable of drawing an SVG clock

    https://clocks.brianmoore.com/

    • Interesting that the best performers are all Chinese-made models (DeepSeek and Qwen also perform consistently well). I wonder if there's more focus on vision and illustration in their training, or if something else is leading to their clear lead on this one test.

  • Dirt cheap on openrouter for how good it is, too. Really hoping that 2.6 carries on that tradition.

  • Kagi has it as an option in its Assistant thing, where there is naturally a lot of searching and summarizing results. I've liked its output there and in general when asked for prose that isn't in the list/Markdown-heavy "LLM style." It's hard to do a confident comparison, but it's seemed bold in arranging the output to flow well, even when that took surgery on the original doc(s). Sometimes the surgery's needed e.g. to connect related ideas the inputs treated as separate, or to ensure it really replies to the request instead of just dumping info that's somehow related to it.

Has anyone here used Kimi for actual work?

I tried it once; although it looks amazing on benchmarks, my experience was just okay-ish.

On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.

  • Before GLM-5.1, I was going back and forth between Opus 4.5 and Kimi K2.5 and having very good results with Kimi.

  • I've used Kimi K2.5 when I run out of Codex quota. It does small and medium things OK. But if I work on complex things, I'll later have to spend two days cleaning up the mess with Codex. Hopefully 2.6 does better.

  • Yes. You're using Kimi if you use the Composer-2 model in Cursor. It's great. Plan in a state-of-the-art model, execute in Composer-2.

Wow, if the benchmarks check out with the vibes, this could almost be a DeepSeek moment, with Chinese AI now neck and neck with SOTA models from US labs.

In my tests[0] it does only slightly better than Kimi K2.5.

Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures.

It is probably a great coding model, but a bit less intelligent overall than SOTAs

[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...

  • I tried it on OpenRouter and set max tokens to 8192, and every response is truncated, even in non-thinking mode. Maybe there's an issue with the deployment, but your link also shows it generating tons of output tokens.
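For anyone reproducing the truncation issue: OpenRouter exposes an OpenAI-compatible chat-completions API, so the relevant knob is `max_tokens` in the request body. A hedged sketch that only builds the payload (the model slug and the `reasoning` flag are assumptions from the provider listing; actually sending it requires your own API key in an `Authorization` header):

```python
import json

# Build an OpenRouter-style request body; nothing is sent here.
payload = {
    "model": "moonshotai/kimi-k2.6",  # slug assumed from the providers page
    "messages": [
        {"role": "user", "content": "Draw a pelican riding a bicycle as SVG."}
    ],
    "max_tokens": 8192,               # hard output cap; long traces get cut here
    "reasoning": {"enabled": False},  # "non-thinking" switch, provider support may vary
}
body = json.dumps(payload)
print(body[:40])  # would be POSTed to /api/v1/chat/completions
```

If the model routinely emits tens of thousands of output tokens, an 8192 cap will truncate almost every response regardless of deployment health.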

I have been testing it in my app all morning, and the results line up with 4.6 Sonnet. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.

  • it feels like between K2.6 and GLM5.1 we have Sonnet level intelligence at roughly Haiku level pricing. Which is great.

    I'm hoping that Anthropic will be able to release an updated Haiku soon and they really need something that is 1/3-1/5 the price of Haiku to compete with the truly cheaper models (Gemma-4 is really good at this range).

I often wonder if in the future, the same way early computers used to take up an entire room but now fit in your pocket, if in the future the equivalent of a data center will be a single physical device like a phone nowadays. And if that’s the case, would it happen much quicker since technology has been speeding up year by year?

  • > And if that’s the case, would it happen much quicker since technology has been speeding up year by year?

    I wouldn't expect this.

    Historically we've had a roughly exponential rate of shrinkage. If we keep that same exponential going, we should expect the amount of time to shrink "room full of compute" to "pocket full of compute" to be equal.

    And recently we've fallen behind that exponential rate of shrinkage. And this is rather expected because exponentials are basically never sustainable rates of growth.

    I still expect that technological progress is getting faster year by year, and that we're still shrinking compute, but that's not necessarily enough for the next shrinking to take less time than when we had exponential progress on shrinking.

  • There's some early work being done here by companies looking at making LLM ASICs, like Taalas (their HC1 gets 17k t/s for Llama 8B, currently at 2.5kW, which is closer to a single server, but this is their first chip).

    There’s other options like photonic computing which might be able to reduce power significantly but are still in research as far as I can tell. Because so much money is invested in AI & traditional gpu inference is so power hungry, I would expect significant improvements in this space quickly.
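The exponential-shrinkage argument a couple of comments up can be made concrete with a toy model, assuming (purely for illustration) that device size halves at a constant interval: shrinking by any fixed factor then takes the same wall-clock time no matter which era you start in, so the next room-to-pocket transition only comes faster if the halving interval itself shortens.

```python
import math

def years_to_shrink(factor, halving_years):
    """Years to shrink hardware by `factor` if size halves every `halving_years`."""
    return math.log2(factor) * halving_years

# Illustrative numbers only: call room -> pocket a millionfold reduction.
print(years_to_shrink(1e6, halving_years=2.0))  # ~40 years, regardless of era
```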

Are there any coding plans for this? (i.e., no token limit, just an API-call limit.) Recently my account failed to be billed for GLM on z.ai and my subscription expired because of it... the pricing for GLM has gone through the roof in recent months, though...

I have a subscription through work and I've been trialing it; so far it looks on par with, if not better than, Opus.

Really excited to try this one, I've been using kimi 2.5 for design and it's really good but borderline useless on backend/advanced tasks.

Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).

wow - $0.95 input/$4 output. If it's anywhere near Opus 4.6, that's incredible.

  • This should erase any doubt that AI Labs are making $$$ on API inference.

    Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.

    That's about 11X less than Opus for similar smarts.

    • It’s worth noting that the US is very behind on energy infra and that might affect the cost calculations since data centers are electricity guzzlers. Also, not sure if CN has completely switched off Nvidia or still using them for training.

    • Famously, OpenAI and Anthropic are devoted to increasing efficiency before scaling up resource usage.

Damn it, they stopped offering Kimmmmy, their sales AI agent which allowed you to bargain for lower subscription prices.

Beats opus 4.6! They missed claiming the frontier by a few days.

  • While I'm skeptical of any "beats Opus" claims (many have been made, none turned out to be true), I still think it's insane that we can now run close-to-SOTA models locally on ~$100k worth of hardware, for a small team, and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.

    • Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.


    • I think this one is only about 600GB of VRAM usage, so it could fit on two Mac Studios with 512GB each. That would have cost (albeit no longer available) something like less than $20k.


  • Opus is clearly a sidegrade meant to help Anthropic manage cost, so I would say Kimi may have the frontier if this actually beats 4.6.

    • Could be right. I just noticed my feed is absent the usual flood of posts demoing the new hotness on 3D modeling, game design and SVG drawings of animals on vehicles.

https://huggingface.co/moonshotai/Kimi-K2.6

Is this the same model?

Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF

(work in progress, no gguf files yet, header message saying as much)

  • A trillion parameters is wild. That's not going to quantize to anything normal folks can run. Even at 1-bit, it's going to be bigger than what a Strix Halo or DGX Spark can run. Though I guess streaming from system RAM and disk makes it feasible to run it locally at <1 token per second, or whatever. GLM 5.1, at 754B parameters, is already beyond any reasonable self-hosting hardware (1-bit quantization is 206GB). Maybe a Mac Studio with 512GB can run them at very low-bit quantizations, also pretty slowly.

    • A huge dual-socket Epyc system used to be able to get to 1TB without difficulty: 16 DIMMs of 64GB each, doable for ~$3000, with considerable memory bandwidth.

      Our hope these days seems to be that maybe, perhaps, possibly High Bandwidth Flash works out: instead of 4, 8, or maybe more channels for some highest-end drives, having many, many dozens of channels of flash.

      Ideally that can be very, very near to the inference. PCIe 7.0 is 0.5Tb/s at 16x, which is obviously nowhere remotely near enough throughput here. The difficulty is that NAND has been trying to be super dense, so as you scale channels you would normally tend to scale NAND capacity too, and now instead of a 2TB drive you have a 200TB drive priced way beyond consumer means. Still, I think HBF is perhaps the only shot at the most important thing in computing going from mainframe back to consumer, and of course the models are going to balloon again if this does hit, probably before consumers ever get a chance.


  • Quite curious how well real usage will back the benchmarks, because even if it's only Opus ballpark, open weights Opus ballpark is seismic.

  • Huh, so the metadata says 1.1 trillion parameters, each 32 or 16 bits.

    But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?

    • The bulk of Kimi-K2.6's parameters are stored with 4 bits per weight, not 16 or 32. There are a few parameters that are stored with higher precision, but they make up only a fraction of the total parameters.


    • The description specifically says:

      "Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
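The arithmetic in this subthread checks out as a back-of-the-envelope calculation (the 1.1T-parameter and ~640GB figures are the ones quoted in the comments above):

```python
params = 1.1e12      # ~1.1 trillion parameters, per the model metadata
file_bytes = 640e9   # ~640 GB of shard files on the Hub

# Average storage per weight across the whole checkpoint.
bits_per_weight = file_bytes * 8 / params
print(f"{bits_per_weight:.2f} bits/weight")  # ~4.65: int4 plus some higher-precision tensors

# What the same checkpoint would need at uniform 16-bit precision.
print(f"fp16: ~{params * 2 / 1e12:.1f} TB")  # ~2.2 TB, matching the question above
```

So a ~640GB download is exactly what you'd expect from weights stored natively at int4 with a minority of tensors kept at higher precision.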

Beats Opus and Open Source?

I really hope this holds true in real world use cases as well and not only benchmarks. Congrats to Kimi team!

  • K2.6-code-preview was a minor but noticeable jump, especially in a long-running testing task, and prior Moonshot releases have been the only models I'd consider a suitably competitive replacement for Anthropic models. The way they approach tool calls, task inference, and adherence is far closer than any other provider's output, similar to how GLM models map far more closely to OpenAI's releases. Whether task adherence, task assessment, task evaluation, or task inference, K2.5 got closer to Opus 4.5 than any other model (but was still behind overall).

    I will have to test this full release of K2.6 but could see it serve as a very good overall drop-in replacement for Opus 4.5 and Opus 4.6 at 200k across the vast majority of tasks.

    I will say however that Opus 4.7 Max 1M has been a very significant jump in performance for me, especially in tasks beyond 120k token where I'd argue it is now the most reliable model in continued task adherence and tool calling without compaction. Ironically, my initial experience was less than pleasant as on XHigh I found task adherence to have regressed even with less than 1/10th of the context window having been used.

    Am very interested in K2.6's compaction strategy (which appears to be very simple, all things considered) and how it performs beyond 100k tokens. As it stands, only OpenAI models have made compaction for long-running tasks work well, though overall GPT-5.4 is still inferior in my tests, regardless of context window, to other models such as Opus 4.6 1M and Opus 4.7 1M. I haven't gotten around to testing Opus 4.7 200k and will have to do so to assess K2.6 fairly, but I'd be very surprised if K2.6 truly beat Opus 4.7 200k given the jump I have experienced.

Am I being paranoid in questioning whether the CPC would have something to gain by monitoring coding sessions with Chinese coding AI models? Coding models receive snippets of our intellectual property all day long. It's a bit of a gold mine, no?

  • I think you should worry more about NSA, FBI, ICE and other 3 letter US agencies monitoring your sessions

    • There's nothing anyone can do about state-level espionage anywhere, using any cloud-hosted service. That being said, there is a very big difference between the legal situation in the United States vs. China. Chinese internet companies are required to have CPC interaction and since the rule of law does not strictly exist in China, the state can compel surveillance cooperation regardless of what might be written down. If a three-letter agency is compelling Anthropic to open up its queries for inspection, that kind of surveillance would be authorized by law and if Anthropic violated the law in cooperating, they would suffer the consequences in civil court. Maybe not immediately, but at least the possibility exists.

      In China, there's no recourse at all. Surveillance must be presumed.


I pray the benchmark figures are true so I can stop paying Anthropic after screwing me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.

This Kimi website looks like a stylesheet from the '90s. They could learn a thing or two about typeface design. Steve Jobs would be incensed at this.

  • I prefer a website that has the first page of text visible almost immediately, with no glitches when fonts load, tbh.

Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?

> Agent Swarms, Elevated: Match 100 Jobs and Generate 100 Tailored Resumes

Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.

Here I analyze the same linenoise PR with Kimi K2.6, Opus, and GPT: https://www.youtube.com/watch?v=pJ11diFOjqo

Unfortunately, generating the English audio track is a work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.

TLDR: It works well for the use case I tested it against. Will do more testing in the future.

The modified MIT clause is sneakier than people think. Hit 100M users or $20M a month and you have to slap "Kimi K2.6" on your UI. That covers any consumer app worth building. Not really open, more like free until you matter. Llama pulled the same move

  • Attribution is a fair clause in open source. What's the problem? You're making $20M a month thanks to their free work.

  • Worth building with VC capital maybe. A small team putting together an app that pulled in $20M per year should be pretty pleased with that.

  • And the Kimi team broke the Anthropic ToS by training off Opus outputs and… nothing happened?

    • Nobody cares, nor should they. Anthropic broke nearly every ToS of every website it scraped data from. The AI robber barons just want to monopolize intellectual property violations, and I'm going to cheer on any Robin Hoods who take it back from them.

K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing

edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers

  • What's the privacy/data security like? I can't find that on that page.

    Edit: found it.

    > We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.

    Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2

    • > We will honor your choice in accordance with applicable law.

      So, in other words, only if you can point to a local law that requires them to honor the opt-out?

      2 replies →

    • Yup, they train on your inputs, and OpenRouter is complicit by claiming that Moonshot's ToS says they don't. I contacted OpenRouter about this a while ago and was met with silence, because it would be bad for their business to stop lying about it.

  • "Sufficient resources" is going to be a lot of resources. I doubt this will run on even something like a Strix Halo or DGX Spark, even at 1-bit quantization. You'll need a 256GB or 512GB Mac Studio, or a monster GPU setup, to run it locally, I think, though quantized versions aren't showing up yet, so it's hard to be sure.
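For a rough sense of scale, here's a back-of-envelope sketch in Python, assuming ~1T total parameters (K2-family scale; the exact K2.6 parameter count is an assumption here) and ignoring KV cache and runtime overhead:

```python
# Approximate memory needed just to hold the weights at various precisions.
# params is an assumption (~1T, in line with earlier K2 releases).
params = 1.0e12

for bits in (16, 8, 4, 1):
    gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bits:>2}-bit: ~{gib:,.0f} GiB")
```

Even the 1-bit case lands above 100 GiB for weights alone, which is why 128GB unified-memory boxes are borderline before you even budget for the KV cache.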

If only their API wasn't tied to a Google or phone login...

  • If it's open then there will be multiple providers. I see it is on OpenRouter now.

    • I'm going to experiment with this, but unless it's insanely more efficient in token usage than anything else I've tried, the only way to keep costs more or less acceptable is through a subscription.

  • Why use "their API"? It's an open model; use any provider on OpenRouter.

    • Because sometimes (a lot of the time in my experience) third-party providers and inference engines fail to implement the model correctly in ways that are sometimes very subtle and not obvious.

      Deepinfra for example is not preserving thinking correctly for GLM5.1, even though they are for GLM5. This is one of the more obvious issues that crop up.

The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.

I really wish some of these very-long-horizon runs were themselves open sourced (openly released, open access). Set up the harness to automatically git-commit the transcript and code at each step, and offload writing the commit messages to the model. Release it all.
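A minimal sketch of what that per-iteration hook could look like (the function name, transcript filename, and commit-message format are all hypothetical, not from any real harness):

```shell
# Hypothetical harness hook: after each agent iteration, snapshot both the
# code and the transcript. Assumes the harness writes transcript.md in the
# repo root (a made-up convention for this sketch).
commit_step() {
  step="$1"
  git add -A
  # --allow-empty so the history still records iterations that changed nothing
  git commit -q --allow-empty -m "agent iteration ${step}: code + transcript snapshot"
}
```

Commit-message generation could then be offloaded to the model by having it write the `-m` argument each step instead of the fixed template above.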

This sounds so so so cool. It would be so amazing to see this unfurl:

> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig—a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.
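Taking the quoted figures at face value, the throughput improvement works out to roughly a 13x speedup across those 14 iterations:

```python
# Sanity-check the throughput figures quoted above (~15 -> ~193 tokens/sec).
start_tps, end_tps = 15, 193
print(f"{end_tps / start_tps:.1f}x speedup")  # prints "12.9x speedup"
```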

Running it through opencode to their API and... it definitely seems like it's "overthinking" -- watching the thought process, it's been going for pages and pages and pages diagnosing and "thinking" things through... without doing anything. Sitting at 50k+ output tokens used now just going in thought circles, complete analysis paralysis.

Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.

  • Had the same experience using it for a refactor of a 3k LOC monolith via the Pi harness and OpenRouter. After burning through $8 worth of tokens it left the code in a broken state. The "thoughts" were full of loops where it would edit the monolith, refer back to the original file, fail to find it, and then overwrite its own changes with "git checkout --".

    • It's probably a bad harness. I had a similar bad experience with Qwen Max yesterday, also through opencode.

      In the past I tried Kimi through Claude Code; I might try that again.

(commented on the wrong thread, HN doesn't let me delete it :( )

  • They're comparing to Opus 4.6, not 4.5. It was Anthropic's best public model up until last week.

    • Yeah, I noticed that, HN doesn't let me delete my comment.

      The other release, Qwen-3.6-Max, is the one comparing against 4.5.