Comment by TeMPOraL
17 hours ago
> Ever since Mitchell Hashimoto mentioned the harness in February
What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.
17 hours ago
> Ever since Mitchell Hashimoto mentioned the harness in February
What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.
Yes, the concept itself is not new. Around 2022, people would usually have called it the orchestration layer.
But I think the term started being used closer to its current meaning around this point:
https://www.softwareimprovementgroup.com/blog/what-is-harnes...
In a way, the sequence was something like:
prompt engineering(23~4) -> context engineering(25) ->harness engineering(26)
At first, it was mostly understood as a correction or extension of prompt engineering. But the idea of “harness” as the layer that corrects, constrains, and operationalizes agents seems to have emerged much more clearly around 2026.
So yes, there is definitely some terminological confusion in the early phase. That is normal. New technical fields often begin with several competing names for almost the same layer, and only later does one term become stable.
My 2c:
The word harness brings the truth of LLMs back down to Earth.
it really felt like between 2018 and 2022ish like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it, the words we used were full of optimism. When you lift the veil, it really is just regex, and cool tricks sure, but it's a harness it's a utility, there's no magic here, there's realism.
Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.
The reality is that the labs have produced magical models yes, but are locking them into ecosystems that leave a lot to be desired, and are easily reproducible, and essentially are cron jobs, regex.. things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.
Maybe I'm wrong but this has been my impression
There were no LLMs between 2018 and 2022, at least not in the sense resembling today. The whole LLM frenzy started in late 2022.
2 replies →
Harness itself was a widely used term by at least the "[LLM] plays pokemon" trend, which was a year ago[1]. That was basically the term of art to use when arguing about just how much special treatment LLMs should get.
"harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.
I will admit that my bias is against any self congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.
[1] https://news.ycombinator.com/item?id=46331242
I read your comment. I think we may be talking about slightly different contexts.
The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.
The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.
So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.
That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.
I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.