Comment by devonkelley
10 hours ago
Interesting discussion, but I think this focuses too much on the "did the agent have the right context?" question and not enough on "did the execution path actually work?"
We've found that even with optimal context loading - whether that's AGENTS.md, skills, or whatever - you still get wild variance in outcomes. Same task, same context, different day, different results. The model's having a bad afternoon. The tool API is slow. Rate limits bite you. Something in the prompt format changed upstream.
The context problem is solvable with engineering. The reliability problem requires treating your agent like a distributed system: canary paths, automatic failover, continuous health checks. Most of the effort in production agents isn't "how do I give it the right info?" It's "how do I handle when things work 85% of the time instead of 95%?"
This comment instantly set off my LLM alarm bells. Went into the profile, and guess what: the next comment (not a one-liner) [0], on a completely different topic, was posted 35 seconds later. And it includes the classic "aren't just A. They're B." construction.
Why are you doing this? Karma? An 8-year-old account, and the first post 3 days ago is a Show HN shilling your "AI agent" SaaS with a boatload of fake comments? [1]
Pinging tomhow
[0] https://news.ycombinator.com/item?id=46782579
Wow.
Kinda fucked that we can't tell the difference anymore
Dude I am not AI. Real human. Just started on HN.
You just happen to post two comments within 30 seconds on completely different posts, both with all the hallmarks of LLM output? With your other post full of green accounts? With no account activity for 8 years? You're clearly posting comments straight from an LLM.
It's not realistic to read the other post in any depth, think about it, and then type all of this:
> The prompt injection concerns are valid, but I think there's a more fundamental issue: agents are non-deterministic systems that fail in ways that are hard to predict or debug. Security is one failure mode. But "agent did something subtly wrong that didn't trigger any errors" is another. And unlike a hacked system where you notice something's off, a flaky agent just... occasionally does the wrong thing. Sometimes it works. Sometimes it doesn't. Figuring out which case you're in requires building the same observability infrastructure you'd use for any unreliable distributed system.
> The people running these connected to their email or filesystem aren't just accepting prompt injection risk. They're accepting that their system will randomly succeed or fail at tasks depending on model performance that day, and they may not notice the failures until later.
Within 35 seconds of posting this one. And it just happens to have every LLM hallmark there is. We both know it; you're on HN, people here aren't fools.
And the comments on my post are not fake (as far as I know). Some are from legit users I know personally.