Comment by MountDoom
13 hours ago
> Nothing new to see here.
Eh, I don't think that's a productive thing to say. There's immense business pressure to deploy LLMs in such decision-making contexts, from customer support, to HR, to content policing, to real policing. Further, because LLMs are improving quickly, there is a temptation to assume that the worst is behind us and that models don't make too many mistakes anymore.
This applies to HN folks too: every other person here is building something in this space. So publicizing failures like this is important, and it's important to keep doing it over and over again so that you can't just say "oh, that was a 3o problem, our current models don't do that".
I completely agree with you. GP’s cynical take is an upvote magnet but doesn’t contribute to the discourse.