Comment by mtrovo

7 hours ago

I think the main issue is treating the LLM as an unrestrained black box. There's a reason nobody outside tech trusts LLMs so blindly.

The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what goal you're optimizing for.

See Karpathy's work on the autoresearch agent and how it carries out experiments; it might be useful for what you're doing.
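
To make the point about evals concrete, here is a minimal sketch of an eval with an explicitly stated goal (exact-match accuracy on a fixed set of cases). Everything in it is an assumption for illustration: the cases, the `exact_match_score` helper, and `stub_model`, which stands in for a real LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases as (prompt, expected answer) pairs.
CASES: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]


def exact_match_score(model: Callable[[str], str],
                      cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the answer equals the expected one.

    The metric is deliberately narrow: case-insensitive exact match after
    trimming whitespace. That is the single goal this eval optimizes for.
    """
    hits = sum(
        1
        for prompt, expected in cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; gets one case right and one wrong."""
    return "4" if "2 + 2" in prompt else "Lyon"


score = exact_match_score(stub_model, CASES)
print(f"exact-match accuracy: {score:.2f}")
```

Swapping `stub_model` for a real model call keeps the rest unchanged, and a different goal (say, citation validity) would mean replacing the comparison inside `exact_match_score` rather than bolting on extra checks.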

riffraff 7 hours ago

> there's a reason nobody outside tech trusts LLMs so blindly.

Man, I wish this were true. I know a bunch of non-tech people who just trust random shit that ChatGPT made up.

I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)

We've had politicians sharing LLM crap, researchers publishing papers with hallucinated citations...

It's not just tech people.

withinboredom 5 hours ago

We were working on translations for Arabic, and the spec said to use "Arabic numerals" for numbers. Our PM said that "according to ChatGPT that means we need to use Arabic script numbers, not Arabic numerals".

It took a lot of back-and-forth with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.

We're cooked.
- ThrowawayR2 34 minutes ago
  
  Kind of a tangent but that did make me curious about how numbers are written in Arabic: https://en.wikipedia.org/wiki/Eastern_Arabic_numerals
- tstenner 3 hours ago
  
  The architect should have required Hindu numbers. Same result, but even more confusion.
- dvfjsdhgfv 3 hours ago
  
  Man this is maddening.
roncesvalles 4 hours ago

And the worst part is, these people don't even use the flagship thinking models, they use the default fast ones.

closewith 4 hours ago

In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.