
Comment by schoen

2 months ago

These are always amazing when juxtaposed with apparently impressive LLM reasoning, knowledge, and creativity. You can trivially get them to make the most basic mistakes about words and numbers, and double down on those mistakes, repeatedly explaining that they're totally correct.

Have any systems tried prompting LLMs with a warning like "You don't intuitively or automatically know many facts about words, spelling, or the structure or context of text, when considered as text; for example, you don't intuitively or automatically know how words or other texts are spelled, how many letters they contain, or what the result of applying some code, mechanical transformation, or substitution to a word or text is. Your natural guesses about these subjects are likely to be wrong as a result of how your training doesn't necessarily let you infer correct answers about them. If the content or structure of a word or text, or the result of using a transformation, code, or the like on a text, is a subject of conversation, or you are going to make a claim about it, always use a tool to confirm your intuitions."?

This is a great idea. Like, if someone asked me to count the number of B's in your paragraph, I'd yeet it through `grep -o 'B' file.txt | wc -l` or similar; why would I sit there counting it by hand?

As a human, if you give me a number on screen like 100000000, I can't be totally sure if that's 100 Million or 1 Billion without getting close and counting carefully. I really ought to have my glasses on. The mouse pointer helps some as an ersatz thousands-separator, but still.
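
That ambiguity is exactly the kind of thing a quick one-liner resolves. A minimal Python sketch (the number is just the one above, hard-coded for illustration):

```python
# Group the digits so a human can actually read them.
n = 100000000
print(f"{n:,}")    # -> 100,000,000 (clearly 100 million)
print(f"{n:.2e}")  # -> 1.00e+08 (order of magnitude at a glance)
```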

Since we're giving them tools, especially for math, it makes way more sense to give them access to some of the finest tools we have. Build an MCP into Mathematica or MATLAB, let the LLM write out the math, and have classical solvers actually do the computation. Let the LLM write little bits of bash or Python as its primary approach to these kinds of analytical questions.

It's like giving a kid a calculator...
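
As a rough sketch of the kind of hand-off I mean, here's Python's sympy standing in for a heavier solver like Mathematica (the quadratic is just an arbitrary example):

```python
# The LLM writes the symbolic setup; a classical solver does the actual math.
import sympy as sp

x = sp.symbols("x")
equation = sp.Eq(x**2 - 5*x + 6, 0)
print(sp.solve(equation, x))  # [2, 3], computed exactly rather than guessed token by token
```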

  • If you have to build an MCP for every system, you aren’t building intelligence in the first place.

    • Why does it matter? I don't care whether it's intelligent; I just need it to be useful. In order to be useful it needs to start fucking up less, stat. In its current form it's borderline useless.

    • I think a piece of software that can correctly decide what oracle to consult to get answers to questions you give it can be called intelligent, even if it itself doesn’t know any facts.

    • Fair criticism, but this would arguably be preferable. For many use cases it would be strictly better: you've built some sort of automated drone that can do lots of work, but without preferences and personality.

    • You don't need specialised MCPs for this. In the past you could add "use python" to the ChatGPT prompt and it would do the right thing. This is exactly the intelligent "use the right tool for the right thing" idea. ChatGPT just wasn't trained to apply it in the right circumstances automatically.

    • We have fingers and a keyboard/mouse because that's the best interface we've come up with. If we could output binary directly to a bunch of program interfaces (perhaps via some Neuralink-type thing), we would surely take that option.

  • > As a human, if you give me a number on screen like 100000000, I can't be totally sure if that's 100 Million or 1 Billion without getting close and counting carefully.

    I become mildly infuriated when computers show metrics (or any large number) without thousands separators.

    Worse still, I often see systems that mix units, don’t right-align, and occasionally blend a few decimal numbers in with whole numbers! Then they update everything every second to make things extra spicy.

    • Right. So if we have these kinds of cognitive blind spots and just accept them as the status quo, making up for them with tooling when necessary... it's not unreasonable to assume that AI will have to do the same.

You can’t just prompt your way out of a systemic flaw

  • You don't need to, as long as you only use LLMs like these in cases where incorrect output isn't of any consequence. If you're using LLMs to generate some placeholder bullshit to fill out a proof-of-concept website, you don't care if it claims strawberries have tails; you just need it to generate some vaguely coherent crap.

    For things where factuality is even just a little important, you need to treat these things like a toddler that got its hands on a thesaurus and an encyclopaedia (one that's a few years out of date): go through everything it produces and fact-check any statement it makes that you're not already confident about.

    Unfortunately, people seem to be mistaking LLMs for search engines more and more (no doubt thanks to LLM companies' attempts to make people think exactly that), so this will only get worse in the future. For now we can still catch these models out with simple examples, but as AI fuckups grow sparser, more people will think these things tell the actual truth.

> prompting LLMs with a warning like "You don't intuitively or automatically know many facts about...

We are not interested specifically in the inability to «know» about text: we are interested, in general, in the ability to process ideas consciously and procedurally, and the inability to count suggests that general critical fault.

I often tell an LLM to ask questions if required, and that it is a skilled developer who is working alongside me. That seems to help it be more collaborative rather than prescriptive.

I added something like that to my Claude project prompt and it can now magically solve fairly complex letter counting problems using the dashes method. For anything longer than a couple sentences, it's probably best to just have them write a REPL script.
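
For reference, the kind of throwaway script I mean is nothing fancy; a minimal sketch (the sentence and the letter are just placeholders):

```python
# Count a letter exactly instead of eyeballing it.
text = "How many r's are there in strawberry?"
letter = "r"
count = text.lower().count(letter)
print(f"{letter!r} appears {count} times")
```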