Comment by theshrike79

10 hours ago

The easiest way to fix these is give the model an environment to run code.

Any model can easily one-shot a python script that can count the occurrence of any letter anywhere and return the result.
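The kind of one-shot script being described could be as simple as this sketch (function name and examples are my own illustration):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of `letter` in `word`,
    sidestepping tokenisation entirely by working on raw characters."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # → 3
print(count_letter("aardvark", "a"))    # → 3
```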

It's just a tooling issue. You can't really "train" an LLM to do it, because of tokenisation and ... stuff.

I am not convinced they are executing code. If they were, I wouldn't expect LLMs to guess at the results of math questions as often as they do.

Of course you could train it. Some quick scripting to find all words with repeated letters, build up sample sentences ("aardvark" has three a's), and you've hard-coded the answers to the simple questions that make your LLM look stupid.
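A rough sketch of that data-generation idea (the function name and sentence template are my own illustration, not from the comment):

```python
from collections import Counter

def build_samples(words: list[str]) -> list[str]:
    """For each word, emit one training sentence per letter
    that appears more than once in it."""
    samples = []
    for word in words:
        for letter, n in Counter(word.lower()).items():
            if n > 1:  # only letters that repeat
                samples.append(
                    f'The word "{word}" contains the letter "{letter}" {n} times.'
                )
    return samples

for line in build_samples(["aardvark", "strawberry"]):
    print(line)
# "aardvark" yields sentences for a (3) and r (2);
# "strawberry" yields one for r (3).
```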

  • I have personally observed Grok running Python code in a chat to determine the current date so it could accurately tell me whether the 20th is a Friday (it wasn't in that specific month)

    ... it did that in a story prompt that wasn't set in a) our world or b) the current time =)