Comment by 0cf8612b2e1e
1 day ago
I am not convinced they are executing code. If they were, I wouldn't expect LLMs to so frequently guess at the results of math questions.
Of course, you could train for it. Some quick scripting to find all words with repeated letters, build up sample sentences ("aardvark" has three a's), and you have hard-coded the answers to the simple questions that make your LLM look stupid.
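For illustration, here's a sketch of the kind of quick script I mean (the word list is made up; a real pass might read something like /usr/share/dict/words):

```python
from collections import Counter

def repeat_letters(words):
    """Yield (word, letter, count) for each letter repeated in a word."""
    for word in words:
        for letter, count in Counter(word.lower()).items():
            if letter.isalpha() and count > 1:
                yield word, letter, count

# Hypothetical sample list standing in for a full dictionary file
sample = ["aardvark", "banana", "strawberry", "cat"]
for word, letter, count in repeat_letters(sample):
    print(f"{word} has {count} '{letter}'")  # e.g. aardvark has 3 'a'
```

From there it's a template step to turn each hit into a training sentence like "How many a's are in aardvark?" paired with the counted answer.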
I have personally observed Grok running Python code in a chat to determine the current date so it could accurately tell me whether the 20th was a Friday (it wasn't, in that specific month).
…and it did that for a story prompt that wasn't set in a) our world or b) the current time =)
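I can't see the exact code it ran, but the check it effectively performed boils down to something like this (a minimal sketch using Python's standard datetime):

```python
import datetime

# Is the 20th of the current month a Friday?
today = datetime.date.today()
the_20th = today.replace(day=20)
print(the_20th.strftime("%A"))       # weekday name of the 20th
print(the_20th.weekday() == 4)       # weekday() == 4 means Friday
```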