Comment by logicallee
2 years ago
I took a quick glance through the article. It states:
"LLMs can’t count or easily do other mathematical operations due to tokenization, and because tokens correspond to a varying length of characters, the model can’t use the amount of generated tokens it has done so far as a consistent hint."
It then proceeds to use this very task — one the article itself says current LLMs can't do — to test whether the model responds to tipping.
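To make the quoted claim concrete, here is a toy sketch (a made-up vocabulary, not any real tokenizer) of why token counts are a poor proxy for character or word counts: under greedy longest-match tokenization, a longer string can produce fewer tokens than a shorter one.

```python
# Toy greedy longest-match tokenizer. The vocabulary below is invented
# for illustration only; real LLM tokenizers (BPE etc.) differ, but the
# variable-length-token effect is the same.
VOCAB = {"straw", "berry", "raw", "e", "r"}

def tokenize(text, vocab):
    """Split `text` into vocabulary tokens, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"untokenizable character: {text[i]!r}")
    return tokens

print(tokenize("strawberry", VOCAB))  # ['straw', 'berry'] -> 2 tokens, 10 chars
print(tokenize("rawer", VOCAB))       # ['raw', 'e', 'r']  -> 3 tokens, 5 chars
```

Because the 10-character string comes out as 2 tokens while the 5-character string comes out as 3, the model's "tokens generated so far" gives no consistent signal about how many characters or words it has written.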
I think that is frankly unfair. It would be like picking something a human can't do, and then using that as the standard to judge whether humans do better when offered a tip.
I definitely think the proper way to test whether tipping improves performance is through some metric that is definitely within the capabilities of LLMs.
Pick something they can do.