Comment by OsrsNeedsf2P

1 year ago

This test has always been so stupid since models work at the token level. Claude 3.5 already 5xs your frontend dev speed but people still say "hurr durr it can't count strawberry" as if that's a useful problem

7 comments

dannyw 1 year ago

The problem also comes down to LLMs being confidently wrong when they're wrong.

elicksaur 1 year ago

“Already 5xs”

Even AI marketing doesn’t claim this. Totally baseless claim given how many people report negative experiences trying to use AI.

ssijak 1 year ago

Some people report some negative experiences for any tool ever brought into existence.

bufferoverflow 1 year ago

This test isn't stupid. If it can't count the number of letters in a text, can you rely on it with more important calculations?

stnmtn 1 year ago

You can rely on it for anything that you can validate quickly. And it turns out there are a lot of problems whose solutions are trivial to validate but difficult to build.

101008 1 year ago

Coding is not one of those cases, or edge cases wouldn't exist.

TeMPOraL 1 year ago

Not on calculations that involve counting at a sub-token level. Otherwise, it depends.
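
To make the sub-token point concrete, here is a minimal sketch in Python. The token split and the token IDs below are made up for illustration and are not the output of any real tokenizer; the only point is that a letter count is trivial on characters, while the token-level view exposes no letters at all.

```python
# Illustration only: the split and the IDs are assumptions, not real tokenizer output.
WORD = "strawberry"
HYPOTHETICAL_TOKENS = ["str", "aw", "berry"]               # assumed split
HYPOTHETICAL_VOCAB = {"str": 101, "aw": 102, "berry": 103}  # made-up IDs


def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a letter when the characters are directly visible."""
    return text.count(letter)


def to_token_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map token strings to opaque integer IDs, roughly what a model consumes."""
    return [vocab[t] for t in tokens]


# The assumed split reassembles into the original word.
assert "".join(HYPOTHETICAL_TOKENS) == WORD

char_count = count_letter(WORD, "r")
token_ids = to_token_ids(HYPOTHETICAL_TOKENS, HYPOTHETICAL_VOCAB)

print(char_count)  # 3
print(token_ids)   # [101, 102, 103]
```

Nothing in `[101, 102, 103]` says how many "r"s are inside each piece, so under this assumed split the count has to be recalled from training rather than read off the input.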