Comment by refulgentis

6 days ago

I guess I'd say "mu": from a dev perspective, you shouldn't care about tokens, ever. If your inference framework isn't abstracting them away for you, your first task would be to patch it so it does.
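
Concretely, "abstracting" here means something like the sketch below: a text-in/text-out layer above which token IDs simply don't exist. `Engine` is a hypothetical interface, not a real llama.cpp binding; this is a minimal sketch assuming your framework exposes tokenize/detokenize/generate in some form.

```python
from typing import Iterator, Protocol


class Engine(Protocol):
    """Hypothetical low-level interface that still speaks in token IDs."""

    def tokenize(self, text: str) -> list[int]: ...
    def detokenize(self, tokens: list[int]) -> str: ...
    def generate(self, tokens: list[int], max_new: int) -> Iterator[int]: ...


def complete(engine: Engine, prompt: str, max_new: int = 128) -> str:
    """Text in, text out: token IDs never escape this function."""
    ids = engine.tokenize(prompt)
    new_tokens = list(engine.generate(ids, max_new))
    return engine.detokenize(new_tokens)
```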

To parent: yes, this is for local models, so insofar as worrying about tokens implies financial cost, yes.

Ish - it always depends on how deep in the weeds you need to get. Tokenisation impacts performance, both speed and results, so the details can be important.
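
A toy illustration of the speed half of that claim: the same string under two made-up vocabularies yields very different token counts, and decode time scales roughly linearly with count (the results half shows up in things like digit splitting hurting arithmetic). The greedy longest-match below is a deliberate simplification of real BPE.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenisation (simplified; real BPE differs)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in vocab or j == i + 1:  # fall back to one char
                tokens.append(text[i:j])
                i = j
                break
    return tokens


coarse = {"1234", " apples"}      # multi-character merges available
fine = set("0123456789 aples")    # single characters only

print(greedy_tokenize("1234 apples", coarse))  # ['1234', ' apples'] -> 2 tokens
print(greedy_tokenize("1234 apples", fine))    # 11 single-char tokens
```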

  • I maintain a llama.cpp wrapper, on everything from web to Android, and I can't quite see what more you'd learn from the individual token IDs in the API, beyond what wall-clock time and a check of the model's vocab already give you.

    • I don't really see a need for token IDs alone, but you absolutely need the per-token logprob vectors if you're trying to do constrained decoding (see the sketch after this thread).


    • Do we have the vocab? That's part of the point here. Does it take images? How are they tokenised?
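
On the constrained-decoding point above, a minimal sketch of why the full logprob vector matters: you mask every token the grammar forbids before picking one, which text-only output can't support. The five-token vocab, the logprob values, and the allowed set are all invented for illustration.

```python
import math


def constrained_step(logprobs: list[float], allowed: set[int]) -> int:
    """Pick the most likely token among those the constraint allows."""
    best_id, best_lp = -1, -math.inf
    for tok_id, lp in enumerate(logprobs):
        if tok_id in allowed and lp > best_lp:
            best_id, best_lp = tok_id, lp
    if best_id < 0:
        raise ValueError("constraint excludes every token")
    return best_id


# Toy 5-token vocab; suppose a JSON grammar only allows tokens 1 and 3 here.
step_logprobs = [-0.2, -1.5, -0.9, -2.1, -3.0]
print(constrained_step(step_logprobs, allowed={1, 3}))  # -> 1, not the argmax 0
```

Real implementations (llama.cpp's GBNF grammars, for instance) apply this masking inside the sampler at every step, which is exactly why the raw vector has to be available there.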