Comment by khalic

2 months ago

Another example of the mindf@#$ these systems are: I was fine-tuning a small model to take data fields and turn them into a sentence. I was running into mode collapse (basically when the AI simplifies too much and always outputs the same thing).

I got unstuck by randomizing the field order for each row?!? That was at training time, and now I'm thinking I should do the same at inference time...
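
For anyone who wants to try the same trick, here is a minimal sketch of per-row field shuffling. It assumes each row is a dict of field names to values and that the prompt is a plain `key: value` list; the names `serialize_row` and `build_examples` and the prompt format are made up for illustration, not taken from the setup described above.

```python
import random


def serialize_row(row, rng):
    """Render a row's fields as 'key: value' pairs in a random order."""
    items = list(row.items())
    rng.shuffle(items)  # a fresh field order on every call
    return "; ".join(f"{key}: {value}" for key, value in items)


def build_examples(rows, targets, seed=0):
    """Pair each shuffled prompt with its target sentence."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [
        {"prompt": serialize_row(row, rng), "completion": target}
        for row, target in zip(rows, targets)
    ]


if __name__ == "__main__":
    rows = [{"name": "Ada", "city": "London", "year": 1815}]
    targets = ["Ada was born in London in 1815."]
    for example in build_examples(rows, targets):
        print(example)
```

The same `serialize_row` call could be reused at inference time, so the model sees the same kind of shuffled input it was trained on.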

12 comments

p_stuart82 2 months ago

the irony of modern software engineering: we spent decades perfecting deterministic algorithms, and now we're basically just shaking a black box and hoping the magic rocks align.

khalic 2 months ago

It's a little disturbing, but also very fun to just discover by probing, building and breaking.
darkhorse222 2 months ago

Quantum physics teaches us that at the fundamental levels of physics, reality itself is probabilistic. Probability distributions collapsing to discrete locations aligns nicely across LLMs and quantum mechanics.
astrange 2 months ago
This is an AI bot btw. (sarcasm, metaphor that doesn't make sense)
- khalic 2 months ago
  
  Me or the new account?
  

auspiv 2 months ago

apparently you can straight up duplicate/add/rearrange layers without changing any of the weights and get better results as well - https://dnhkng.github.io/posts/rys/

quotemstr 2 months ago
Neat!
> This is probably due to the way larger numbers are tokenised, as big numbers can be split up into arbitrary forms. Take the integer 123456789. A BPE tokenizer (e.g., GPT-style) might split it like: ‘123’ ‘456’ ‘789’ or: ‘12’ ‘345’ ‘67’ ‘89’
One of the craziest LLM hacks that doesn't get love is https://polymathic-ai.org/blog/xval/
xVal basically says "tokenizing numbers is hard: what if instead of outputting tokens that combine to represent numbers, we just output the numbers themselves, right there in the output embedding?"
It works! Imagine you're discussing math with someone. Instead of saying "x is twenty-five, which is large" in words, you'd say "x is", then switch to making a whistling noise in which the pitch of your whistle, in its position within your output frequency range, communicates the concept of 25.00 +/- epsilon. Then you'd resume speech and say "which is large".
I think the sentiment is that today's models are big and well-trained enough that receiving and delivering quantities as tokens representing numbers doesn't hurt capabilities much, but I'm still fascinated by xVal's much more elegant approach.
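
A toy sketch of the input side of that idea, in plain Python: every number in the text becomes a single `[NUM]` placeholder, and that placeholder's embedding is multiplied by the number's value. The regexes, the two-dimensional embedding table, and the function names are all assumptions for illustration. As far as I understand it, the real xVal also rescales values into a fixed range and predicts numbers through a separate output head, and this sketch leaves both out.

```python
import re

# Assumed pattern: unsigned integers and simple decimals only.
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?")
# Assumed toy tokenizer: the [NUM] placeholder, words, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\[NUM\]|\w+|[^\w\s]")


def extract_numbers(text):
    """Replace each number with [NUM] and return the values separately."""
    values = [float(match.group()) for match in NUM_PATTERN.finditer(text)]
    masked = NUM_PATTERN.sub("[NUM]", text)
    return masked, values


def embed(masked, values, table, unk=(0.0, 0.0)):
    """Look up each token; scale the shared [NUM] vector by its value."""
    remaining = iter(values)
    vectors = []
    for token in TOKEN_PATTERN.findall(masked):
        base = table.get(token, unk)
        scale = next(remaining) if token == "[NUM]" else 1.0
        vectors.append([scale * x for x in base])
    return vectors


if __name__ == "__main__":
    masked, values = extract_numbers("x is 25, which is large")
    table = {"[NUM]": (1.0, 0.5)}  # hypothetical learned embedding
    print(masked, values)
    print(embed(masked, values, table))
```

The point of the scaling is that 25 and 26 end up as nearby vectors along the same direction, instead of as unrelated token sequences.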
- khalic 2 months ago
  
  I was having some issues with IP address representation; this might solve it
khalic 2 months ago

This is crazy, thank you for the link!

toddmorey 2 months ago

wow that's fascinating