Comment by Havoc

4 months ago

Crazy writeup.

The author is right about the base64 part. It does seem weird that it can decode and understand it at the same time. I guess what makes it weird is that we just sort of accept that this works for, say, English and German, i.e. normal use, but when it's framed as base64 it suddenly stops feeling intuitive.


dinobones 4 months ago

why tho? it's just an alternate alphabet/set of symbols.
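For readers who want to see what "alternate alphabet" means in practice, here is a minimal Python sketch using only the standard-library `base64` module. The sample strings and variable names are made up for illustration. It also shows one wrinkle: base64 maps groups of 3 bytes to 4 symbols, so it is not quite a character-for-character substitution, and the same word can encode differently depending on where it falls in the byte stream.

```python
import base64

# Round-trip a short sentence through base64.
text = "Hello, world"
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(encoded)
print(decoded)

# The same word with a one-byte offset produces different symbols,
# because every 3 input bytes are regrouped into 4 six-bit values.
aligned = base64.b64encode(b"cat").decode("ascii")
shifted = base64.b64encode(b" cat").decode("ascii")
print(aligned, shifted)
```

So a model reading base64 has to cope with three possible alignments of every word, which is a bit more than learning a new set of letters.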

dnhkng 4 months ago
Because it's generally expected that models only work 'in distribution', i.e. they work on stuff they have previously seen.
They almost certainly have never seen regular conversations in Base64 in their training set, so it's weird that it 'just works'.
Does that make sense?
- fweimer 4 months ago
  
  If you do not properly MIME-decode email, you end up with at least some base64-encoded conversations.
- dormento 4 months ago
  
  For all we know, AI tech companies could theoretically have converted all of the "acquired" (ahem!) training set material into base64 and used it for training as well, just like you would encode, say, Japanese romaji or Hebrew written in the English alphabet.
  
- gwern 4 months ago
  
  > They almost certainly have never seen regular conversations in Base64 in their training set, so it's weird that it 'just works'.
  People use Base64 to store payloads of many arbitrary things, including web pages or screenshots, both deliberately and erroneously, so models have almost certainly seen regular conversations in Base64 in their 10TB+ text training sets scraped from billions of web pages, files, mangled emails, etc.
  
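The point about email that has not been MIME-decoded can be illustrated with a small sketch using Python's standard-library `email` package. The message text is invented for the example. It relies on the fact that `MIMEText` with a UTF-8 charset uses base64 as its transfer encoding by default, so a scraper that keeps the raw body instead of decoding it ends up with an ordinary sentence stored as base64.

```python
import base64
from email import message_from_string
from email.mime.text import MIMEText

# Hypothetical message body, chosen only for illustration.
body = "Are we still on for lunch tomorrow?"

# With a utf-8 charset, MIMEText base64-encodes the body by default.
raw = MIMEText(body, "plain", "utf-8").as_string()
parsed = message_from_string(raw)

# What a naive scraper would keep: the payload as it sits on the wire.
naive_body = parsed.get_payload().strip()

# What a proper MIME decoder recovers.
decoded_body = parsed.get_payload(decode=True).decode("utf-8")

print(parsed["Content-Transfer-Encoding"])
print(naive_body)
print(decoded_body)
```

Whether this kind of material makes up a meaningful share of any real training set is not something the sketch can show; it only demonstrates how such text comes to exist.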