Comment by whatevertrevor

5 days ago

As a user I'm worried about a + b sure. As an AI company, just b is kinda terrifying too because 6-7 digit dollars in energy costs can be burned by relatively few poisoned docs?

Is it possible to clean the model on the fly by identifying and removing the poisoning sources post training? Or do you have to start from scratch?


dotancohen 4 days ago

  > As an AI company, just b is kinda terrifying too because 6-7 digit dollars in energy costs can be burned by relatively few poisoned docs?

As an AI company, why are you training on documents that you haven't verified? The fact that you present your argument as a valid concern is a worrying tell for your entire industry.

bix6 4 days ago
AI companies gave up on verification years ago. It’s impossible to verify such intense scraping.
- vrighter 4 days ago
  
  not really our problem though is it?
  
theptip 3 days ago

Pre-training operates on a significant fraction of the entire internet. It’s simply not possible.
whatevertrevor 4 days ago

> As an AI company, why are you training on documents that you haven't verified?

Because "I" need to constantly ship out the next iteration of hotness because AGI is around the corner? Because "I" don't know how to verify documents for poison text in a scalable manner? Because "I" don't care? I am not an AI company, how would I know?

For clarity: I'm using "As an AI company" just to indicate the shift in perspective when it comes to defending attack vectors, not to literally indicate that I am (or am affiliated with) an AI company.

I am currently happily retired, and planning to stay that way, assuming the AI bubble crash doesn't take my retirement nest egg with it in a wider market crash. I have no horse in this race, and I haven't been convinced by many AI acceleration stories (though admittedly I haven't given the tools a proper shot, because for hobby projects I like to do things myself). And it's definitely not my (entire) industry. So that's a completely wrong read on many levels there, friend.