← Back to context

Comment by whatevertrevor

5 days ago

As a user I'm worried about a + b sure. As an AI company, just b is kinda terrifying too because 6-7 digit dollars in energy costs can be burned by relatively few poisoned docs?

Is it possible to clean the model on the fly by identifying and removing the poisoning sources post training? Or do you have to start from scratch?

  > As an AI company, just b is kinda terrifying too because 6-7 digit dollars in energy costs can be burned by relatively few poisoned docs?

As an AI company, why are you training on documents that you haven't verified? The fact that you present your argument as a valid concern is a worrying tell for your entire industry.

  • Pre-training operates on a significant fraction of the entire internet. It’s simply not possible.

  • > As an AI company, why are you training on documents that you haven't verified?

    Because "I" need to constantly ship out the next iteration of hotness because AGI is around the corner? Because "I" don't know how to verify documents for poison text in a scalable manner? Because "I" don't care? I am not an AI company, how would I know?

    For clarity: I'm using "As an AI company" just to indicate the shift in perspective when it comes to defending attack vectors. Not literally indicating that I am (or affiliated with) an AI company.

    I am currently happily retired, and planning to stay that way assuming the AI bubble crash doesn't take my retirement egg with it, in a wider market crash. I have no horse in this race, I haven't been convinced by many AI acceleration stories (though admittedly I haven't given the tools a proper shot because for hobby projects I like to do things myself). And it's definitely not my (entire) industry. So completely wrong read on many levels there, friend.