← Back to context

Comment by Reubend

2 days ago

It's a valid criticism that this method would increase compute requirements, but sometimes an improvement in the end result justifies the compute needed. For things like code generation in large datasets, many people would be willing to "pay" with more compute if the results were better. And this doesn't seem to require more memory bandwidth, so it could be particularly good for local models.