Comment by pama

1 year ago

I am very familiar with these and other clustering methods in modern ML, and have been involved in inventing and publishing some such methods myself in various scientific contexts. The paper I cited above only used 3 nearest neighbors as one baseline, IIRC; that is why I mentioned KNN. However, even boosted trees failed to reduce the loss as much as the algorithm that the decoder transformer learned from the data.