Comment by brig90
1 day ago
Totally fair — I defaulted to paraphrase-multilingual-MiniLM-L12-v2 mostly for speed and wide compatibility, but you’re right that it’s long in the tooth by today’s standards. I’d be really curious to see how something like all-mpnet-base-v2 or even text-embedding-ada-002 would behave, especially if we keep the suffixes in and lean into full contextual embeddings rather than reducing to root forms.
Appreciate you calling that out — that’s a great push toward iteration.
No comments yet
Contribute on Hacker News ↗