Comment by maronato
7 months ago
Hugging Face has a few datasets of Wikipedia embeddings.
Here are a few results: https://huggingface.co/search/full-text?q=Wikipedia+embeddin...
And the first result, which is probably what you’ll want to use: https://huggingface.co/datasets/Upstash/wikipedia-2024-06-bg...
I recommend you go for pgvector or a similar self-hosted solution to calculate the similarities instead of a service like Vector.
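To make the pgvector suggestion a bit more concrete, here is a rough sketch of what the similarity lookup boils down to. The pure-Python part computes cosine distance, which is what pgvector's `<=>` operator returns. The table name `wiki_embeddings`, the columns `title` and `embedding`, and the toy 3-dimensional vectors are all made up for illustration; the real dataset's schema and dimensions will differ.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Return the k (title, distance) pairs closest to the query vector."""
    scored = [(title, cosine_distance(query, emb)) for title, emb in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Roughly equivalent pgvector query. Table and column names are hypothetical,
# and the named placeholders assume a psycopg-style driver.
NEAREST_SQL = """
SELECT title, embedding <=> %(query)s::vector AS distance
FROM wiki_embeddings
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""

# Toy stand-ins for real Wikipedia embeddings (assumed data, not from the dataset).
ROWS = [
    ("A", [1.0, 0.0, 0.0]),
    ("B", [0.0, 1.0, 0.0]),
    ("C", [1.0, 1.0, 0.0]),
]

if __name__ == "__main__":
    for title, distance in top_k([1.0, 0.0, 0.0], ROWS, k=2):
        print(title, round(distance, 4))
```

In Postgres you would let pgvector do the sorting instead of Python, and add an index on the embedding column once the table gets large.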