Comment by maronato

7 months ago

Hugging Face has a few datasets of Wikipedia embeddings.

Here are a few results: https://huggingface.co/search/full-text?q=Wikipedia+embeddin...

And the first result, which is probably what you’ll want to use: https://huggingface.co/datasets/Upstash/wikipedia-2024-06-bg...

I recommend going with pgvector or a similar self-hosted solution to calculate the similarities, instead of a service like Vector.
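For context, "calculating the similarities" usually means ranking stored embeddings by cosine distance to a query embedding. In pgvector that is an `ORDER BY embedding <=> query LIMIT k` query, since `<=>` is its cosine-distance operator. The table and column names you would use there are up to you. Below is a minimal pure-Python sketch of the same computation, brute force with no index. The article titles and 3-dimensional vectors are made-up toy data; real Wikipedia embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (what pgvector's <=> computes).

    Assumes both vectors are non-zero and of equal length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every row, nearest first."""
    scored = [(title, cosine_distance(query, emb)) for title, emb in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy "articles" with tiny embeddings, for illustration only.
rows = [
    ("Cat", [1.0, 0.0, 0.0]),
    ("Dog", [0.9, 0.1, 0.0]),
    ("Car", [0.0, 0.0, 1.0]),
]

results = top_k([1.0, 0.0, 0.0], rows, k=2)
print([title for title, _ in results])
```

Scanning every row like this is fine for a quick experiment. For the full Wikipedia set you would want pgvector's approximate indexes (HNSW or IVFFlat) so queries don't have to touch every embedding.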