Comment by avereveard

1 year ago

yeah there are only three stories coming up from the site search, and none of them pick up things like Citus etc.

https://hn.algolia.com/?q=postgres+clustering

only one is semantically correct; the others pick up the wrong sense of clustering (i.e. k-means instead of multi-master writes)

but yeah if one doesn't test the hard cases, how does one know it preserves semantics :D

6 comments

jnnnthnn 1 year ago

In fairness, it's probably impossible to unambiguously determine what the intended/desired interpretation is (though intuitively it seems like k-means should be lower likelihood)!

avereveard 1 year ago
I've tried HyDE and it seems to work better. had to do it client-side though. I asked ChatGPT: "write one sentence explanation about this topic: solutions for postgres clustering", which returned "Solutions for PostgreSQL clustering involve implementing methods such as streaming replication or third-party tools like Patroni to manage and distribute database workloads across multiple servers for enhanced performance and fault tolerance." then I searched for that:
https://hackersearch.net/search?q=Solutions+for+PostgreSQL+c...
and results are much better:
1. An overview of distributed Postgres architectures
2. A Technical Dive into PostgreSQL's replication mechanisms
3. Ways to capture changes in Postgres
the HyDE paper is here: https://arxiv.org/abs/2212.10496
it's possible that OpenAI embeddings are symmetric; if that's the case, you need to hallucinate some content and use that as the base for the embedding distance calculation. or you can move to asymmetric embeddings, or you can try prompting their embedding model
edit: prompting the embedding seems to work. I tried searching for “write an article about: solutions for postgres clustering” and the results are much better: https://hackersearch.net/search?q=write+an+article+about%3A+...
you can try prepending "write an article about: " to all user searches :D
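For anyone who wants to see the shape of the two tricks above, here is a rough Python sketch. It is not how hackersearch.net works. `embed` is a toy bag-of-words stand-in for a real embedding model, `canned_hypothetical_doc` is a hypothetical stub that returns the ChatGPT sentence quoted above instead of calling an LLM, and the third entry in `DOCS` is a made-up title added to represent the k-means sense of "clustering".

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: List[str],
           rewrite: Callable[[str], str] = lambda q: q,
           top_k: int = 3) -> List[str]:
    """Rank docs by similarity to the embedding of rewrite(query).

    With the default rewrite this is plain query-to-document search.
    Passing a generator of hypothetical answers gives HyDE-style search:
    the generated text is embedded in place of the raw query.
    """
    query_vec = embed(rewrite(query))
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def canned_hypothetical_doc(query: str) -> str:
    """Hypothetical stub: a real setup would ask an LLM to answer `query`."""
    return ("Solutions for PostgreSQL clustering involve implementing methods "
            "such as streaming replication or third-party tools like Patroni "
            "to manage and distribute database workloads across multiple "
            "servers for enhanced performance and fault tolerance.")


def prefixed(query: str, prefix: str = "write an article about: ") -> str:
    """The cheaper trick: prepend an instruction before embedding the query.

    Only meaningful with a real embedding model; the toy embed() above
    would just treat the prefix as extra words.
    """
    return prefix + query.strip()


DOCS = [
    "An overview of distributed Postgres architectures",
    "A Technical Dive into PostgreSQL's replication mechanisms",
    "K-means clustering in Postgres with SQL",  # made-up title for illustration
]

if __name__ == "__main__":
    print("raw :", search("postgres clustering", DOCS)[0])
    print("hyde:", search("postgres clustering", DOCS,
                          rewrite=canned_hypothetical_doc)[0])
    print("prefixed query:", prefixed("solutions for postgres clustering"))
```

In this toy setup the raw query lands on the k-means title first, while the hypothetical-document version lands on the replication one. That only shows the mechanism; it says nothing about how a real embedding model would rank things.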
- jnnnthnn 1 year ago
  
  Sweet! Thanks for sharing. A prior implementation had HyDE running on user searches, but I found the results to be hit-or-miss depending on the query type.
  I definitely want to re-explore that though; I think it should be possible to do so a lot more rigorously now that I have a better sense for what people want to search for.
  