Comment by vessenes

4 months ago

Arthur, question from your GitHub + essay:

On GitHub you show stats saying a "cache hit" takes ~200ms and a miss takes 1-2s (an LLM call).

I don't think I understand how you get a cache hit off a novel tweet. My understanding is that you:

1) get a snake case category from an LLM

2) embed that category

3) check if it's close to something else in the embedding space via cosine similarity

4) if it is, replace the original label with the closest one in embedding space

5) if not, store it

Is that the right sequence? If it is, it looks to me like every path starts with an LLM call, and therefore none is likely to come in under 200ms. Do I have the sequence right?
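For concreteness, the five-step sequence above might look like this as a sketch. `llm_label` and `embed` are hypothetical stand-ins for the real LLM and embedding calls, and the 0.85 threshold is an assumption; the point is that every path goes through step 1's LLM call:

```python
import numpy as np

THRESHOLD = 0.85  # assumed cosine-similarity cutoff

def llm_label(tweet: str) -> str:
    # Stand-in for step 1's LLM call; the real one takes 1-2s.
    return "_".join(tweet.lower().split())

def embed(text: str) -> np.ndarray:
    # Toy character-frequency embedding, unit-normalized.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1
    return v / np.linalg.norm(v)

known: list[tuple[np.ndarray, str]] = []  # (label embedding, label)

def canonicalize(tweet: str) -> str:
    raw = llm_label(tweet)                    # step 1: always hits the LLM
    v = embed(raw)                            # step 2: embed the category
    for kv, klabel in known:
        if float(np.dot(v, kv)) >= THRESHOLD:
            return klabel                     # steps 3-4: reuse closest known label
    known.append((v, raw))                    # step 5: store the new label
    return raw
```

Under this reading, a "hit" only canonicalizes the label; it never skips the LLM, which is what makes the 200ms figure puzzling.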

OP here. We embed both the label AND the tweet. So if tweet A is "I love burgers" and tweet B is "I love cheeseburgers", we ask our vector DB whether we have already seen a tweet very similar to B. If yes, we skip the LLM altogether (cache hit) and just reuse the class label that A has.
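A minimal sketch of that tweet-level cache, assuming hypothetical `embed` and `classify_with_llm` helpers in place of the real embedding model and LLM, and an assumed 0.85 similarity threshold:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed; tune for the real embedding model

def embed(text: str) -> np.ndarray:
    # Toy character-frequency embedding, unit-normalized.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1
    return v / np.linalg.norm(v)

def classify_with_llm(tweet: str) -> str:
    # Stand-in for the 1-2s LLM call, taken only on a cache miss.
    return "food_opinion"

cache: list[tuple[np.ndarray, str]] = []  # (tweet embedding, label)

def classify(tweet: str) -> tuple[str, bool]:
    """Return (label, was_cache_hit)."""
    v = embed(tweet)
    for cached_v, label in cache:
        if float(np.dot(v, cached_v)) >= SIMILARITY_THRESHOLD:
            return label, True        # fast path: embedding lookup only
    label = classify_with_llm(tweet)  # slow path: actual LLM call
    cache.append((v, label))
    return label, False
```

With this flow, tweet A ("I love burgers") misses and gets stored; tweet B ("I love cheeseburgers") then lands close enough to A's embedding to hit the cache and reuse A's label without touching the LLM, which is what makes the sub-200ms path possible.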

From what I understood, we check whether the "snake case category" from step (1) is already known to us (in the cache), in which case no further processing is needed. So steps (2) and onward don't apply to categories that were already produced earlier.