Comment by leobg

3 months ago

There are AI bros that will call an LLM to do what you could do with a regex. I’ve seen people do the chunking for RAG using an LLM…

If you think about chunking as "take x characters" then using LLMs is a poor idea.

But syntactic chunking also works poorly for any serious application, as you lose basically all context.

Semantic chunking, however, is a task you absolutely would use LLMs for.

  • If by LLM you mean embeddings I agree. Though you can often get away with using much smaller models for that.

    I was talking about people who actually make a call to a completion endpoint and then have the LLM repeat the input text token for token just to get the split.

    • How do you do semantic chunking using embeddings?

      And yes, I know perfectly well what you are talking about. And yes, that is a perfectly good strategy for chunking large texts so you can index them.

      It does not sound like you are familiar with chunking and its current issues.
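
      To make the embeddings question concrete: embedding-based semantic chunking usually means embedding each sentence and starting a new chunk wherever the similarity between consecutive sentences drops below a threshold. Here is a minimal sketch; the bag-of-words `embed` is a toy stand-in for a real sentence encoder, and the threshold value is illustrative, not tuned:

      ```python
      import math
      import re

      def embed(text):
          # Toy bag-of-words "embedding"; a real pipeline would use a
          # sentence-encoder model here instead.
          vec = {}
          for word in re.findall(r"[a-z']+", text.lower()):
              vec[word] = vec.get(word, 0) + 1
          return vec

      def cosine(a, b):
          dot = sum(count * b.get(word, 0) for word, count in a.items())
          na = math.sqrt(sum(v * v for v in a.values()))
          nb = math.sqrt(sum(v * v for v in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      def semantic_chunks(sentences, threshold=0.2):
          # Split wherever similarity between consecutive sentences
          # falls below the threshold.
          chunks, current = [], [sentences[0]]
          for prev, sent in zip(sentences, sentences[1:]):
              if cosine(embed(prev), embed(sent)) < threshold:
                  chunks.append(" ".join(current))
                  current = []
              current.append(sent)
          chunks.append(" ".join(current))
          return chunks

      sentences = [
          "The cat sat on the mat.",
          "The cat then chased a mouse.",
          "Markets fell sharply today.",
          "Nervous investors sold as markets fell.",
      ]
      print(semantic_chunks(sentences))  # two chunks: cat topic, markets topic
      ```

      Note this only needs one embedding call per sentence, rather than a completion call that echoes the whole document back.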