
Comment by cpursley

1 day ago

How are you prepping the PDF data before shoving it into Qwen?

I just compress the file as much as possible without losing quality. I didn't even know there were other ways to prep it.

I do sometimes chop up the PDF into smaller PDFs, one per chapter.
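For anyone wanting to script the chapter split, here is a minimal sketch. It assumes the pypdf library (the comment above does not say which tool is used) and that you already know the 1-based page on which each chapter starts. The names `chapter_ranges` and `split_pdf` are made up for illustration.

```python
from pathlib import Path


def chapter_ranges(start_pages, total_pages):
    """Turn 1-based chapter start pages into 0-based, half-open page ranges."""
    starts = sorted(start_pages)
    ranges = []
    for i, start in enumerate(starts):
        # A chapter ends where the next one starts, or at the end of the file.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((start - 1, end))
    return ranges


def split_pdf(pdf_path, start_pages, out_dir="chapters"):
    """Write one PDF per chapter; pages before the first start page are skipped."""
    # Imported here so chapter_ranges works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(pdf_path))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    ranges = chapter_ranges(start_pages, len(reader.pages))
    for number, (begin, end) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(begin, end):
            writer.add_page(reader.pages[index])
        target = out / f"{Path(pdf_path).stem}_ch{number:02d}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling `split_pdf("book.pdf", [1, 15, 40])` on a 60-page file would write three files covering pages 1-14, 15-39, and 40-60.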

  • On Linux you can also use pdftotext if you only care about the text.
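A rough sketch of calling pdftotext from a script, assuming the Poppler command-line tools are installed and on the PATH. The helper names `pdftotext_command` and `extract_text` are hypothetical; the flags used (`-f`, `-l`, `-layout`, and `-` for stdout) are standard pdftotext options.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, first_page=None, last_page=None, layout=False):
    """Build the pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    if layout:
        # Try to preserve the physical layout (columns, tables) of the page.
        cmd.append("-layout")
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path, **options):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils first")
    result = subprocess.run(
        pdftotext_command(pdf_path, **options),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `extract_text("book.pdf", first_page=1, last_page=10)` would return the text of the first ten pages. Scanned PDFs without a text layer will come back empty, since pdftotext does no OCR.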

Not OP, but we use the docling library to extract text and put it in markdown before storing for use with an LLM.
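Something along these lines is one way the docling step could look. This is only a sketch based on docling's documented `DocumentConverter` quickstart, not the commenter's actual pipeline; the output directory and the helper names `markdown_path` and `pdf_to_markdown` are assumptions.

```python
from pathlib import Path


def markdown_path(pdf_path, out_dir):
    """Pick the output location: same file stem, .md suffix, inside out_dir."""
    return Path(out_dir) / (Path(pdf_path).stem + ".md")


def pdf_to_markdown(pdf_path, out_dir="markdown"):
    """Convert one PDF to Markdown with docling and store it on disk."""
    # Imported here so markdown_path works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(str(pdf_path))
    markdown = result.document.export_to_markdown()

    target = markdown_path(pdf_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

The stored Markdown file can then be read back and placed in the prompt in place of the raw PDF.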