Comment by cpursley

5 months ago

How are you prepping the PDF data before shoving it into Qwen?


cpursley

I just compress the file as small as I can without losing quality. I didn't even know there were other ways to prep it.

Sometimes I also split the PDF into smaller PDFs, one per chapter.
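If the PDF has bookmarks, the per-chapter split can be scripted. Below is a rough sketch using the pypdf library. It assumes the top-level bookmarks mark chapter starts, which won't hold for every book. The function names `chapter_ranges` and `split_by_chapters` and the `_partNN.pdf` naming are made up for illustration.

```python
from pathlib import Path


def chapter_ranges(starts, num_pages):
    """Turn 0-based chapter start pages into half-open (start, end) page ranges.

    Page 0 is always treated as a start, so any front matter before the first
    chapter ends up in its own chunk. Missing or out-of-range starts are dropped.
    """
    valid = {s for s in starts if s is not None and 0 <= s < num_pages}
    ordered = sorted({0} | valid)
    ends = ordered[1:] + [num_pages]
    return [(s, e) for s, e in zip(ordered, ends) if s < e]


def split_by_chapters(pdf_path, out_dir):
    """Write one PDF per top-level bookmark range and return the new paths."""
    # Imported here so chapter_ranges() works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    # reader.outline lists top-level bookmarks; nested lists are sub-sections,
    # which are skipped so each chunk covers a whole chapter.
    starts = [
        reader.get_destination_page_number(item)
        for item in reader.outline
        if not isinstance(item, list)
    ]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, (start, end) in enumerate(chapter_ranges(starts, len(reader.pages)), 1):
        writer = PdfWriter()
        for page_num in range(start, end):
            writer.add_page(reader.pages[page_num])
        out_path = out_dir / f"{Path(pdf_path).stem}_part{i:02d}.pdf"
        with open(out_path, "wb") as fh:
            writer.write(fh)
        paths.append(out_path)
    return paths
```

If the file has no bookmarks, this just writes the whole thing as a single part. In that case you'd have to pass your own list of chapter start pages to `chapter_ranges`.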

amelius 5 months ago

On Linux you can also use pdftotext if you only care about the text.
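A minimal sketch of calling pdftotext (shipped in poppler-utils on most distros) from Python. The helper names `pdftotext_cmd` and `extract_text` are hypothetical. `-layout` (keep the physical page layout) and passing `-` as the output file (write to stdout) are standard pdftotext options, but check `man pdftotext` for your version.

```python
import subprocess


def pdftotext_cmd(pdf_path, txt_path=None, layout=False):
    """Build the pdftotext command line; '-' as output means write to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # try to preserve columns and spacing
    cmd.append(str(pdf_path))
    cmd.append(str(txt_path) if txt_path is not None else "-")
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout=layout),
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext exits non-zero (e.g. unreadable file)
    )
    return result.stdout
```

Plain text is usually much smaller than the PDF itself, so it also helps with fitting a document into the model's context window.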

Not OP, but we use the docling library to extract the text and convert it to Markdown before storing it for use with an LLM.
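Here's roughly what that looks like with docling's basic converter, following the usage shown in its README: `DocumentConverter().convert(...)`, then `.document.export_to_markdown()`. The `pdf_to_markdown` wrapper, the optional `converter` argument (handy for swapping in a stub) and the one-`.md`-per-PDF layout are my assumptions, not necessarily what the poster does.

```python
from pathlib import Path


def pdf_to_markdown(pdf_path, out_dir, converter=None):
    """Convert a PDF to Markdown and save it as <out_dir>/<pdf stem>.md."""
    if converter is None:
        # Imported lazily so callers can pass their own converter without docling.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()

    result = converter.convert(str(pdf_path))
    markdown = result.document.export_to_markdown()

    out_path = Path(out_dir) / (Path(pdf_path).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```

Markdown keeps headings, lists and tables as plain text, which LLMs generally handle well and which is easy to chunk on headings later.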