cpursley 20 hours ago
How are you prepping the PDF data before shoving it into Qwen?

Alifatisk 20 hours ago
I just compress the file as small as possible without losing quality; I didn't even know there were more ways to prep it. I do sometimes chop up the PDF into smaller PDFs, one per chapter.

amelius 20 hours ago
On Linux you can also use pdftotext if you are only concerned with the text.

navbaker 20 hours ago
Not OP, but we use the docling library to extract text and convert it to Markdown before storing it for use with an LLM.
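For anyone wanting to try the pdftotext route, here is a rough sketch, not anyone's actual setup from this thread. It assumes the `pdftotext` binary from poppler-utils is on your PATH. The function names and the `max_chars` budget are made up for illustration. It relies on pdftotext separating pages with form-feed characters, and it groups pages into chunks small enough to hand to a model one at a time.

```python
import subprocess


def pdftotext_command(pdf_path):
    # "-layout" asks pdftotext to preserve the physical page layout;
    # the trailing "-" sends the extracted text to stdout instead of a file.
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def extract_text(pdf_path):
    # Requires the pdftotext binary (poppler-utils) to be installed.
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_pages(text):
    # pdftotext emits a form feed ("\f") between pages; drop empty pages.
    return [page.strip() for page in text.split("\f") if page.strip()]


def chunk_pages(pages, max_chars=8000):
    # Greedily pack whole pages into chunks of at most max_chars characters.
    # A single page longer than max_chars still becomes its own chunk.
    chunks, current = [], ""
    for page in pages:
        if current and len(current) + len(page) + 2 > max_chars:
            chunks.append(current)
            current = page
        else:
            current = f"{current}\n\n{page}" if current else page
    if current:
        chunks.append(current)
    return chunks


# Hypothetical usage (the file name is a placeholder):
# chunks = chunk_pages(split_pages(extract_text("book.pdf")))
```

Splitting on page boundaries is cruder than the per-chapter split mentioned above, but it needs no knowledge of the document's outline.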
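A minimal sketch of the docling approach, assuming `pip install docling` and the `DocumentConverter` entry point as described in docling's documentation. This is not navbaker's actual pipeline, which isn't shown in the thread. The helper names and the `markdown` output directory are placeholders.

```python
from pathlib import Path


def markdown_path_for(pdf_path, out_dir="markdown"):
    # Hypothetical naming scheme: docs/report.pdf -> markdown/report.md
    return Path(out_dir) / (Path(pdf_path).stem + ".md")


def pdf_to_markdown(pdf_path, out_dir="markdown"):
    # Imported lazily so the path helper above works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(str(pdf_path))
    markdown = result.document.export_to_markdown()

    # Store the Markdown next to other converted files for later LLM use.
    target = markdown_path_for(pdf_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Storing Markdown rather than raw text keeps headings and tables visible to the model, which plain pdftotext output tends to lose.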