Comment by rovr138

12 hours ago

Everything has trouble reading the content of PDFs natively. PDF is a format for displaying and rendering, not for storing content in a way that makes the text inside easy to parse back out.

Is this file storing text, or is it storing coordinates for where to draw a line that looks like the letter 'l'? Is that an 'l' or a line?
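To make the 'l' versus line question concrete, here's a rough sketch. The two snippets are hand-written, uncompressed content-stream fragments I made up for illustration: one shows the glyph with the text operator `Tj`, the other strokes a vertical line with the path operators `m`, `l` (lineto, which happens to be spelled "l") and `S`. The `count_ops` helper is hypothetical and only knows a tiny subset of operators. Real streams are usually compressed, and text goes through font encodings, so don't treat this as a parser.

```python
import re

# Hand-written fragments (assumed for illustration, not taken from a real file).
AS_TEXT = "BT /F1 12 Tf 72 700 Td (l) Tj ET"   # show the glyph 'l' as text
AS_PATH = "72 700 m 72 712 l S"                 # move, line-to, stroke: a drawn bar

TEXT_OPS = {"Tj", "TJ"}          # small subset of text-showing operators
PATH_OPS = {"m", "l", "re", "S"}  # small subset of path operators

# Matches a literal string like (l); nested parentheses are not handled.
STRING_LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)")

def count_ops(stream: str) -> tuple[int, int]:
    """Return (text_operator_count, path_operator_count) for a fragment."""
    # Drop string operands first so the 'l' inside (l) is not seen as lineto.
    tokens = STRING_LITERAL.sub(" ", stream).split()
    text = sum(1 for t in tokens if t in TEXT_OPS)
    path = sum(1 for t in tokens if t in PATH_OPS)
    return text, path

print(count_ops(AS_TEXT))
print(count_ops(AS_PATH))
```

Both fragments can render as something that looks like an 'l' on the page, but only the first one contains any text to extract.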

The best way to do this is to render the PDF to an image and work from the image, either with models that can take the image directly or by OCR'ing it.
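A minimal sketch of the render-then-OCR route, assuming the `pdftoppm` (poppler-utils) and `tesseract` command-line tools are installed and on the PATH. The function names and the 300 DPI default are my own choices, not anything standard, and this is just one way to wire it up.

```python
import subprocess
import tempfile
from pathlib import Path

def render_cmd(pdf_path, out_prefix, dpi=300):
    """Command that renders every page of the PDF to PNG files."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]

def ocr_cmd(image_path):
    """Command that OCRs one image and writes the text to stdout."""
    return ["tesseract", str(image_path), "stdout"]

def pdf_to_text(pdf_path, dpi=300):
    """Render each page to an image, OCR it, and return one string per page."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(render_cmd(pdf_path, prefix, dpi), check=True)
        texts = []
        # Assumes the rendered files sort into page order by name.
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                ocr_cmd(image), check=True, capture_output=True, text=True
            )
            texts.append(result.stdout)
        return texts
```

Something like `pdf_to_text("report.pdf")` (hypothetical filename) would then give the OCR'd text page by page. To use a vision model instead, you'd keep the rendering step and hand the PNGs to the model rather than to tesseract.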