Comment by lutusp

3 days ago

On the topic of low-quality investments, there should be a posting rule that HN submissions consist of text, not pictures of text, so readers can search for additional information by copying text, not images.

Nothing is more frustrating than an image-based PDF masquerading as text, especially now that, with little time or effort, OCR can convert most images into text documents.

It's not my site, but I'm the one who posted it here. The nice thing about a scan (if it's good quality) for an old article like this is that it viscerally provides some context (e.g., the era and type of publication). The scan quality here is abysmal, and for that I do apologize.

  • As someone who is blind, I have one more reason to consider image-based PDFs undesirable. OCR works okayish most of the time, but when it doesn't, it tends to suck, and if the layout of the scanned page is complex, the output often breaks down. Not to mention that the friction of converting it is a disincentive to bother.

This image-based PDF made my morning. I wish journalists would write like this these days, as opposed to jamming their reports through an LLM.

I would rather something interesting be posted without OCR than not at all, especially if an OCR version is not available.