Comment by pogue

9 days ago

I've been looking for a project that would have an easy free/extremely cheap way to do OCR/image recognition for generating ALT text automatically for social media. Some sort of embedded implementation that looks at an image and is either able to transcribe the text, or (preferably) transcribe the text AND do some brief image recognition.

I generally do this manually with Claude and it's able to do it lightning fast, but a small dev making a third party Bluesky/Mastodon/etc client doesn't have the resources to pay for an AI API.

Such an approach shifts the cost of accessibility onto each user individually. It is not bad as a fallback mechanism, but I hope that those who publish won't decide that AI absolves them of the need to post accessible content. After all, if they generate the alt text on their side, they only have to do it once, and it is then accessible to everyone, which saves running the same recognition task again and again on the receiving end. Additionally, they have more control over how the image is interpreted, and I hope that this really matters.