Comment by 9dev

1 day ago

Wrote a parser to extract image metadata once, and got massively frustrated with the amount of undocumented, semi-documented, wrongly documented, or partially documented attributes. You’ll find references online, but most of them lack half of what you encounter in images. Every image processing app under the sun adds its own range. Some use metric values, some imperial; finding out which can be guesswork. Aperture is given in f-stops, decimals, or literal fraction strings. Some attributes hold sentinel values. Some vendors have custom conventions for undefined data.

It’s a jungle out there.
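
To make the aperture point concrete, here is a minimal Python sketch that normalizes a few of the shapes listed above into one float f-number. Several parts are assumptions for illustration and not any vendor's documented convention: the accepted input forms, the hypothetical function name `normalize_f_number`, and the choice to treat zero or `x/0` as "undefined". The APEX branch uses the standard relation N = 2^(Av/2) for EXIF's ApertureValue tag.

```python
import re
from typing import Optional, Tuple, Union

# Hypothetical input shapes seen in the wild: "f/2.8", "F2.8", "2.8", "28/10",
# a (numerator, denominator) rational tuple, or a plain number.
ApertureInput = Union[str, int, float, Tuple[int, int]]


def normalize_f_number(raw: ApertureInput, is_apex: bool = False) -> Optional[float]:
    """Return an f-number rounded to one decimal, or None if the value looks undefined.

    Set is_apex=True when the value came from an APEX ApertureValue (Av) field.
    """
    if isinstance(raw, tuple):
        num, den = raw
        if den == 0:
            return None  # assumption: x/0 means "unknown", not an error
        value = num / den
    elif isinstance(raw, (int, float)):
        value = float(raw)
    else:
        text = raw.strip().lower()
        text = re.sub(r"^f\s*/?\s*", "", text)  # drop "f/", "f", "f / " prefixes
        if "/" in text:
            num_s, den_s = text.split("/", 1)
            if float(den_s) == 0:
                return None  # same assumption for fraction strings
            value = float(num_s) / float(den_s)
        else:
            value = float(text)

    if is_apex:
        # APEX: Av = 2 * log2(N), so N = 2 ** (Av / 2). Av <= 0 is legal (f/1 and faster).
        return round(2 ** (value / 2), 1)
    if value <= 0:
        return None  # assumption: zero/negative f-numbers are sentinels
    return round(value, 1)
```

Rounding to one decimal is a judgment call. APEX values convert to irrational f-numbers, so Av = 5 comes out as 5.7 rather than the marked f/5.6.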

pchm 13 hours ago

Yes. I run a niche webapp[1] that extracts EXIF and XMP (Lightroom edits) from images. At one point I tried to write my own EXIF parser. It's not that complicated, but very quickly you'll run into weird legacy, vendor-specific nuances (apart from what the parent mentioned, you have to handle both big- and little-endian EXIF). And the long tail of those edge cases is, well, very long. ExifTool handles pretty much all of that.

[1] https://pixelpeeper.com/
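
The endianness point shows up in the first eight bytes of every EXIF block, which use the TIFF header layout: `II` (Intel, little-endian) or `MM` (Motorola, big-endian), then the magic number 42, then the offset of the first IFD. Every later integer has to be read in that same byte order. Below is a minimal Python sketch. It assumes the `Exif\0\0` prefix of a JPEG APP1 segment has already been stripped, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def read_tiff_header(data: bytes) -> Tuple[str, int]:
    """Parse the 8-byte TIFF header at the start of an EXIF payload.

    Returns the struct byte-order prefix ("<" or ">") and the offset of IFD0.
    """
    if len(data) < 8:
        raise ValueError("too short for a TIFF header")
    marker = data[:2]
    if marker == b"II":
        order = "<"  # Intel: little-endian
    elif marker == b"MM":
        order = ">"  # Motorola: big-endian
    else:
        raise ValueError(f"unknown byte-order marker {marker!r}")
    # 2-byte magic number, then 4-byte offset of the first IFD, both in the file's byte order.
    magic, ifd0_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError(f"bad TIFF magic {magic}")
    return order, ifd0_offset


def read_ifd_entry_count(data: bytes, order: str, offset: int) -> int:
    """Read the 2-byte entry count that starts every IFD, honoring the byte order."""
    (count,) = struct.unpack_from(order + "H", data, offset)
    return count
```

The same bytes decode to very different numbers if you guess the byte order wrong, which is why the prefix is threaded through every later read instead of hard-coded.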

9dev 10 hours ago

Pixelpeeper looks amazing, by the way. Such a great idea, thank you for sharing.

charles_f 1 day ago

I work on the receiving end of media processing nowadays, and the sheer variety of overlapping formats, codecs, and configurations is frustrating. No two encoders work the same way, and they often "innovate" in fun and varied ways that almost feel like renewed attempts to make decoders crash.

ivanjermakov 21 hours ago

Sounds like a worse version of non-standard (X-*) HTTP headers: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

This happens a lot when a standard is not specific enough.

JKCalhoun 20 hours ago

Apple engineers partnered with Sony, Canon, Nokia, Adobe, and Microsoft to try to hammer out a specification for what to do when different metadata flavors (EXIF, IPTC, XMP) overlap [1]. It was not comprehensive (to be sure), but it covered the most common properties.

As the post more or less discussed, EXIF concerns itself mostly with hardware and camera attributes and can be considered authoritative for that domain.

IPTC came out of photojournalism, to cover more artistic metadata like a caption, author (photographer), keywords, etc.

Like the XKCD strip about competing standards, XMP seems to have come along to try to create a third standard to replace the first two…

[1] https://en.wikipedia.org/wiki/Metadata_Working_Group
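
The domain split described above can be sketched in Python as a per-field preference order, where the first source with a non-empty value wins. The field names, the orderings, and `merge_metadata` are illustrative assumptions only. This is not the Metadata Working Group's actual reconciliation procedure, which should be consulted directly.

```python
from typing import Dict, List

# Hypothetical preference order per field, loosely following the split above:
# EXIF for capture/hardware facts, IPTC for editorial fields, XMP as a fallback.
# NOT the Metadata Working Group's published rules.
PREFERENCE: Dict[str, List[str]] = {
    "make": ["exif", "xmp"],
    "model": ["exif", "xmp"],
    "date_taken": ["exif", "xmp", "iptc"],
    "caption": ["iptc", "xmp", "exif"],
    "creator": ["iptc", "xmp", "exif"],
    "keywords": ["iptc", "xmp"],
}


def merge_metadata(sources: Dict[str, Dict[str, object]]) -> Dict[str, object]:
    """Pick each field from the first source in its preference list with a non-empty value."""
    merged: Dict[str, object] = {}
    for field, order in PREFERENCE.items():
        for source in order:
            value = sources.get(source, {}).get(field)
            if value not in (None, "", []):  # skip missing and empty values
                merged[field] = value
                break
    return merged
```

Fields that no source provides are simply left out, so callers can tell "unknown" apart from an empty string.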

linzhangrun 10 hours ago

Production-grade interpreters are always full of dirty work.

tompark 18 hours ago

A few years ago I wrote an EXIF parser too, solely for reading/editing text comments, which is much simpler than what you did. Even then, yes, it's not pretty, and it's very frustrating. There are multiple places to put text in EXIF, and it took a while to find most (all?) of the edge cases.

But now it's quite different with LLMs. I recently updated my code and Claude had useful recommendations.
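
On the "multiple places to put text" problem, three common spots are ImageDescription (0x010E, ASCII), UserComment (0x9286, prefixed by an 8-byte character-code field), and the Windows-specific XPComment (0x9C9C, UTF-16LE). Below is a minimal Python sketch for decoding the last two. Its fallbacks are assumptions, not guarantees from the spec: it follows the file's byte order for `UNICODE`, guesses UTF-8 for an undefined charset, and skips JIS entirely. The function names are hypothetical.

```python
from typing import Optional


def decode_user_comment(raw: bytes, big_endian: bool) -> Optional[str]:
    """Decode an EXIF UserComment value; the first 8 bytes name the character code."""
    if len(raw) < 8:
        return None
    code, body = raw[:8], raw[8:]
    if code == b"ASCII\x00\x00\x00":
        text = body.decode("ascii", errors="replace")
    elif code == b"UNICODE\x00":
        # Assumption: follow the TIFF header's byte order; some writers don't.
        text = body.decode("utf-16-be" if big_endian else "utf-16-le", errors="replace")
    elif code == b"\x00" * 8:
        # Undefined character code: try UTF-8 as a pragmatic guess (not the spec).
        text = body.decode("utf-8", errors="replace")
    else:
        return None  # JIS and unrecognised codes are not handled in this sketch
    # Cameras often pad the field with NULs or spaces; treat all-padding as empty.
    cleaned = text.rstrip("\x00 ")
    return cleaned or None


def decode_xp_comment(raw: bytes) -> Optional[str]:
    """Decode the Windows XPComment tag: UTF-16LE, NUL-terminated."""
    text = raw.decode("utf-16-le", errors="replace").rstrip("\x00")
    return text or None
```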

sherr 1 day ago

My hell was trying to make sense of and organise audio/music ID3 tags. What a nightmare that is. EXIF seems much nicer to me.
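
One well-known ID3 trap is the "syncsafe" integer. ID3v2 stores the tag size in four bytes and uses only the low 7 bits of each byte. Version 2.4 applies the same encoding to frame sizes, while 2.3 uses plain big-endian frame sizes. Below is a minimal Python sketch for reading the 10-byte header. The function names are hypothetical, and this covers only a tiny slice of the format.

```python
from typing import Tuple


def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 syncsafe integer (7 usable bits per byte, MSB always 0)."""
    if len(b) != 4 or any(byte & 0x80 for byte in b):
        raise ValueError("not a valid syncsafe integer")
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def parse_id3v2_header(data: bytes) -> Tuple[int, int, int]:
    """Return (major_version, flags, tag_size) from a 10-byte ID3v2 header."""
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("no ID3v2 header")
    major, _revision, flags = data[3], data[4], data[5]
    return major, flags, syncsafe_to_int(data[6:10])


def frame_size(raw: bytes, major: int) -> int:
    """Frame sizes are syncsafe in v2.4 but plain big-endian integers in v2.3."""
    if major >= 4:
        return syncsafe_to_int(raw)
    return int.from_bytes(raw, "big")
```

Reading a v2.3 frame size as syncsafe (or the reverse) silently gives the wrong length, which is one reason taggers disagree about the same file.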

anjel 17 hours ago

Picard is far from perfect, but it does more or less impose a semblance of uniformity across a large library.

sherr 13 hours ago

Yes, that's what I ended up using. I wanted to program something myself, but it was too difficult. MusicBrainz Picard [1] is excellent.

[1] https://picard.musicbrainz.org/

deathbyzen 1 day ago

that sounds endlessly frustrating