
Comment by butlike

1 day ago

The path is different than the filename though. If I want to find duplicates, it will be impossible if the filename changes. In my use case

/User/user/Images/20240110/happy_birthday.jpg

and

/User/user/Desktop/happy_birthday.jpg

are the same image.

> it will be impossible if the filename changes.

Not impossible, just different and arguably better - comparing hashes is a better tool for finding duplicates.
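As a rough illustration of the hash-comparison approach, here is a minimal Python sketch that groups files by the SHA-256 of their contents, so two copies match regardless of path or filename. The function names `file_digest` and `find_duplicates` are made up for this example, and it only catches byte-for-byte identical files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group every regular file under `root` by content digest.

    Returns a list of groups (each a sorted list of paths) with 2+ members.
    Filenames and directories play no part in the comparison.
    """
    groups = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates` on a directory that contains both the `Images/20240110` and `Desktop` copies would put them in the same group even if one had been renamed. Nothing is deleted here; reviewing the groups before removing anything is left to the user.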

  • From a technological standpoint, sure. I'd argue when you're staring down the barrel of 19,234 duplicate file deletions, with names like `image01.jpg`, `image02.jpg` instead of `happy_birthday.jpg`, there's a level of perceptual cognitive trust there that I just can't provide.

If your camera (or phone) uses the DCF standard [0], you will eventually end up with duplicates when you hit IMG_9999.JPG and it loops around to IMG_0001.JPG. Filename alone is an unreliable indicator.

[0]: https://en.wikipedia.org/wiki/Design_rule_for_Camera_File_sy...

  • > loops around to IMG_0001

    Almost all cameras create a new directory, e.g. DSC002, and start from IMG_0001 to prevent collision.

  • Which systems still use this shortsighted convention? All photos I’ve taken with the default camera app in the last many years are named with a timestamp.

> If I want to find duplicates, it will be impossible if the filename changes.

Depends on what is meant by a "duplicate." A checksum of the file detects exact byte-for-byte duplicates, but it will not match copies whose metadata was removed or whose image was rescaled. Perceptual hashing is more expensive but is better at matching rescaled or cropped images.

https://en.wikipedia.org/wiki/Perceptual_hashing
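To make the idea concrete, here is a minimal sketch of one of the simplest perceptual hashes, an average hash, in plain Python. It assumes the image has already been decoded into rows of grayscale values (0 to 255) and is at least `size` pixels in each dimension; decoding is omitted, and the names `average_hash` and `hamming_distance` are illustrative only. This simple variant mainly tolerates rescaling; other perceptual hashes exist.

```python
def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash from a grayscale pixel grid.

    `pixels` is a list of rows of ints in 0..255 (assumed input format).
    The image is reduced to a size x size grid of block means, and each
    bit records whether a block is brighter than the overall mean.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two images are then treated as likely matches when the Hamming distance between their hashes falls under some threshold; the right threshold depends on the collection and has to be chosen by the user.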