Comment by heavyset_go

6 months ago

You don't have to use ML models for this.

Can you elaborate more? Discord has 656m users. If 10% upload their ID, they'd have 65m ID photos to search through. There are 2 use-cases here:

1/ Safety bans (let's pretend 0.01% of ID card users have been banned for safety reasons: 650k accounts)

If a user submits their selfie/ID card, Discord needs to compare the new image against the 650k banned (but deleted?) images. I can't possibly think how a human could remember 650k photos well enough to declare a match.

Even if such a human existed with this perfect recall, there can't be very many of them on this planet to hire.

2/ Duplicate account bans

If a user registers, how can support staff search the 65m photos without ML assistance to determine whether this is a new user or a fraudster?

  • 0.01% of 65M is 6,500. Also apparently only 70K people uploaded their IDs.

    That being said, you can still hash faces and metadata (such as ID numbers) instead of storing the whole ID as a scanned photo, if the information is only used for duplicate checking. Hashing does not increase the racial bias. If your model has a bias it will always have a margin of error.
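A minimal sketch of the ID-number half of that idea, assuming duplicate checking only needs an exact match on the number. The key, the normalization rule, and the in-memory `seen` set are all hypothetical stand-ins. A keyed hash (HMAC) is used rather than a bare hash because ID numbers are short and structured, so an unkeyed hash could be brute-forced by anyone who obtains the table.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would load it from a secrets store.
SECRET_KEY = b"example-key-not-for-production"


def normalize_id_number(raw: str) -> str:
    """Drop spaces and punctuation and uppercase, so formatting does not change the hash."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def id_fingerprint(raw: str) -> str:
    """Return a keyed SHA-256 fingerprint of the normalized ID number."""
    message = normalize_id_number(raw).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


# Stand-in for a database column holding fingerprints only, never the raw number.
seen = set()


def is_duplicate(raw: str) -> bool:
    """Record the fingerprint and report whether it was already present."""
    fingerprint = id_fingerprint(raw)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```

This only covers exact matches on the number. Matching faces needs a similarity measure rather than an equality check, which a cryptographic hash cannot provide.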

    • Neat, but how do users appeal a false positive? Does the company just trust the user, or should it retain the original information so it can verify manually?


  • If they can't handle that many users then they should close signups.

    The product scales, but safely using users' data doesn't? Hardly an excuse.

  • Do you understand how image hashing works? You don't need machine learning just to check if two images are potentially identical.

    • Face hashing is different than generic image hashing. Methods like dividing the photo into smaller rectangles and storing the average colour for each rectangle won't work.

      It should be able to detect and hash facial features so that it can compare it to a future (potentially taken from a different angle) photo of the same person. You need some type of machine learning algorithm.
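A rough sketch of how that comparison is commonly done, assuming some face model has already turned each photo into a fixed-length feature vector (an embedding). The vectors and the threshold below are made-up illustrative values, not output from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical cutoff; a real system would tune it against labelled pairs.
MATCH_THRESHOLD = 0.8


def same_person(a, b, threshold=MATCH_THRESHOLD):
    """Treat two embeddings as the same face when they point in a similar direction."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional embeddings (real ones have hundreds of dimensions).
frontal = [0.9, 0.1, 0.4]      # person A, facing the camera
angled = [0.85, 0.15, 0.45]    # person A, different angle
stranger = [0.1, 0.9, 0.2]     # person B

print(same_person(frontal, angled))
print(same_person(frontal, stranger))
```

The match is a similarity score against a threshold rather than an equality test, which is why a small change in pose can still count as the same person and why some error rate is unavoidable.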


    • yes, I've worked on face recognition databases with 150m and 40m faces for banking and safety.

      The models are not perfect. Humans should still be in the loop to verify, especially when the consequences of being wrong really suck for the user: losing access to their bank account, getting fired from their job.

      If you're referring to algorithms like phash (which assume the same core image with, say, a filter added), they won't work well, because everyone's ID card mostly looks the same. There will be too many false positives.
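To illustrate the false-positive problem, here is a toy sketch using a block-mean hash, a simpler relative of phash and not phash itself. The 8x8 "cards", the `make_card` helper, and the pixel values are invented for the example: two cards share a template and differ only in a small photo region.

```python
def block_mean_hash(pixels, grid=4):
    """One bit per grid cell: 1 if the cell's mean brightness exceeds the image mean."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // grid, width // grid
    overall = sum(sum(row) for row in pixels) / (height * width)
    bits = []
    for by in range(grid):
        for bx in range(grid):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            bits.append(1 if sum(block) / len(block) > overall else 0)
    return bits


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def make_card(photo_value):
    """Hypothetical 8x8 grayscale ID card: fixed template plus a 2x2 photo region."""
    card = [[200] * 8 for _ in range(4)] + [[50] * 8 for _ in range(4)]
    for y in (2, 3):
        for x in (0, 1):
            card[y][x] = photo_value
    return card


card_a = make_card(90)    # person A's photo
card_b = make_card(180)   # person B's photo, same card template
unrelated = [[200] * 4 + [50] * 4 for _ in range(8)]  # a completely different layout

print(hamming(block_mean_hash(card_a), block_mean_hash(card_b)))
print(hamming(block_mean_hash(card_a), block_mean_hash(unrelated)))
```

The two different people's cards differ in only one of 16 bits, so any distance threshold loose enough to tolerate a re-scan or filter would also flag them as the same image. The shared template dominates the hash.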
