Comment by WA
5 years ago
Shows the bias in machine learning. Leave one simple parameter out and the whole model is bullshit.
One parameter would be: amount of money this customer has spent on our products.
Another would be: Active time since signup.
I'm pretty sure "money spent > 0" is actually a legitimate threshold to remove a lot of spam, although not all of it. "money spent > 200" might do the trick, though.
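As a rough sketch of the kind of check being proposed (the `total_spend` field and the 200 threshold are just the guesses from this comment, not validated numbers):

```python
def likely_spammer(account, spend_threshold=200):
    """Heuristic from the comment above: accounts that have paid real
    money are far less likely to be throwaway spam accounts.
    `account` is assumed to be a dict with a hypothetical 'total_spend'
    field holding lifetime USD spent."""
    return account.get("total_spend", 0) < spend_threshold

# Any paying customer clears the bar; accounts with no payment
# history get flagged for whatever scrutiny comes next:
likely_spammer({"total_spend": 250})  # False: treated as a real customer
likely_spammer({"total_spend": 0})    # True: no payment history, flag it
```

This would only be one signal among many, of course, not a standalone classifier.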
Forget ML, this is just business process mapping. If it's a paying customer's account, issues should be sent to a human. Paying customers should have access to a secondary channel (read: an alternate phone number). Paying customers' Google contact(s) should be notified and included in the process.
As a general rule of thumb, if Google is struggling with a problem, it's not a tech problem.
This can be gamed. There are so many stolen credit card numbers and/or payments using Apple/Google pre-paid cards out there, so it's not difficult to automatically build accounts with this kind of 'reputation'.
Unfortunately the best way to do KYC is (still) human intervention (and use of data).
It is significantly harder to game, though - companies successfully offer behavioral monitoring for DLP products with far less data than the payment data Google has access to. Years of payments with a certain payment type? That's a pattern. Renting movies at a certain time of the week? That's another. The truth of the matter is, somebody has to actually care to do this. From the accounts of Googlers I've read, that's not what the culture of Google is likely to produce, though.
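A minimal sketch of one such behavioral signal - the function name and the regularity measure are made up for illustration, not anything Google is known to compute:

```python
from datetime import date

def payment_regularity(payment_dates):
    """Fraction of the months between the first and last payment that
    contain at least one payment. A long, steady payment history scores
    near 1.0; a short burst of fresh activity covers a tiny span and
    carries little evidence either way. Purely illustrative."""
    if not payment_dates:
        return 0.0
    months = {(d.year, d.month) for d in payment_dates}
    first, last = min(payment_dates), max(payment_dates)
    span = (last.year - first.year) * 12 + (last.month - first.month) + 1
    return len(months) / span

# Three years of monthly payments looks like an established customer:
steady = [date(2018 + y, m, 5) for y in range(3) for m in range(1, 13)]
payment_regularity(steady)  # 1.0
```

The point is not this particular formula, but that years-long patterns like it are expensive for an attacker to fake at scale.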
It can be gamed. But if the average value of a fake account is $100 and you set the threshold to be $200 it is no longer profitable.
Of course, this still isn't a perfect metric. But banning people whose accounts have spent thousands of dollars and been active for many years should probably be avoided, and this would significantly help with that.
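The profitability argument above can be made concrete with a back-of-the-envelope check (the $100 fake-account value and $200 threshold are the figures from this thread, not measured numbers):

```python
def spam_profitable(value_per_fake_account, required_spend):
    """A fake account only pays off if what the spammer extracts from
    it exceeds what they must spend to clear the threshold. This
    ignores the stolen-card loophole raised earlier in the thread."""
    return value_per_fake_account > required_spend

spam_profitable(100, 0)    # True: free accounts are worth creating in bulk
spam_profitable(100, 200)  # False: the $200 bar makes each fake a net loss
```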
I mean, if the account has spent >$50, you can probably afford a human review at the very least.