Comment by MontagFTB
3 years ago
So the bug where the first voting sheet shown to a user was from the same 10% of the photos turned out to be a feature, serving as a CAPTCHA of sorts to weed out the bad actors from the good.
If memory serves, some CAPTCHA techniques include showing two numbers to transcribe, where one’s value is already known. If that number is transcribed incorrectly, then the other number’s result isn’t used, and the CAPTCHA fails. Perhaps a similar technique may have also helped here?
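The known-answer technique described above can be sketched in a few lines. Everything here is illustrative: the `KNOWN_ANSWERS` table, the sheet id, and the function name are made up, not from the article.

```python
# Hypothetical sketch of known-answer validation: pair each unknown value
# with a control whose answer we already have. If the control is wrong,
# we discard the paired unknown value too.
KNOWN_ANSWERS = {"sheet-17": "4021"}  # control id -> expected value (made up)

def accept_submission(control_id, control_value, unknown_value):
    """Return the unknown value only if the control was transcribed correctly."""
    if KNOWN_ANSWERS.get(control_id) != control_value:
        return None  # control failed: reject the unknown value as well
    return unknown_value
```

The point is that the submitter can't tell which of the two values is the control, so they can't game only the one being checked.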
This approach was part of their strategy:
>Then we started showing some results we knew to the bots - if they entered wrong numbers, we would stop accepting the results.
It seems to me that when combating bots or hackers, the wrong approach is to provide immediate negative feedback. Giving an immediate error code lets them know that their current strategy is not working and to try something different.
It seems like a better approach would be to make them think you were accepting the results, when in fact they were going to the bit bucket. Hackers trying to get into your corporate database should be presented with a table full of false (but plausible) data rather than an error. Let them waste time trying to use all those fake SS numbers or account numbers before they figure out they got duped.
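A minimal sketch of that "accept it, then bit-bucket it" idea, assuming you can flag bad actors server-side. The flagging mechanism and names are hypothetical, not from the thread.

```python
# Hypothetical "shadow accept": flagged clients get the same success
# response as everyone else, but their data lands in a quarantine list
# instead of the real store, so they get no signal to learn from.
real_store, quarantine = [], []

def handle_submission(client_is_flagged, payload):
    (quarantine if client_is_flagged else real_store).append(payload)
    return {"status": "ok"}  # identical response either way
```

The design choice is that the response is byte-for-byte identical for good and bad clients; any observable difference becomes an oracle the attacker can probe.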
Scary as it can be, yes. It's a bit like strategy games: sometimes it's better to let the enemy push you around for a while, as long as nothing important is damaged. I don't really care if I have to scale up the load balancers a bit to handle the extra requests for some time. Letting the attack run allows your attacker to commit more of their resources, so once you do react you can block and ban more of them, and in the meantime you can learn more about their behavior, which lets you mislead them, slowloris them, and generally mess with them more effectively.

There have also been funny DEF CON talks about messing with attackers this way: returning all kinds of mangled status codes, slowloris'ing the bot, and so on. I'm kind of wondering if you could SSRF (or rather, CSRF) a bot like this by returning a redirect to e.g. the AWS metadata API... could be a fun topic to mess with.
For sure, shadow-banning is a great strat here. Raise their costs, and don’t give them any signal to learn from.
Assuming you have the bandwidth to absorb the bot load, which sounded like it was an issue here.
It's also evidence of a crime. I wonder how that interacts: if you just drop those entries from the database (or in the app, before they ever reach the main db), wouldn't that amount to destroying evidence of a crime?
It seems one should record all entries, but only update a canonical db if all entries fail to trip automated tampering detections.
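That record-everything-but-promote-selectively idea can be sketched like this. The `suspect` flag stands in for whatever automated tamper detection you actually run; all names here are illustrative.

```python
# Hypothetical sketch: keep an append-only raw log of every submission
# (preserving potential evidence), and promote a batch to the canonical
# store only when no entry in it trips a tamper check.
raw_log = []    # everything, kept forever
canonical = []  # only vetted batches

def tampered(entry):
    return entry.get("suspect", False)  # stand-in for real detection logic

def ingest_batch(batch):
    raw_log.extend(batch)  # always record, even bad batches
    if not any(tampered(e) for e in batch):
        canonical.extend(batch)  # promote only fully clean batches
```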
I think the bug was that your first sheet came from a small set, and the people entering bad data would refresh instead of moving on to the genuinely random next sheet. As a result, entries for most of the sheets came only from people with long sessions, who were apparently more likely to enter good data.
Can you explain that again differently? I didn’t understand that captcha point. It feels important though.
I think they're referring to the old reCAPTCHA v1 approach.
From https://en.wikipedia.org/wiki/ReCAPTCHA:
> The original iteration of the service was a mass collaboration platform designed for the digitization of books, particularly those that were too illegible to be scanned by computers. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to crowdsource the reading of an uncertain word.
The original CAPTCHA was built around transcribing text that OCR tools failed at.
So I give you two words to transcribe to prove you are human: I know one of them, and I want to know the other.