Comment by RjQoLCOSwiIKfpm

3 years ago

What boggles me about this is:

I do NOT consent to working for free for Google to train their AI.

I'd be willing to solve any CAPTCHA the product of which would be open source, or even useless.

But Google is a for-profit company which uses the solutions to create proprietary software and profit from it. They won't pay me, and I have no way to opt out of working for them because the most useful places on the Internet use their CAPTCHAs.

(Yes, I can intentionally put wrong solutions into their CAPTCHAs to poison their data, but I'm afraid they get so many valid solutions that they can just calculate the wrong ones out.)
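To make "calculate the wrong ones out" concrete, here is a minimal sketch of consensus filtering over crowd answers. It is only an illustration of the general idea, not a description of how Google actually processes CAPTCHA answers; the function name `consensus_labels`, the `min_agreement` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def consensus_labels(votes_by_image, min_agreement=0.7):
    """Keep the majority answer per image; drop images without clear agreement.

    votes_by_image: dict mapping an image id to the list of answers submitted
    for it. min_agreement is an assumed threshold, not a known real value.
    """
    consensus = {}
    for image_id, votes in votes_by_image.items():
        if not votes:
            continue  # nothing submitted for this image
        answer, count = Counter(votes).most_common(1)[0]
        # A handful of deliberately wrong answers barely moves this ratio.
        if count / len(votes) >= min_agreement:
            consensus[image_id] = answer
    return consensus


votes = {
    "img-1": ["hydrant"] * 9 + ["bus"],           # one poisoned answer
    "img-2": ["bus", "hydrant", "bus", "hydrant"],  # no clear consensus
}
print(consensus_labels(votes))
```

With enough honest solvers per image, a lone wrong answer is simply outvoted, which is why I doubt poisoning does much.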

I'm pretty sure Google's AI has already reached the information-theoretic limits for recognizing fire hydrants etc., so you're not really training it anymore.

What bothers me about recaptcha (other than the obvious first order task) is that I believe it's used to penalize people who don't let google track them, and by extension to make other browsers look worse. It's an abuse of their market power.

  • I am not sure that's the "intent" but it sure is a suspiciously advantageous (for them) side effect.

    Like how I gave up using protonmail because my emails kept getting classified as spam by anyone using gmail or gmail-backed organizational email.

    • I'm at the opposite end of the spectrum: I think the intent to collect data is at least above 50%, meaning that gathering information on individuals visiting third-party platforms has taken precedence over training their model.

      Also, I think we shouldn't underestimate the monetization value of being able to target "HN users" for advertising. From the moment we are flagged, Google can exploit this data point for targeted advertisement on any other website/app.

      This information should be given at the highest cost possible:)

There is basically no chance the captchas are actually being used for generating training data at this point. The puzzles have not changed for ages. Like, five years? How many billions (trillions?) of labels do you think they have for buses and traffic lights at this point?

If there was an economic value to using captcha solutions for labeling, somebody would be rotating novel tasks into the mix. But they don't seem to be.

(And if the goal of running the service was to generate labels, they would not have built solutions to make it possible to pass the captchas without a puzzle, like recaptcha v3.)

So rest assured, your work in solving the captchas is totally useless, just like you wanted!

  • Lately I've seen captchas that ask me to identify things in images that are clearly generated with AI. Like frogs without backs.

    I think at this point it is clear we are not training image recognition so much as providing them with free scoring for their image generation algorithms.

    • Just to be clear, are you saying that you saw the "frogs without backs" puzzle on Recaptcha? Because I definitely have not seen anything but the streetview images there for ages.

      Now, if it is a captcha provider whose advertised business model is to sell access to the users for labeling and split the profits with the website that integrates their captcha (e.g. hCaptcha), then I can believe somebody would submit an image-generation eval dataset. But it seems irrelevant to the discussion of whether solving a Recaptcha is free work.


  • I would rather guess that the emperor has no clothes, i.e. AI is still so bad that it needs insane amounts of training data and hasn't got enough yet.

    • That's a fun guess? Do you flip a coin every time you make a decision or...

      I can assure you that if CLIP and ALIGN exist, there is objectively no reason for them to collect what would amount to a dataset for...solving Google CAPTCHAs? Which I'm pretty sure is a solved problem even without the data.

  • It is just continuous QA for Waymo. Measures how well existing ML is working in the real world.

    • That's a great example of something that would require them to mix in novel tasks, rather than recycling the stale traffic light detection puzzles. Because let's be honest, detecting traffic lights is nowhere close to the hard part of self-driving cars. Knowing how well you can do it tells you nothing about how well you can solve the actual difficult things.

      What would a task that uses humans to solve that problem actually look like? I'm guessing it would need to be short videos, not images. And look for things with some ambiguity. "Select any videos where a pedestrian looks like they intend to cross the street".


> I do NOT consent to working for free for Google to train their AI.

It's not for free.

In this case, you get access to HN when it is under attack.

If you don’t consent to those terms, that's your choice, you can wait and come back later.

  • Or we can complain, suggest alternatives, and hope that it motivates a change. Hacker News is, after all, a place for conversation—people are entitled to express an opinion.

    • Sure, my point is not “don't complain”, but “the current HN usage of CAPTCHA neither compels you to use it without consent, nor proposes that you train Google’s AI for free”.

      That is, it is about the specific content of the complaint, not a meta-level commentary on the appropriateness of complaining about practices you disapprove of.

That's an interesting take. Everything costs money. You know the reason the CAPTCHA service is free is that they get value from the results of the CAPTCHA, right? You're not viewing ads or paying for this service. I'd prefer not to help Google either, but nothing is truly free.

  • Do you know how ReCAPTCHA started? Digitizing old analog books. Probably just as commercial, but it feels better than training an ML algorithm for an international conglomerate.

If your goal is to avoid Google using your data, putting in bad data that is filtered out accomplishes that, right?

I don't personally have an opinion on HN using the captcha, but their reasoning is pretty obvious, and almost certainly comes from a good place (reducing any spam on the site). That said, you're welcome to your opinions, it just seems like you have an option, based on your stated goal.

  • Even if it does accomplish that, they will still have coaxed me into doing work for them even though I'm not consenting to working for Google.

    Consider it like this:

    If someone forced you to do physical work against your will, you wouldn't like it any more just because they throw away the product of your labor in the end.

    It would just make it more obscene.

If you don't consent to it then don't fill it out. Plus CAPTCHAs haven't been used for ML training for many years now.

This just shows how little consent-based ethics matter (they break down immediately when the other party simply defects).