
Comment by halJordan

1 day ago

I love how HN is loving this idea when it's the exact same thing Anthropic and OpenAI (and every other LLM maker) did.

It's God's gift to them when it lets them bypass ads and download copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his LLM and download copyrighted material.

Both scale and purpose make them completely different things. You're acting as if they're the same when they're not.

I won't comment on the downloading, but ads are trackers and spyware to me. I don't spy on website owners; I have the right to block those trackers.

Zuck serves ads/spyware to other users; he deserves to taste his own medicine, not me.

I think there's a little bit of the Goomba fallacy at play here to be fair

Yes, it's God's gift when the average user can do it, and Satan's curse when a hated fucking mega-corp does it.

Where's the contradiction?

You can see this pattern in many different topics: updoots are highly correlated with a positive answer to "do I personally get to profit?"

  • Yes, and? People need to eat. Billionaires are generally not interested in whether or not the average Joe gets to eat.

I would love to pay for content. I'm _paying_ for YouTube Premium.

But heck, do I hate the YouTube interface; it has degraded far past usability.

So you’re that Hal Jordan then? Why would a Green Lantern feel the need to defend either? I feel like the Guardians would not accept your arguments as soon as you got to Oa, poozer. I guess what I am saying is don’t have a famous name. Seems obvious.

  • OP appears to be talking about real life. What are you on about?

    • the user name he is responding to is HalJordan, Hal Jordan is the name of a comic book superhero: Green Lantern, a moral paragon.

      on edit: he is evidently being "sarcastic"

You conflate web crawling for inference with web crawling for training.

Web crawling for training is when you ingest content on a mass scale, usually indiscriminately, usually with a dumb crawler for scale's sake, for the purposes of training an LLM. You don't really care whether one particular website is in the dataset (unless it's the size of Reddit), you just want a large, diverse, high-quality data mix.

Web crawling for inference is when a user asks a targeted question, you do a web search, and fetch exactly those resources that are likely to be relevant to that search. Nothing ends up in the training data; it's just context enrichment.
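The inference-time flow described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual pipeline: `search`, `fetch_page`, and `build_context` are hypothetical stand-ins for a real search API and HTTP client, backed here by in-memory dicts so the example is self-contained.

```python
# Hypothetical sketch of "crawling for inference": a targeted search,
# a targeted fetch, and transient context enrichment. Nothing below is
# stored or added to any training set.

# Stand-in for a search API's index (assumption, not a real service).
SEARCH_INDEX = {
    "python gil removal": ["https://example.org/pep-703"],
}

# Stand-in for the web (assumption: one page, fetched on demand).
PAGES = {
    "https://example.org/pep-703": "PEP 703 proposes making the GIL optional in CPython.",
}

def search(query: str) -> list[str]:
    """Return URLs likely relevant to the query (stand-in for a search API)."""
    return SEARCH_INDEX.get(query.lower(), [])

def fetch_page(url: str) -> str:
    """Fetch exactly one targeted resource (stand-in for an HTTP GET)."""
    return PAGES[url]

def build_context(question: str) -> str:
    """Fetch only the resources relevant to this one question and join
    them into a context block for the model. The result is discarded
    after the answer is produced -- that is the whole distinction from
    a mass training crawl, which ingests indiscriminately and keeps it."""
    urls = search(question)
    snippets = [fetch_page(u) for u in urls]
    return "\n\n".join(snippets)

print(build_context("Python GIL removal"))
```

The contrast with a training crawl is in what's missing: there is no frontier queue of every discoverable URL and no persistent dataset, only the handful of pages one question made relevant.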

People have a much larger issue with crawling for training than with crawling for inference (though I personally think both are equally fine).