Comment by rpdillon
8 hours ago
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
I get the feeling, but that's not what this is.
NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"
That's a question they fundamentally cannot answer without these chat logs.
That's what discovery, especially in a copyright case, is about.
Think about it this way: say this were a bookstore selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced; otherwise you can't really trust that it's the real log.
That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
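To make that concrete, here is a minimal sketch of the kind of search such a review could involve (hypothetical Python; the function names, the n=12 threshold, and the process itself are my assumptions, not anything from the case):

    # Hypothetical sketch: flag chat responses that reproduce long
    # verbatim word runs from a known article. Not anyone's actual process.
    def ngrams(text, n=12):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def flag_overlaps(article, responses, n=12):
        # Keep only responses sharing at least one n-word run with the
        # article; these become candidates for human review.
        article_grams = ngrams(article, n)
        return [r for r in responses if ngrams(r, n) & article_grams]

Anything a crude match like this flags would still need human review, but it illustrates why plaintiffs ask for complete logs rather than summaries: the search only means something if it runs over everything.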
And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.
The user has no right to privacy. The same as how any internet service can be (and has been) compelled to produce private messages.
>That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses.
The trouble with this logic is that NYT already made that argument and lost, as applied to an original discovery scope of 1.4 billion records. The question now is about a lower scope, about the means of review, and about proposed processes for anonymization.
They have a right to some form of discovery, but not to a blank-check extrapolation that sidesteps the legitimate privacy issues raised both in OpenAI's statement and throughout this thread.
> The user has no right to privacy
The correct term for this is prima facie right.
You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.
Similarly, liberty is a prima facie right; you can be arrested for committing a crime.
> enforcing the rights of others under copyright law
I certainly do not care about copyright more than my own privacy, and I certainly don't find that interest to be the public's interest, though perhaps it's the interest of legacy corporations and their lobbyists.
> You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.
What governs or codifies that? I would have expected that there would need to be some kind of specific overriding concern(s) that would need to apply in order to violate my (even limited) expectation of privacy, not just enforcing copyright law in general.
E.g. there's nothing resembling "probable cause" to search my own interactions with ChatGPT for such violations. On what basis can that be justified?
Is there any evaluation of which right or which harm is larger? It seems like the idea that one outweighs another is arbitrary. Is there a principled thing behind it?
Even if OpenAI is reproducing pieces of NYT articles, they still have a difficult argument because in no way is it a practical means of accessing paywalled NYT content, especially compared to alternatives. The entire value proposition of the NYT is news coverage, and probably 99.9% of their page views are from stories posted so recently that they aren't even in the training set of LLMs yet. If I want to reproduce a NYT story from an LLM it's a prompt engineering mess, and I can only get old ones. On the other hand I can read any NYT story from today by archiving it: https://archive.is/5iVIE. So why is the NYT suing OpenAI and not the Internet Archive?
> The user has no right to privacy. The same as how any internet service can be (and has been) compelled to produce private messages.
The legal term is "expectation of privacy", and it does exist, albeit increasingly weakly in the US. There are exceptions to that, such as a subpoena, but that doesn't mean anyone can subpoena anything for any reason. There has to be a legal justification.
It's not clear to me that such a justification exists in this case.
You don't hate the media nearly enough.
"Credible" my ass. They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles. OpenAI has taken measures to limit such methods and prevent arbitrary wholesale reproduction of copyrighted content since that time. That would have been the end of the situation if NYT was engaging in good faith.
The NYT is after what they consider "their" piece of the pie. They want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior. They haven't been injured, they were already dying, and this lawsuit is a hail mary attempt at grifting some life support.
Behavior like that of the NYT is why we can't have nice things. They're not entitled to exist, and behavior like this makes me want them to stop existing, the faster the better.
Copyright law is what you get when a bunch of lawyers figure out how to encode monetization of IP rights into the legal system, having paid legislators off over decades, such that the people who make the most money off of copyrights are effectively hoarding those copyrights and never actually produce anything or add value to the system. They rent-seek, gatekeep, and viciously drive off any attempts at reform or competition. Institutions that once produced valuable content instead coast on the efforts of their predecessors, and invest the proceeds into lawsuits, lobbying, and the purchase of more IP.
They - the NYT - are exploiting a finely tuned and deliberately crafted set of laws meant to screw actual producers out of percentages. I'm not a huge OpenAI fan, but IP laws are a whole different level of corrupt stupidity at the societal scale. It's gotcha games all the way down, and we should absolutely and ruthlessly burn down that system of rules and salt the ground over it. There are trivially better systems that can be explained in a single paragraph, instead of requiring books worth of legal code and complexities.
I'm not a fan of NYT either, but this feels like you're stretching for your conclusion:
> They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles....would have been the end of the situation if NYT was engaging in good faith.
I mean, if I was performing a bunch of investigative work and my publication was considered the source of truth in a great deal of journalistic effort and publication of information, and somebody just stole my newspaper off the back of a delivery truck every day and started rewriting my articles, and then suddenly nobody read my paper anymore because they could just ask chatgpt for free, that's a loss for everyone, right?
Even if I disagree with how they editorialize, the Times still does a hell of a lot of journalism, and chatgpt can never, and will never be able to actually do journalism.
> they want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior
I'd love to hear exactly what you mean by this.
Between what and what are they trying to insert themselves as middlemen, and why is chatgpt the victim in their attempts to do it?
What does 'rent seeking' mean in this context?
What does 'second hander' mean?
I'm guessing that 'sleazy lawyer' is added as an intensifier, but I'm curious if it means something more specific than that as well, I suppose.
> Copyright law....the rest of it
Yeah. IP rights and laws are fucked basically everywhere. I'm not smart enough to think of ways to fix it, though. If you've got some viable ideas, let's go fix it. Until then, the Times kinda needs to work with what we've got. Otherwise, OpenAI is going to keep taking their lunch money, along with every other journalist's on the internet, until there's no lunch money to be had from anyone.
> NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"
Credible to whom? In their supposed "investigation", they sent a whole page of text and complex pre-prompting and still failed to get the exact content back word for word. Something users would never do anyways.
And that's probably the best they've got as they didn't publish other attempts.
Agreed, they could carefully coerce the model to more or less output some of their articles, but the premise that users were routinely doing this to bypass the paywall is silly.
> The user has no right to privacy. The same as how any internet service can be (and has been) compelled to produce private messages.
This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.
You've successfully fought a subpoena on the basis of a third party's privacy? More than once? I'd love to hear more.
> In copyright cases, typically you need to show some kind of harm.
NYT is suing for statutory copyright infringement. That means they only need to demonstrate the infringement itself, since the infringement alone is considered harm; actual harm only matters if you're suing for actual damages.
This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement, and if so, whether it's fair use. The actual ways the AI is being used are thus very relevant for the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging in this lawsuit with unclean hands in the first place (see some of their earlier discovery dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to us because we're AI and big tech" swagger.
NYT doesn't care about regurgitation. When it was doable, it was spotty enough that no one would rely on it. But now the "trick" doesn't even work anymore (you would paste the start of an article and chatgpt would continue it).
What they want is to kill training and, moreover, to prevent the loss of being the middle-man between events and users.
> prevent the loss of being the middle-man between events and users
I'm confused by this phrase. I may be misreading but it sounds like you're frustrated, or at least cynical about NYT wanting to preserve their business model of writing about things that happen and selling the publication. To me it seems reasonable they'd want to keep doing that, and to protect their content from being stolen.
They certainly aren't the sole publication of written content about current events, so calling them "the middle-man between events and users" feels a bit strange.
If your concern is that they're trying to prevent OpenAI from getting a foot in the door of journalism, that confuses me even more. There are so, so many sources of news: other news agencies, independent journalists, randos spreading word-of-mouth information.
It is impossible for chatgpt to take over any aspect of being a "middle-man between events and users" because it can't tell you the news. It can only resynthesize journalism that it's stolen from somewhere else, and without stealing from others, it would be worse than the least reliable of the above sources. How could it ever be anything else?
This right here feels like probably a good understanding of why NYT wants openai to keep their gross little paws off their content. If I stole a newspaper off the back of a truck, and then turned around and charged $200 a month for the service of plagiarizing it to my customers, I would not be surprised if the Times's finest lawyers knocked on my door either.
Then again, I may be misinterpreting what you said. I tend to side with people who sue LLM companies for gobbling up all their work and regurgitating it, and spend zero effort trying to avoid that bias
> What they want is to kill training, and more over, prevent the loss of being the middle-man between events and users.
So... they want to continue reporting news, and they don't want their news reports to be presented to users in a place where those users are paying someone else and not them. How horrible of them?
If NYT is not reporting news, then NYT news reports will not be available for AIs to ingest. They can perhaps still get some of that data from elsewhere, perhaps from places that don't worry about the accuracy of the news (or intentionally produces inaccurate news). You have to get signal from somewhere, just the noise isn't enough, and killing off the existing sources of signal (the few remaining ones) is going to make that a lot harder.
The question is, does journalism have a place in a world with AIs, and should OpenAI be the one deciding the answer to that question?
It sounds like the defendant would much prefer middle-men who do not have the resources to enforce copyright.
> prevent the loss of being the middle-man between events and users.
OpenAI is free to do its own reporting. The NY Times is nowhere near trying to prevent others from competing as middlemen.
It’s more than middle-man, right? Like if visits to NYT drop then they get less ad revenue and their ability to do business goes away. On the other hand, if they demand licensing fees then they’ll just be marginalized by other news sources anyway.
> This case is unusual because the New York Times can't point to any harm
It helps to read the complaint. If that was the case, the case would have been subject to a Rule 12(b)(6) (failure to state a claim for which relief can be granted) challenge and closed.
Complaint: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
See pages 60ff.
My observation is that that section does not articulate any harm. It _claims_ harm, but doesn't actually explain what the harm is. Reduced profits? Lower readership? All they say is "OpenAI violated our copyrights, and we deserve money."
> 167. As a direct and proximate result of Defendants’ infringing conduct alleged herein, The Times has sustained and will continue to sustain substantial, immediate, and irreparable injury for which there is no adequate remedy at law. Unless Defendants’ infringing conduct is enjoined by this Court, Defendants have demonstrated an intent to continue to infringe the copyrighted works. The Times therefore is entitled to permanent injunctive relief restraining and enjoining Defendants’ ongoing infringing conduct.
> 168. The Times is further entitled to recover statutory damages, actual damages, restitution of profits, attorneys’ fees, and other remedies provided by law.
They're simply claiming harm, nothing more. I want to see injuries, scars, and blood if there's harm. As far as I can tell, the NYT was on the ropes long before AI came along. If they could actually articulate any harm, they wouldn't need to read through everyone's chats.
> sustain substantial, immediate, and irreparable
Furthermore, any alleged injury is absolutely reparable. How many times did OpenAI replicate their content and how many page views did they lose to it? Very reparable monetary damages, if it did in fact occur (and I'm pretty sure it didn't).
The original lawsuit has lots of examples of ChatGPT (3.5? 4?) regurgitating article...snippets. They could get a few paragraphs with ~80-90% perfect replication. But certainly not full articles, with full accuracy.
This wasn't solid enough for a summary judgment, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like NYT wants to comb all user chats rather than pay a team of people tens of thousands of dollars a day to try and coax articles out of ChatGPT-5.
It's a part of privacy policy boilerplate that if a company is compelled by the courts to give up its logs it'll do it. I'm sure all of OpenAI's users read that policy before they started spilling their guts to a bot, right? Or at least had an LLM summarize it for them?
This is it, isn't it? For any technology, I don't think anyone should have an expectation of privacy from lawyers if the company holding your data is brought to court.
> having with OpenAI in private
I don't believe that OpenAI, or any American corporation, has the wherewithal to actually maintain _your_ privacy in the face of _their_ profitability.
> typically you need to show some kind of harm.
You copied my material without my permission. I've been harmed. That right is independent of pricing. Otherwise Napster would never have generated legal cases.
> It's quite literally a fishing expedition.
It's why American courts are awesome.
Yeah, everyone else in the comments so far is acting emotionally, but --
As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand and there should be another pathway for these two to move forward in this case (NYT getting some semblance of understanding, OAI protecting end-user privacy).
It sounds like the alternate path you're suggesting is for NYT to stop being wrong and let OpenAI continue being right, which doesn't sound much like a compromise to me.
To show harm they need proof; that's the point of the lawsuit. They have sufficient evidence that OpenAI was scraping the web and the NY Times.
When Altman says "They claim they might find examples of you using ChatGPT to try to get around their paywall." he is blatantly misrepresenting the case.
https://smithhopen.com/2025/07/17/nyt-v-openai-microsoft-ai-...
"The lawsuit focuses on using copyrighted material for AI training. The NYT says OpenAI and Microsoft copied vast amounts of its content. They did this to build generative AI tools. These tools can output near-exact copies of NYT articles. Therefore, the NYT argues this breaks copyright laws. It also hurts journalism by skipping paywalls and cutting traffic to original sites. The complaint shows examples where ChatGPT mimics NYT stories closely. This could lead to money loss and harm from AI errors, called hallucinations."
This has nothing to do with the users, it has everything to do with OpenAI profiting off of pirated copyrighted material.
Also, Altman is getting scared because the NY Times proved to the judge that ChatGPT copied many articles:
"2025 brings big steps in the case. On March 26, 2025, Judge Sidney Stein rejected most of OpenAI’s dismissal motion. This lets the NYT’s main copyright claims go ahead. The judge pointed to “many” examples of ChatGPT copying NYT articles. He found them enough to continue. This ruling dropped some side claims, like unfair competition. But it kept direct and contributory infringement, plus DMCA breaches."
> The lawsuit focuses on using copyrighted material for AI training
Well that's going to go pretty poorly for them considering it has already been ruled fair use twice: https://www.whitecase.com/insight-alert/two-california-distr...
On the other hand, distributing copies of NYT content is actually a breach of copyright, but only if the NYT can prove it was actually happening.
Training has sometimes been held to be fair use under certain circumstances, but in determining fair use, one of the four factors that is considered is how it affects the market for the work being infringed. I would expect that determining to what degree it's regurgitating the New York Times' content is part of that analysis.
>But conversations people thought they were having with OpenAI in private
...had never been private in the first place.
not only is the data used for refining the models, OpenAI had also shariah policed plenty of people for generating erotica.
This is about private chats, which are not used for training and only stored for 30 days.
Also, you need to understand that for huge corps like OpenAI, lying in your ToS will do orders of magnitude more damage to your brand than what you would gain through training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.
> Also, you need to understand, that for huge corps like OpenAI, the lying on your ToS will do orders of magnitude more damage to your brand than what you would gain
Is this true? I can’t recall anything like this (look at Ashley Madison which is alive and well)
Yeah, I don't get why more people don't understand this: why would you think your conversation was private when it wasn't actually private? Have you not been paying attention?
> OpenAI had also shariah policed plenty of people for generating erotica.
That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from flock, etc.
100% agreed. In the time you wrote this, I also posted: https://news.ycombinator.com/item?id=45901054
I was quite disappointed by the comments I saw on the thread at that time.