Comment by lukeigel
4 days ago
Thanks! And it's a lot of info, yeah. ~90% of new data in yesterday's drop was photographs, which they redacted for us.
The House Oversight Committee's giant drop in November had tons of data, like flight logs, that we still hadn't taken advantage of even after doing the original Jmail.
For the Yahoo release, which is still ongoing, the folks at Drop Site News (see https://www.jmail.world/about) are handling the manual redaction, which has been very time-consuming, even with tons of AI to help in the background.
It would be nice to explain at some point how we structured the unstructured data.
For now we’re focusing on fixing bugs: we’re already seeing an insane wave of traffic, so most of us are focused on keeping the site alive.
Hey, I’d be interested in your thoughts on this, or the key ideas/research results you relied on:
Yes! We used our friends at Reducto (https://reducto.ai/) for all document extraction and parsing (one of the best companies I've ever referred to YC ;) )
We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks, each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in-depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean.
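To make the block structure concrete, here is a minimal Python sketch of what such parsed output might look like and how a keyword search over photo descriptions could work. The names `Block`, `bbox`, and `search_blocks`, the bounding-box layout, and the sample data are all assumptions for illustration; this is not Reducto's actual response format or the site's code.

```python
from dataclasses import dataclass

@dataclass
class Block:
    # Hypothetical shape of one typed block; field names are assumptions.
    type: str                                  # e.g. "Title", "Text", "Figure"
    bbox: tuple[float, float, float, float]    # assumed (left, top, width, height)
    content: str                               # extracted text, or a description for a Figure
    confidence: float                          # assumed to range from 0.0 to 1.0

def search_blocks(blocks, term, min_confidence=0.5):
    """Return blocks whose content mentions `term`, ignoring case and low-confidence blocks."""
    needle = term.lower()
    return [
        b for b in blocks
        if b.confidence >= min_confidence and needle in b.content.lower()
    ]

# Made-up sample blocks, not taken from any real document.
blocks = [
    Block("Title", (0.1, 0.05, 0.8, 0.05), "Exhibit 12", 0.98),
    Block("Figure", (0.1, 0.15, 0.8, 0.6), "Photograph of a hallway with a wooden door", 0.91),
    Block("Text", (0.1, 0.8, 0.8, 0.1), "Illegible handwriting near a door", 0.30),
]
hits = search_blocks(blocks, "door")
```

Here `hits` contains only the Figure block: the low-confidence Text block is filtered out by the assumed `min_confidence` threshold.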
For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON.
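As a rough illustration of the schema idea, the two field lists above could be written in a JSON Schema style and used to sanity-check an extracted record. How the real endpoint expects schemas to be expressed is an assumption here, and `missing_or_mistyped` is a hypothetical helper, not part of any library.

```python
import json

# Schemas mirroring the field lists mentioned above (JSON Schema style, assumed).
EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        "from": {"type": "string"},
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "date": {"type": "string"},
        "body": {"type": "string"},
    },
}
FLIGHT_SCHEMA = {
    "type": "object",
    "properties": {
        "departure_airport": {"type": "string"},
        "arrival_airport": {"type": "string"},
        "passengers": {"type": "array", "items": {"type": "string"}},
        "date": {"type": "string"},
    },
}

# Only the two JSON types used in the schemas above are handled.
PY_TYPES = {"string": str, "array": list}

def missing_or_mistyped(record, schema):
    """List the schema fields that are absent or have the wrong type in `record`."""
    problems = []
    for name, spec in schema["properties"].items():
        if not isinstance(record.get(name), PY_TYPES[spec["type"]]):
            problems.append(name)
    return problems

# A made-up extraction result with placeholder values, not real data.
raw = '{"departure_airport": "AAA", "arrival_airport": "BBB", "passengers": ["P1", "P2"]}'
flight = json.loads(raw)
problems = missing_or_mistyped(flight, FLIGHT_SCHEMA)
```

A check like this flags records where the extractor could not find a field (here, `date`), which is where a human would want to look at the source page.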
The JFlights example served as the best ad for Reducto and for how doc parsing technology can save hours of journalistic investigation like this.
See for yourself. Given this document:
https://www.jmail.world/drive/HOUSE_OVERSIGHT_002031
it inferred and enriched multiple flight cards on JFlights (https://www.jmail.world/flights). I was really shook when I first saw this.
One interesting thread to pull is "Stuff released and then Yanked back" ...
Images removed from Epstein files less than a day after being posted - https://www.abc.net.au/news/2025-12-21/images-removed-from-e...
promises all the sleuthing excitement of chasing the significance of Donald in a Drawer.
Images were also planted to falsely suggest incriminating evidence.
While true, it would probably be useful to provide examples. The one that I am aware of seems to be a picture showing Clinton, Michael Jackson, and Diana Ross with "redacted" victims:
https://www.imdb.com/news/ni65628031/
https://bsky.app/profile/meidastouch.com/post/3mag7myutmc2d
However, it seems that this photo was actually taken at a 2003 Democratic fundraiser, and the redacted "victims" were Diana Ross' son Evan and Michael Jackson's kids, Paris and Prince Jackson. This may or may not be accurate either, since I have not been able to dig into the photo and determine whether it has any connection to a supposed 2003 fundraiser.
But it seems more likely to be true than not that this was sloppily planted evidence that was especially insultingly fake.
On edit: looking closer, these do not seem to be the exact same photo, but two different photos taken at the same time and place, so at the 2003 Dem fundraiser, just a different photo of it. So it could be that Epstein had it and the DOJ thought, hey, look at these pervs! Let's release!!
I see people are not clued in to this and incredulously downvote because the file release appears to them to be in good faith, such that illegal evidence tampering is out of the question.
See https://news.ycombinator.com/item?id=46341688
I'm being snarky, this isn't such a serious comment, and I don't really mean this for Gemini, but can you imagine using something like Gemini ("Hi, please comb through this") and it just refuses on ethical grounds?
We found that Codex indeed refuses, but Claude + Gemini are willing to RAG it.
Also, shoutout to Jason Liu (https://news.ycombinator.com/user?id=jxnlco) for discovering that one. His turbopuffer-based version of Jemini is coming soon!
Usually Claude is the prude. Personally I haven't even tried, for fear of what I'd find. I can stomach homicide and war pictures, but Epstein is too much.
But whoever’s doing the redacting sees the original, right? What prevents the redactor from saying, “here’s what the document really said”? Or “here’s who’s in the image, I saw it before I redacted it”?
The idea of spending the rest of their life in prison is what stops them.
Yeah but a few words from somebody like Ghislaine could completely fuck shit up for a lot of people.
Of course, she'll have hanged herself shortly afterward while the security cameras were malfunctioning.
Part of the law mandates that all redactions will be listed for Congress within 15 days.
That’s a good point. I would imagine they break it up into pieces - in a reCAPTCHA sorta way - and any given person sees a sentence or a piece of a sentence.
An alternative would be to strip out all obvious known words and only leave unknowns (i.e., names) and then have those fragments reviewed (in a reCAPTCHA sorta way).
Finally, for images, cover all faces and then decide one by one which should remain covered and which should not.
LOTS of work, but there are workflows that limit reviewers' ability to connect more than they should.
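The first two ideas above can be sketched in a few lines of Python. This is only an illustration of the suggestion, not how any actual redaction team works; `split_sentences`, `assign_fragments`, `unknown_tokens`, the reviewer IDs, and the sample text are all hypothetical.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def assign_fragments(text, reviewers):
    """Deal sentences out round-robin so no reviewer sees two adjacent ones."""
    if len(reviewers) < 2:
        raise ValueError("need at least two reviewers to separate adjacent fragments")
    queues = {r: [] for r in reviewers}
    for i, sentence in enumerate(split_sentences(text)):
        queues[reviewers[i % len(reviewers)]].append((i, sentence))
    return queues

def unknown_tokens(sentence, known_words):
    """Keep only tokens outside the known vocabulary (candidate names)."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [t for t in tokens if t.lower() not in known_words]

# Made-up text with placeholder names.
doc = "Alpha met Beta on Monday. They flew to the island. Gamma paid for the trip."
queues = assign_fragments(doc, ["r1", "r2"])

known = {"met", "on", "monday", "they", "flew", "to", "the", "island", "paid", "for", "trip"}
names = unknown_tokens("Alpha met Beta on Monday.", known)
```

With two reviewers, `r1` gets sentences 0 and 2 and `r2` gets sentence 1, so neither sees consecutive context. A real workflow would also need to cap how many fragments from one document any single reviewer receives.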
People who they think will do this don't get to be redactors. It's all about power and relationships, not technology.
Given how MTG went completely silent despite her high profile platform, I'm guessing the civil (or at this point, royal) servants don't want their families harmed.
I’d guess a first pass is done automatically? E.g., if a page mentions, say, Trump, just redact that whole page/paragraph/etc. So the people who do the closer reading to redact further probably don’t actually know the scale of what was already redacted. Just a guess though.
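Purely to illustrate that guess, a blanket first pass over paragraphs might look like the sketch below. Nothing here reflects a known process; `first_pass_redact`, the watchlist, and the sample text are hypothetical.

```python
import re

def first_pass_redact(text, watchlist, mask="[REDACTED]"):
    """Mask every paragraph that mentions a watchlist term; return (text, count)."""
    if not watchlist:
        return text, 0
    # Whole-word, case-insensitive match on any watchlist term.
    pattern = re.compile(
        "|".join(r"\b" + re.escape(term) + r"\b" for term in watchlist),
        re.IGNORECASE,
    )
    out, count = [], 0
    for paragraph in text.split("\n\n"):
        if pattern.search(paragraph):
            out.append(mask)
            count += 1
        else:
            out.append(paragraph)
    return "\n\n".join(out), count

# Made-up document with a placeholder name.
doc = "Meeting notes for Tuesday.\n\nNAME_A arrived at noon.\n\nLunch was served."
redacted, n = first_pass_redact(doc, ["NAME_A"])
```

A second-stage reviewer handed `redacted` would see only the mask, which is the point of the guess: they could not tell what, or how much, the first pass removed beyond the count of masked paragraphs.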