Comment by verditelabs

1 day ago

I am on the Vesuvius Challenge team that did the segmentation, unwrapping, and ink detection, so feel free to ask any questions.

144 comments

Izmaki 1 day ago

How awesome do you feel right now? This is HUUUGE! To think that a scroll was unreadable for so, so long, until we invented machines that let us read it slice by slice. It's such an unfathomable achievement - we made machines that let us read 2000+ year old fragile scrolls without ever opening them - and you helped do just that.

Hats off!

verditelabs 1 day ago
In March I went to Beam Line 18 at the European Synchrotron Radiation Facility. I had to swap out the scrolls on the xray pedestal. Scrolls that were presented as a diplomatic gift to Napoleon and Josephine by King Ferdinand. France has 2 of the 6 that they were given still intact. I had to handle both of them. I have never felt more stressed in my life and have never and will probably never again handle such a priceless artifact.
I feel the opposite of that feeling and am immensely proud of everything that the core challenge team has accomplished.
- _boffin_ 16 hours ago
  
  I am floored at these achievements. Such amazing work.
  If I may ask, when you started thinking about achieving this, what were the first attempts and ideas on how to go about it? What were some of the obstacles that had to be overcome to achieve this?
  

amluto 1 day ago

Do you know what kinds of features the model is picking up on to distinguish ink from papyrus? And did you have any labeled data (images where a human expert has identified ink or perhaps a scan of a burnt scroll with known content) to help train it?

Certainly my Mark 1 eyeballs would not obviously perform better than random guessing at this task. Although my eyeballs are, if nothing else, nerfed by only being able to see a 2D slice of the data.

verditelabs 1 day ago
Yes. Most of the ink we have come across is carbon based. This leaves a certain texture on the scrolls that is recoverable and viewable with fairly basic physically based rendering, though how much ink is recoverable varies greatly from one character to the next. I don't have links handy but we just published updates to our data viewer page on our website. Pherc.Paris.4 I believe has the best overlay of ink.
A lot of labeled data is available on our FTP server, which has public access.
- amluto 20 hours ago
  
  When you say "physically based rendering" do you mean that one could build a PBR model based on the (unrolled?) xray data, render that model, and be able to see the ink?
  edit: I found this:
  https://scrollprize.org/data_browser#/samples/PHercParis4/se...
  The JSON seems to suggest that I'm mostly looking at ink detection output, but I could easily be using the tool wrong.
  But I also found this awesome explanation:
  https://scrollprize.org/data_fragments
  I guess a bunch of the training was done by using fragments of scrolls where ground truth data is available using IR photography.
  Also... that xray resolution is absolutely amazing!
  
- londons_explore 21 hours ago
  
  I assume that's because the writer, shortly after re-inking the writing instrument, was sometimes putting down a 10x thicker layer...

Dzugaru 1 day ago

Outstanding work! I've participated in the challenge, but didn't get far. One of the questions I had at the time was - if I'm going to use ML to detect ink, could it invent hallucinated letters, or even parts of text, and how to prevent that?

verditelabs 1 day ago
Yes, it's quite possible for ML to hallucinate ink, though it is on a much more local scale, like predicting a slightly longer stroke, filling in more of a character than is actually in the data, etc. Perhaps enough to change a reading of a character or show ink where there isn't any. It is difficult for ink detection to hallucinate grammatical and idiomatic Greek and Latin.
- im3w1l 1 day ago
  
  What is the input to the ML algorithm? Does it know the surrounding context so that it has a chance to deduce "if this stroke is slightly longer then the end result will be idiomatic Greek and Latin"?
  
cwnyth 1 day ago
Not all machine learning is generative AI.
- mc32 1 day ago
  
  True, but as with regular document scanning software, there can be errors in detection.
  
nkoren 1 day ago

Massive kudos to the whole team. I've been waiting 30 years for this announcement, ever since I first heard about the scrolls. Fantastic work!

Jeaye 17 hours ago

I am researching for a talk on the philosophy of code, the similarities of engineering and art, and why we enjoy reading old code. This amazing work you folks have done may be an interesting tangent.

The biggest question I have for you is why you imagine we are so interested in reading these old scrolls. Surely some of it is to see whether or not, technically, we can. Surely some of it is to get a glimpse into the human expression inscribed on them. Are we looking to learn anything, or just to connect with our ancestors? I'd like to hear your take on it, both for why you think it's important and, if you know, why your colleagues feel similarly.

verditelabs 17 hours ago
I wrote this as an answer to a different question but I think it applies to what you're asking as well
> Though I have an interest in Old Norse and I spend a lot of time reading Scandinavian runestones. More than 90% of them are grave markers for a dead father, mother, brother, sister, cousin, etc. If I've learned anything from that, it's that people across time and space all lead lives as real and complex as anyone else's. Their joys were as high as mine have been and their sorrows as low as mine have been.
A VSauce video I watched a long time ago described that realization as "chronosonder". I think trying to understand those that came before us and why they made the decisions that they did given the circumstances they were in can help better inform us of the things we choose to do given our own circumstances.
Otherwise, I think that a lot of things are worth doing just to see if it's possible. I like to lift weights and I'm training to lift the Dinnie Stones one day; a pair of stones that are a combined ~730 pounds. The physical and mental benefits of exercise and training are well documented and great but at the end of the day I just _really_ wanna pick up 2 stones. There's nothing more to it than that, and that's ok with me.
One of the things we said a lot in 2023 was "We just wanna read the scrolls" but that slogan has unfortunately fallen a bit by the wayside as the goal and path got longer and initial hype started to fade, but I think it perfectly encapsulates why: The scrolls are there. They can be read. Why not read them?
- card_zero 17 hours ago
  
  1. Why is that a realization, are there really people who say "Scandinavians are just mechanical" or "9th century people were made out of wood"? Why would their lives be assumed not to be "real", what even is that mindset?
  2. "Real and complex lives" doesn't mean "just the same as ours", mind you.
  

adriand 1 day ago

What are the wildest, most exciting but plausible things that might be discovered in these documents?

verditelabs 1 day ago
I am not a papyrologist or a classicist, rather I'm a computer scientist, so my expertise is unfortunately not in _what_ the scrolls say, rather how we get there. That being said I think and hope that there will be a trove of things that has no known provenance at all, completely lost works that elude the public memory.
- arikrahman 1 day ago
  
  Well what were your first thoughts when you decoded the script, besides the obvious Eureka, after making some sense of the texts?
  
- readthenotes1 1 day ago
  
  Your response reminds me of Nigel Richards :)
  https://en.wikipedia.org/wiki/Nigel_Richards
  Congratulations, and thank-you!
GeoAtreides 1 day ago
Aristotle's second book of Poetics, of course.
- wolfi1 17 hours ago
  
  we already know that a blind Italian monk burnt it to ashes, at least, that's what Eco wrote and he was a learned scholar
  
colechristensen 1 day ago
Here's a list. The scrolls are from a library that burned in 79 AD.
https://en.wikipedia.org/wiki/List_of_lost_literary_works
- kouru225 1 day ago
  
  Woah there was a lost Homer epic comedy about a bumbling fool named Margites?
  
suddenlybananas 1 day ago
Probably a lot more texts of Epicurean philosophy and not a whole lot else unfortunately according to my papyrologist friend.
- Matticus_Rex 1 day ago
  
  That's what was thought, but maybe not -- only one of the three so far looks Epicurean, which is not what was expected. Maybe it's a fluke, but historians are buzzing a bit about whether it might be broader than expected.
- cwmoore 1 day ago
  
  Why would Epicurean philosophy be unfortunate?
  I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed.
  What would you like to have instead?
  
- kome 1 day ago
  
  in the paper it says "The recovered text is a philosophical treatise on ethics, and the evidence points to a Stoic work: it turns on human nature, impulse, and the moral progress of human beings, and its final preserved column names Aristocreon — nephew and disciple of the great Stoic Chrysippus — which, together with the language and themes of the text, places it in a Stoic context and dates it to the 2nd century BC."

tomcam 1 day ago

Absolutely incredible work. This is one of the most amazing news articles I’ve encountered in decades. Congratulations team!

Barbing 3 hours ago

Mind-bending achievement from you all - thank you!

gadders 7 hours ago

The science to get the text is cool, but where is the best place to read discussion of the text in the scroll, its context, meaning, etc.?

verditelabs 7 hours ago

The core challenge team is focused on the technology side to provide the images of ink to our team of papyrologists and they do the transcription, translation, reading, and scholarship.
This announcement was part of a larger conference being put on by Federica Nicolardi, our lead papyrologist. The livestreams of each day are available at: https://www.youtube.com/@cispemgigante/streams

2ap 1 day ago

I'm interested to know about the approaches that you tried with the ML, and then decided to not use. In practice, the options are so many. How did you come up with the final approach - and was there a systematic way to decide which options to go for?

verditelabs 1 day ago
I am not on the research team, rather on the production side of things, so my knowledge on that is pretty limited. I think one of the main takeaways from a lot of the research, though, on both the segmentation side and the ink detection side, is that it's a lot less about what models and techniques and such you use, but how good your training data is. Gathering ground truth is hard, and if you don't have a lot of good ground truth, it doesn't matter if your code is perfect, you'll never get results.
- rossdavidh 1 day ago
  
  That is a general truth of most ML; many models _can_ find the information in the data, if the data is good enough. If it is not, then likely no model can.
- EvanAnderson 1 day ago
  
  You brought up what I'm most curious about: where does the ground truth come from for this work, since you can't just unwrap a scroll to tell if the model got it right or, presumably, make a facsimile scroll and wrap it up?
  
- gekoxyz 1 day ago
  
  > it's a lot less about what models and techniques and such you use, but how good your training data is.
  Ah, the good old bitter lesson strikes again

thatoneengineer 1 day ago

Imagine a worst case scenario: the Herculaneum scrolls turn out to be just the works of this one mediocre pet philosopher. What would we still expect to learn from them, and what would the next step be?

verditelabs 1 day ago
Beats me; I am a programmer, not a classicist.
Though I have an interest in Old Norse and I spend a lot of time reading Scandinavian runestones. More than 90% of them are grave markers for a dead father, mother, brother, sister, cousin, etc. If I've learned anything from that, it's that people across time and space all lead lives as real and complex as anyone else's. Their joys were as high as mine have been and their sorrows as low as mine have been.
- manbash 21 hours ago
  
  It's so humbling to realize that the human mind hasn't changed a lot. Only our environment.

tsol 1 day ago

How do you get to do that? As in, what did you study to get the prerequisite knowledge, and how did you find this particular job? When I see interesting jobs I'm always curious what path led there.

verditelabs 1 day ago
I am a computer scientist. I studied CS in university, worked in the semiconductor industry for a while, and got started as a participant in the challenge aspect of the Vesuvius Challenge. They were hiring, I sent in an application, interviewed, and was offered the job.
- matneyx 1 day ago
  
  That last sentence is so perfect, like my dad answering the question of how he lost weight. "I ate less and exercised more."

NooneAtAll3 1 day ago

how many scrolls have been scanned so far? what's the main limitation on scan amount?

have any attempts (or just ideas) been made to recreate such charring on known texts?

verditelabs 1 day ago
30 scrolls, maybe? Something like that. I scanned Pherc Paris 4 and Pherc Paris 3 at Beam line 18 at ESRF back in March.
The team did "the campfire scroll" experiment a few years ago to replicate carbonization, unrolling, and ink detection. That is the only case I am aware of. It proved the method could work, but it's not a source of, say, training data; it varies too much from the real scrolls.
The main limitation is time and cost. We have to scan on what is AFAIK the most powerful x-ray beam line in the world. It is not cheap.
- CGMthrowaway 1 day ago
  
  You had to pay? I understand the machine cost many hundreds of millions of dollars, but I would have thought for academic researchers doing open science, the beamtime is free (funded by the govt / science trusts).
  

thom 1 day ago

Do we have a sense for what proportion of text is actually retrievable from these scrolls?

verditelabs 1 day ago
That varies greatly on the state of preservation of the scroll. For some of the scrolls we can recover entire columns of text. But this is a best case. Plenty of scrolls, or portions of scrolls, are extremely damaged and warped to where our current methods cannot unroll them through any combination of automated and human driven unrolling. Both of these still have massive headroom for improvement, but achieving that headroom is hard as the preservation gets worse.
To give numbers, for ideal portions of scrolls, we can read 100% of the characters. In nonideal portions of scrolls, we can read 0% of the characters. It's not really possible to quantify how much we could theoretically recover of that 0% through better methods, and how much is truly destroyed.
- ex-aws-dude 2 hours ago
  
  I'm curious prior to this has there been any research/attempts at chemical methods to strengthen the structure and allow it to be unrolled?
  

helterskelter 1 day ago

Given the current rate of progress, how long do you think it will take to decipher the entire collection?

verditelabs 1 day ago
That's a tough one to give a strong estimate of. Some scrolls are easier or harder to unwrap and read for a multitude of different reasons, mostly due to how damaged the scroll was in the eruption, and how easy or not the ink is to read. IIRC from what we've scanned of the Herculaneum collection, none of the ink is easily visible via spectrum alone, so we have to use a lot of ML and physically based rendering techniques to be able to find ink. That also requires unwrapping and segmentation _before_ any ink detection.
For iron gall ink with high enough iron concentration, the ink stands out in the xray volume through simply masking off low values, such as was shown in our campfire scroll experiment a few years ago. No Herculaneum scrolls show similar ink.
- pimlottc 1 day ago
  
  Do you think this particular scroll is easier or harder to read than the others will be? Or about average?
  
- superjan 1 day ago
  
  Do we know what ink is used?
  
- helterskelter 1 day ago
  
  Thanks!

dogscatstrees 1 day ago

What is your origin story? How did you end up doing this and how can I do the same?

verditelabs 1 day ago
BS in CS from a big state school in the USA. I have a hobby interest in history. I learned about the challenge on YouTube. Got involved contributing because I needed money. Then they put out a job posting. I applied, interviewed, and was hired.
- ghghgfdfgh 18 hours ago
  
  I understand that the complexity of the project has increased over the years. How difficult is it for a newcomer to get into it?
  
- Refreeze5224 1 day ago
  
  What a cool job, and congrats on great work!

BiraIgnacio 1 day ago

Amazing work, fantastic!

TheOtherHobbes 1 day ago

No questions, but I just want to say this is really exciting work!

msuniverse2026 1 day ago

How many more scrolls exist?

verditelabs 1 day ago
That have been dug up? I think 600 or so still exist. Perhaps about 2000 or so have ever been excavated. We have scanned about 30 of them. Still underground? I've seen various counts. Maybe more than 10000?
- quotemstr 21 hours ago
  
  Shame there's a modern city over most of Herculaneum. I'd love to excavate the remainder. Now that we can read what we find, there's a good scientific reason to do so now instead of waiting.

temp987 1 day ago

this is überragend. by many means!

jimbob45 1 day ago

Are the fragments destroyed in ‘69 and ‘80 available to be read similarly? Or were they disposed of?

verditelabs 1 day ago

I am unaware of those fragments in particular. Though we have scanned a dozen or so fragments, mostly to help guide ink detection, since the ink in them is often more visible in visible and/or near IR light, but can be hard to impossible to detect in the xray spectrum.

echelon 1 day ago

Did anyone on the team come from a non-science, non-math, non-academia background? Did anyone working on this just teach themselves and start contributing?

verditelabs 1 day ago
Yes. Sean, who was a co-winner of the 2024 prize, IIRC has no formal background in ML, computer science, AI, etc. He is one of our core researchers and the most productive team member.
- fintechjock 1 day ago
  
  I've been on the Discord for a couple of years now, and poking around with submissions as well. Sean and the entire team deserve so much praise for all of this work.
  It's easy to just read about the breakthrough and see it as one neat, linear line to get there, and hard to comprehend the hours, months and years that so many spent to get there. Big congrats to you, Sean, Nat and the entire team!
- echelon 1 day ago
  
  That's incredibly impressive.
  Major kudos to all of you on your achievements! This is amazing work for anthropology and for society, and it's greatly appreciated.

negergreger 1 day ago

How fast is the process?

Could it be automated to the point where it's faster to scan a book closed than opened?

verditelabs 1 day ago
We've been trying to automate since the beginning. A lot of it is automated, but it's mostly the easier and less damaged parts of the scrolls. Scanning takes a few days for the biggest scrolls, but the human refinement is still a multi-month process.
- fph 1 day ago
  
  Random shower thought: I wonder if it would be better in the long term to stop digging out archeological findings. The more we excavate, the more damage we do for future archaeologists who will have the superpower of reading these texts without even needing to dig the scrolls free and open them.
  
- NoMoreNicksLeft 19 hours ago
  
  How many scrolls are intact (worldwide, rather than just France) that might still be recoverable?
  
- itsthecourier 1 day ago
  
  may you please tell us how much effort goes into each type of task in those months?
  where else do you think these techniques could be applied?
  

SidewaysView 2 hours ago

[flagged]

verditelabs 2 hours ago
5/7 trolling; not bad
- SidewaysView 1 hour ago
  
  You don't like what I have to say? Fine by me. Guess we'll see.
  

inglor_cz 1 day ago

I don't have any questions, just a comment.

You have the potential to rewrite the history of European Antiquity quite substantially. The Herculaneum set of scrolls is enormous and must contain a lot that is hitherto unknown.

That comes with a set of peculiar risks. Once your work starts producing something that contradicts previous work of Very Important People, they will lobby to stop you. Be prepared for that.

Science should be neutral and always value new evidence. Scientists as humans are unfortunately subject to all sorts of passions.

Rebelgecko 1 day ago
What contradictions do you think the scrolls contain?
- inglor_cz 1 day ago
  
  I don't have any concrete tips.
  We have very little written material surviving from Rome, at least from the period before a codex (book) was invented, which was more durable than a scroll. Often, we only know of one source describing important events, and when it comes to political struggles and civil wars, the perspective of the defeated party often did not survive. The punishment of damnatio memoriae was practised and even among the early emperors, Caligula and Nero were subject to a form thereof. (This library in Herculaneum was buried 11 years after Nero's death.) I would be surprised if everything in the scrolls perfectly aligned with the record that survived for 2000 years and that was filtered by both random chance and political/religious censorship. Even Christians later destroyed some pagan texts.
  BTW personally, I would love for some textbook of Etruscan to emerge from there. This was once again a language whose teaching was banned in Rome.