Comment by hallole
20 hours ago
Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!
Heh, have at it, here's the full spec: https://developer.adobe.com/document-services/docs/assets/5b...
Should take... a weekend tops? ;) PDF is crazy and scary
> PDF includes eight basic types of objects: Boolean values, Integer and Real numbers, Strings, Names, Arrays, Dictionaries, Streams, and the null object
Wait, this is more complete than SOAP. It may be a good idea to redo the IPC protocol with a different serialization format!
Well, it's a descendant of Postscript (much like JSON is a descendant of Javascript, loosely)
Society would probably never recover if we started implementing RPC-in-Postscript though.
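To make the eight object types from the quote concrete, here is a minimal Python sketch that maps Python values onto them and serializes each one to PDF syntax. The `Name` and `Stream` classes and the `serialize` function are hypothetical helpers, not any library's API. It ignores indirect references, hex strings, and `#xx` escaping of special characters in names, so treat it as an illustration rather than a conforming writer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name such as /Type (hypothetical helper class)."""
    value: str


@dataclass
class Stream:
    """A dictionary plus raw bytes; /Length is filled in during serialization."""
    dictionary: dict
    data: bytes


def serialize(obj) -> bytes:
    """Serialize a Python value as one of the eight basic PDF object types."""
    if obj is None:
        return b"null"
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(obj, bool):
        return b"true" if obj else b"false"
    if isinstance(obj, int):
        return str(obj).encode("ascii")
    if isinstance(obj, float):
        # PDF reals have no exponent notation, so format as plain decimals
        return f"{obj:.6f}".rstrip("0").rstrip(".").encode("ascii")
    if isinstance(obj, Name):
        return b"/" + obj.value.encode("ascii")
    if isinstance(obj, (str, bytes)):
        raw = obj.encode("latin-1") if isinstance(obj, str) else obj
        # Literal strings need backslashes and parentheses escaped
        escaped = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
        return b"(" + escaped + b")"
    if isinstance(obj, list):
        return b"[" + b" ".join(serialize(item) for item in obj) + b"]"
    if isinstance(obj, dict):
        # Dictionary keys are always names in PDF
        pairs = [serialize(Name(key)) + b" " + serialize(value) for key, value in obj.items()]
        return b"<<" + b" ".join(pairs) + b">>"
    if isinstance(obj, Stream):
        header = dict(obj.dictionary, Length=len(obj.data))
        return serialize(header) + b"\nstream\n" + obj.data + b"\nendstream"
    raise TypeError(f"no PDF equivalent for {type(obj).__name__}")


print(serialize({"Type": Name("Page"), "Count": 3, "Kids": [True, None, 1.5]}))
```

The last line prints `b'<</Type /Page /Count 3 /Kids [true null 1.5]>>'`.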
Section 7.5.6, "Incremental updates", of the specification is an interesting read too: because updates are appended to the end of the file, data that people didn't think to remove properly can still be recovered from PDF files.
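A rough illustration of why incremental updates leak data: each update is appended after the previous revision and ends with its own `%%EOF` marker, so as long as the file hasn't been rewritten from scratch, every earlier revision is still a byte-for-byte prefix of the file. The Python sketch below splits a file at those markers. `list_revisions` is a hypothetical helper and only a heuristic (the string `%%EOF` can also occur inside stream data), and the sample input is a heavily abbreviated toy, not a valid PDF (it has no cross-reference tables or trailers).

```python
import re


def list_revisions(pdf: bytes) -> list[bytes]:
    """Return each revision as the prefix of `pdf` ending at one of its %%EOF markers."""
    # Heuristic: an incrementally updated file has one %%EOF per revision.
    return [pdf[: m.end()] for m in re.finditer(rb"%%EOF\r?\n?", pdf)]


# Toy input: revision 1 defines object 1; the appended update redefines it.
toy = (
    b"%PDF-1.7\n1 0 obj (Account: 1234-5678) endobj\n%%EOF\n"
    b"1 0 obj (Account: [REDACTED]) endobj\n%%EOF\n"
)

revisions = list_revisions(toy)
print(len(revisions))                # 2
print(b"1234-5678" in revisions[0])  # True: the "removed" text survives
```

A viewer only shows the latest definition of object 1, but the original bytes are still sitting in the file.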
We will be able to say that AGI has arrived when we can hand that spec off to a model and tell it to build an Acrobat clone.
We will be able to say that AGI has arrived when the AI hands it back and says "no".
Don't stop yourself before getting started. I believe in you - maybe you could write the one editor that would actually work!
Not kidding - it's a ~~~billion dollar market haha
Make an MVP/Show HN :-)
I did a bunch of work creating PDFs using a low-level API, the "this object goes here" kind of stuff.
As far as I understand it, at its core PDF is just a stream of instructions that continually modifies the document. You can insert a thousand objects before you start the next word in a paragraph, and that's just the most basic stuff: anything on a page can be anywhere in the stream. I don't know whether you can go back and edit previous pages, but you might at least have a shot at understanding one page at a time.
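To illustrate "anything on a page can be anywhere in the stream", here is a hypothetical page content stream that draws the two words of one line in reverse order, each positioned with an absolute text matrix (`Tm`). A text extractor therefore has to sort by position rather than trust stream order. The regex only understands this exact toy form (`1 0 0 1 x y Tm (text) Tj`); real content streams also use `Td`, `TJ`, `cm` and much more, so this is a sketch, not an extractor.

```python
import re

# Hypothetical content stream: "world" is drawn before "Hello",
# but it sits further right on the same baseline (y = 700).
CONTENT = b"""
BT /F1 12 Tf 1 0 0 1 110 700 Tm (world) Tj ET
BT /F1 12 Tf 1 0 0 1 72 700 Tm (Hello) Tj ET
"""

# Matches only the toy pattern "1 0 0 1 x y Tm (text) Tj".
SHOW_TEXT = re.compile(rb"1 0 0 1 (\d+) (\d+) Tm \(([^)]*)\) Tj")


def words_in_stream_order(stream: bytes) -> list[tuple[int, int, str]]:
    """Return (x, y, text) for each text-showing operation, in stream order."""
    return [
        (int(x), int(y), text.decode("latin-1"))
        for x, y, text in SHOW_TEXT.findall(stream)
    ]


def words_in_reading_order(stream: bytes) -> list[str]:
    """Sort top-to-bottom (PDF's y axis points up), then left-to-right."""
    words = words_in_stream_order(stream)
    return [text for x, y, text in sorted(words, key=lambda w: (-w[1], w[0]))]


print([t for _, _, t in words_in_stream_order(CONTENT)])  # ['world', 'Hello']
print(words_in_reading_order(CONTENT))                     # ['Hello', 'world']
```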
Did you know you can have embedded XML in PDFs? You can have a paper form with all the data filled in and include an XML version of that for any computer systems that would like an easier way to read it.
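One way to ship a machine-readable copy alongside the printable document is a file attachment: the catalog's `/Names` dictionary gets an `/EmbeddedFiles` name tree pointing at a file specification, which in turn points at a stream holding the XML. E-invoice formats such as ZUGFeRD/Factur-X use attachments this way; the comment may also have XFA forms in mind, which are a different mechanism not covered here. The Python sketch below writes such a file by hand with no library. `build_pdf_with_xml` is a hypothetical helper; the page is left blank, and real attachments usually also carry `/Params` (size, dates) and, for PDF/A-3, an `/AFRelationship` entry.

```python
def build_pdf_with_xml(xml: bytes, filename: str = "data.xml") -> bytes:
    """Build a blank one-page PDF that carries `xml` as an embedded file.

    `filename` is written into literal strings unescaped, so it must not
    contain parentheses or backslashes in this sketch.
    """
    name = filename.encode("latin-1")
    objects = [
        # 1: document catalog; /Names /EmbeddedFiles maps a file name to a file spec
        b"<< /Type /Catalog /Pages 2 0 R "
        b"/Names << /EmbeddedFiles << /Names [(" + name + b") 4 0 R] >> >> >>",
        # 2: page tree with a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: an empty US Letter page (visible content omitted for brevity)
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        # 4: file specification pointing at the embedded file stream
        b"<< /Type /Filespec /F (" + name + b") /UF (" + name + b") /EF << /F 5 0 R >> >>",
        # 5: the embedded file itself; '#2F' encodes '/' inside the name text/xml
        b"<< /Type /EmbeddedFile /Subtype /text#2Fxml /Length %d >>\nstream\n" % len(xml)
        + xml
        + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: fixed 20-byte entries, starting with the free entry 0
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)


pdf = build_pdf_with_xml(b"<invoice><total>42.00</total></invoice>")
with open("with-attachment.pdf", "wb") as f:
    f.write(pdf)
```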
The blog post about adding colour gradients to Typst dives into some of the weirdness of the format. https://typst.app/blog/2023/color-gradients
Bravo to you for recognising the load-bearing 'just' before you threw it around :)