Comment by wahern

10 hours ago

It should be much easier than that. You should should be able to serially test if each edit decodes to a sane PDF structure, reducing the cost similar to how you can crack passwords when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so that makes it even easier given built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.

2 comments

wahern

pimlottc 8 hours ago

I wonder if you could leverage some of the fuzzing frameworks tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation.

cluckindan 9 hours ago

On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.