Comment by jbarrow

4 months ago

I'm training ML models for PDF forms. You can try out what I've got so far with this service, which automatically detects where fields should go and makes PDFs fillable: https://detect.semanticdocs.org/ Code and models are at: https://github.com/jbarrow/commonforms

It's built on CommonForms, a dataset and paper I wrote, where I scraped Common Crawl for hundreds of thousands of fillable form pages and used them as a training set:

https://arxiv.org/abs/2509.16506
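
For a rough sense of how fillable form pages can turn into detector training data, here is a minimal sketch. It is not the CommonForms pipeline, and the label format is an assumption. The idea is that each form field in a fillable PDF carries a rectangle in PDF points with the origin at the bottom-left of the page, while object detectors usually want boxes normalized to the page with the origin at the top-left. The names `NormalizedBox` and `widget_rect_to_box` are hypothetical, and the sketch assumes an unrotated page whose lower-left corner sits at (0, 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Box centre and size as fractions of the page, origin at top-left.

    Hypothetical label format for illustration only.
    """
    cx: float
    cy: float
    w: float
    h: float


def widget_rect_to_box(rect, page_width, page_height):
    """Convert a PDF field rectangle into a normalized detector box.

    `rect` is (x0, y0, x1, y1) in PDF points, origin at the bottom-left
    of the page. The two corners may be given in either order.
    Assumes an unrotated page whose lower-left corner is (0, 0).
    """
    x0, y0, x1, y1 = rect
    left, right = sorted((x0, x1))
    bottom, top = sorted((y0, y1))

    w = (right - left) / page_width
    h = (top - bottom) / page_height
    cx = (left + right) / 2 / page_width
    # Flip the y axis: PDF y grows upward, image y grows downward.
    cy = 1.0 - (bottom + top) / 2 / page_height
    return NormalizedBox(cx, cy, w, h)


if __name__ == "__main__":
    # A 200 x 20 pt text field near the top of a US Letter page (612 x 792 pt).
    print(widget_rect_to_box((100, 700, 300, 720), 612, 792))
```

A real pipeline would also need to handle page rotation, crop boxes, and field types, none of which this sketch covers.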

Next step is training and releasing some DETR (detection transformer) models, which I think will drive quality even higher. But the end goal is automatic form accessibility.

3 comments

abc03 4 months ago

Congratulations on being featured in the Superhuman newsletter. Trying it out.

jbarrow 4 months ago

Woah, did not realize that, haha. Let me know if it works well!