Comment by matthewshere
4 days ago
I'm currently trying to transition from a fairly rigid day job into working as an independent developer. The goal is to build useful online tools and hopefully create a sustainable income stream doing something I find more engaging.
One consistent annoyance in my professional work has been dealing with PDFs – specifically, extracting information into editable formats without losing structure. Copy-pasting often creates a mess.
So, my first project tackling this is an online PDF to Markdown converter: https://pdftomarkdown.pro/
I've focused heavily on trying to maintain good formatting for headings, text flow, formulas, and especially table structure (getting rows/columns right in Markdown). It also has an online editor for quick modifications after conversion.
A key aspect for me was privacy: the application explicitly does not save the content of uploaded PDFs or the generated Markdown files. It only stores minimal metadata (email, filename, page count) for registered users' plan limits.
It's very much a "scratching my own itch" project born out of that PDF frustration. Early days, but hoping it proves useful for others too.
Hey! You sound like an interesting person, and I'd like to learn more about your transition from employee to entrepreneur.
I couldn't find a way to contact you, so if you feel like it, drop me an email (the address is on the website linked in my profile).
I've often wanted a bulk tool that takes the title, or some other easy-to-find value, from a PDF and renames the file to that.
Appreciate you sharing that requirement!
The need for batch processing to pull out targeted data points from PDFs (rather than converting the whole document) is a valuable insight.
While the current tool focuses on full conversion to Markdown, enhancing https://pdftomarkdown.pro/ to handle specific data extraction tasks like yours is definitely something I'll consider carefully for the future roadmap. Thanks for highlighting it!
Unfortunately, PDFs are right buggers to work with, and there often isn't an "easy to find value" for anything.
You're absolutely right, PDFs can be incredibly tricky. That lack of a consistent, easily parsable structure for arbitrary data is the core challenge.
I mean easy in the PDF sense. I have folders full of randomString.pdf and name(15).pdf but those that share a folder all have the same layout.
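For a folder like that, where every file shares one layout, the rename step could be sketched roughly as below. This is only an illustration, not how pdftomarkdown.pro works: the function names (`slugify`, `pdf_title`, `plan_renames`) are made up for the example, it assumes the third-party `pypdf` package for reading the files, and it assumes the wanted value is either the metadata title or the first non-empty text line of page 1. For a different layout you would swap in your own extractor.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Optional


def slugify(value: str, max_len: int = 80) -> str:
    """Turn an extracted value into a filesystem-friendly file stem."""
    value = re.sub(r"[^\w\s-]", "", value).strip()   # drop punctuation such as / and :
    value = re.sub(r"[\s_-]+", "-", value)           # collapse whitespace runs to one hyphen
    return value[:max_len].strip("-")


def pdf_title(path: Path) -> Optional[str]:
    """Hypothetical extractor: metadata title, else first text line of page 1.

    Assumes the third-party pypdf package; imported lazily so the rest of
    the sketch works with any other extractor.
    """
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    meta = reader.metadata
    if meta is not None and meta.title:
        return meta.title
    if len(reader.pages) == 0:
        return None
    text = reader.pages[0].extract_text() or ""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None


def plan_renames(
    folder: Path, extract: Callable[[Path], Optional[str]]
) -> Dict[Path, Path]:
    """Map each PDF in `folder` to a new name, without touching the disk.

    Files with no usable value are skipped. Duplicate values within this
    run get -2, -3, ... suffixes. Names already present in the folder are
    NOT checked here.
    """
    plan: Dict[Path, Path] = {}
    taken = set()
    for src in sorted(folder.glob("*.pdf")):
        value = extract(src)
        stem = slugify(value) if value else ""
        if not stem:
            continue
        candidate, n = stem, 2
        while candidate.lower() in taken:
            candidate = f"{stem}-{n}"
            n += 1
        taken.add(candidate.lower())
        plan[src] = src.with_name(candidate + ".pdf")
    return plan
```

A cautious way to use it is to print the plan first (`for src, dst in plan_renames(Path("some_folder"), pdf_title).items(): print(src.name, "->", dst.name)`) and only call `src.rename(dst)` once the names look right. Keeping the extractor as a plain function argument means each folder can get its own rule for where the value lives.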