Comment by qingcharles

4 hours ago

Sadly nothing in Scott's blog post about how they obtained the source. Was it still in Microsoft's archives? Did they happen upon some tractor-feed print-outs they had to type in by hand?

It would also be interesting to know why it was open-sourced now. I assume if they had done the same last year, the resulting loss of revenue would not have destroyed the plucky little $3T upstart.

I assume typing it in by hand is no longer needed today, with extracting text from images being table stakes for LLMs.

  • You don’t need an LLM to do this.

    • Of course, but it wasn't that long ago that OCR still wasn't great for many documents. These days LLMs can tackle it all.