Comment by llm_trw
17 days ago
>Not sure what service you're basing your calculation on but with Gemini
The table of costs in the blog post. At 500,000 pages per day, the cumulative software cost overtakes the hardware's fixed cost at day 240, and from then on you're paying an extra ~$100 per day to keep it running in the cloud (rough arithmetic sketched below the table). The machine also needed extremely beefy GPUs to fit all the models it had to run. Compute utilization was between 5% and 10%, which means it's future-proof for the next 5 years at the rate at which the data source was growing.
| Model | Pages per Dollar |
|-----------------------------|------------------|
| Gemini 2.0 Flash | ≈ 6,000 |
| Gemini 2.0 Flash Lite | ≈ 12,000* |
| Gemini 1.5 Flash | ≈ 10,000 |
| AWS Textract | ≈ 1,000 |
| Gemini 1.5 Pro | ≈ 700 |
| OpenAI 4o-mini | ≈ 450 |
| LlamaParse | ≈ 300 |
| OpenAI 4o | ≈ 200 |
| Anthropic claude-3-5-sonnet | ≈ 100 |
| Reducto | ≈ 100 |
| Chunkr | ≈ 100 |
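Back-solving from those figures, a minimal sketch of the break-even arithmetic in Python. The hardware cost is an assumption derived from the stated day-240 break-even, not a number from the thread, and power/colo costs are ignored:

```python
# Back-of-the-envelope break-even between local hardware and a paid API.
# Assumptions (not from the thread): the hardware cost is back-solved from the
# stated ~day-240 break-even; API pricing uses the pages-per-dollar table above.

PAGES_PER_DAY = 500_000
PAGES_PER_DOLLAR = 6_000          # Gemini 2.0 Flash row from the table
api_cost_per_day = PAGES_PER_DAY / PAGES_PER_DOLLAR     # ~$83/day

HARDWARE_FIXED_COST = 20_000      # assumed one-time purchase (USD), hypothetical
local_cost_per_day = 0            # power/colo ignored for the sketch

break_even_day = HARDWARE_FIXED_COST / (api_cost_per_day - local_cost_per_day)
print(f"API: ${api_cost_per_day:,.0f}/day, break-even around day {break_even_day:,.0f}")
# After break-even, staying in the cloud keeps costing the ~$80-100/day delta.
```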
There is also the fact that it's _completely_ local, which meant we could throw every proprietary data source that couldn't leave the company at it.
>The longest part of the development timeline was establishing the accuracy rate and the ingestion pipeline, which itself is massively less complex than what your workflow sounds like: PDF -> Storage Bucket -> Gemini -> JSON response -> Database
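For reference, that quoted pipeline fits in a couple dozen lines. A minimal sketch, assuming the google-cloud-storage and google-generativeai SDKs; the bucket name, table schema, and extraction prompt are hypothetical, and error handling/batching are omitted:

```python
# Sketch of the quoted pipeline: PDF -> Storage Bucket -> Gemini -> JSON -> Database.
import json
import sqlite3
import google.generativeai as genai
from google.cloud import storage

genai.configure(api_key="YOUR_API_KEY")            # assumes a Gemini API key
model = genai.GenerativeModel("gemini-2.0-flash")

def process_pdf(bucket_name: str, blob_name: str, db_path: str = "pages.db") -> None:
    # 1. Pull the PDF out of the storage bucket.
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    local_path = f"/tmp/{blob_name.replace('/', '_')}"
    blob.download_to_filename(local_path)

    # 2. Hand the file to Gemini and ask for structured JSON back.
    uploaded = genai.upload_file(local_path)
    response = model.generate_content(
        [uploaded, "Extract the document's fields as JSON."],   # prompt is a placeholder
        generation_config=genai.GenerationConfig(response_mime_type="application/json"),
    )
    record = json.loads(response.text)

    # 3. Persist the result (sqlite3 stands in for whatever database you use).
    with sqlite3.connect(db_path) as db:
        db.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT, payload TEXT)")
        db.execute("INSERT INTO docs VALUES (?, ?)", (blob_name, json.dumps(record)))

process_pdf("my-pdf-bucket", "invoices/2024-03.pdf")   # hypothetical bucket and object
```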
Each company should build tools that match the skill level of its developers. If you're not comfortable training models locally, with all that entails, off-the-shelf solutions let companies punch way above their weight class in their industry.
That assumes that you're able to find a model that can match Gemini's performance - I haven't come across anything that comes close (although hopefully that changes).
Nice article, mirrors my experience. Last year (around when multimodal 3.5 Sonnet launched), I ran a sizeable number of PDFs through it. Accuracy was remarkably high (99%-ish), whereas GPT was just unusable for this purpose.