Comment by ajcp
17 days ago
Not sure what service you're basing your calculation on, but with Gemini I've processed 10,000,000+ shipping documents (PDFs and PNGs) of every conceivable layout in one month, at under $1,000 and an accuracy rate between 80% and 82% (humans were at 66%).
The longest part of the development timeline was establishing the accuracy rate and the ingestion pipeline, which itself is massively less complex than what your workflow sounds like: PDF -> Storage Bucket -> Gemini -> JSON response -> Database
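For what it's worth, the whole flow fits in a few lines. A minimal sketch of that pipeline shape, with `extract_fields` standing in for the Gemini call and the bucket/database steps stubbed out (all names here are placeholders, not a real SDK):

```python
import json

def extract_fields(pdf_bytes: bytes) -> str:
    """Placeholder for the model call: returns a JSON string of extracted fields."""
    return json.dumps({"shipper": "ACME", "weight_kg": 120})

def run_pipeline(pdf_bytes: bytes) -> dict:
    raw = extract_fields(pdf_bytes)  # storage bucket -> model
    record = json.loads(raw)         # JSON response
    # a real pipeline would insert `record` into the database here
    return record

row = run_pipeline(b"%PDF-1.4 ...")
```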
Just to get sick with it, we actually added some recursion to the Gemini step: have it rate how well it extracted, and if the rating was below a certain threshold, rewrite its own instructions on how to extract the information and feed them back into itself. We didn't see any improvement in accuracy, but it was still fun to do.
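That self-critique loop is roughly the following (model calls stubbed out; the 0–1 rating scale, threshold, and round limit are made up for illustration):

```python
def extract(doc: str, instructions: str) -> dict:
    # placeholder for the model call; returns fields plus a self-rating (0-1)
    return {"fields": {"shipper": "ACME"}, "rating": 0.9}

def rewrite_instructions(instructions: str, result: dict) -> str:
    # placeholder: ask the model to revise its own extraction prompt
    return instructions + " (revised)"

def extract_with_feedback(doc: str, instructions: str,
                          threshold: float = 0.7, max_rounds: int = 3) -> dict:
    result = extract(doc, instructions)
    for _ in range(max_rounds - 1):
        if result["rating"] >= threshold:
            break
        # below threshold: rewrite the prompt and feed it back into itself
        instructions = rewrite_instructions(instructions, result)
        result = extract(doc, instructions)
    return result
```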
>Not sure what service you're basing your calculation on but with Gemini
The table of costs in the blog post. At 500,000 pages per day, the fixed hardware cost breaks even with the variable software cost at day 240; from then on you're paying an extra ~$100 per day to keep it running in the cloud. The machine also needed extremely beefy GPUs to fit all the models it required. Compute utilization sat between 5% and 10%, which means it's future-proof for the next 5 years at the rate the data source was growing.
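The arithmetic behind that crossover, for anyone checking: if the cloud costs ~$100/day more to run and the break-even lands at day 240, the implied hardware outlay is about $24k (that last figure is inferred from the other two numbers, not stated outright).

```python
daily_cloud_premium = 100    # ~$100/day extra to keep it running in the cloud
breakeven_day = 240          # day the fixed hardware cost is recouped

# fixed cost = daily premium x days to break even
implied_hardware_cost = daily_cloud_premium * breakeven_day
```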
There is also the fact that it's _completely_ local, which meant we could throw every proprietary data source that couldn't leave the company at it.
>The longest part of the development timeline was establishing the accuracy rate and the ingestion pipeline, which itself is massively less complex than what your workflow sounds like: PDF -> Storage Bucket -> Gemini -> JSON response -> Database
Each company should build tools that match the skill level of its developers. If you're not comfortable training models locally, with all that entails, off-the-shelf solutions let companies punch way above their weight class in their industry.
That assumes that you're able to find a model that can match Gemini's performance - I haven't come across anything that comes close (although hopefully that changes).
Nice article, mirrors my experience. Last year (around when multimodal 3.5 Sonnet launched), I had run a sizeable number of PDFs through it. Accuracy was remarkably high (99%-ish), whereas GPT was just unusable for this purpose.
Very cool! How are you storing it in the database? As vectors? What do you do with the extracted data (in terms of being able to pull it up via some query system)?
In this use-case the customer just wanted data not currently in the warehouse inventory management system captured, so here we converted the JSON response to a classic table-row schema (where 1 row = 1 document), and boom: shipping data!
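Concretely, the JSON-to-row step is just this kind of thing (field names invented for illustration):

```python
import json

# a model response for one shipping document
response = '{"doc_id": "S-001", "shipper": "ACME", "weight_kg": 120}'
record = json.loads(response)

# fixed column order matching the table schema; 1 row = 1 document
columns = ["doc_id", "shipper", "weight_kg"]
row = tuple(record.get(c) for c in columns)
```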
However, we do very much recommend storing the raw model responses for audit, and then at least as vector embeddings, on the expectation that the data will eventually need to be used for vector search and RAG. Kind of like "while we're here why don't we do what you're going to want to do at some point, even if it's not your use-case now..."
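In sketch form, that storage pattern looks like this (`embed` is a stand-in for a real embedding model, and the dict a stand-in for a real store; both are assumptions, not the actual setup):

```python
def embed(text: str) -> list:
    # placeholder for a real embedding model call
    return [float(len(text))]

def store_response(doc_id: str, raw_response: str, db: dict) -> None:
    db[doc_id] = {
        "raw": raw_response,               # kept verbatim for audit
        "embedding": embed(raw_response),  # ready for vector search / RAG later
    }

db = {}
store_response("S-001", '{"shipper": "ACME"}', db)
```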
> Kind of like "while we're here why don't we do what you're going to want to do at some point, even if it's not your use-case now..."
wow, this is so bad. why do it now and introduce complexity and debt when you can do it later, when you actually need it? you're just riding the hype wave and trying to get the most out of it, but that's fine.
> [with] an accuracy rate between 80% and 82% (humans were at 66%)
Was this human-verified in some way? If not, how did you establish the facts-on-the-ground about accuracy?
Yup, unfortunately the only way to know how good an AI is at anything is to do it the same way you would with a human: build a test you already know the answers to. That's also why the accuracy evaluation was by far the most time-intensive part of the development pipeline, as we had to manually build a "ground-truth" dataset we could evaluate the AI against.
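Once the labels exist, the evaluation itself is the easy part. A toy version (data invented for illustration):

```python
# hand-labeled ground truth vs. model predictions (toy values)
ground_truth = {"doc1": "ACME", "doc2": "Globex", "doc3": "Initech"}
predictions  = {"doc1": "ACME", "doc2": "Globex", "doc3": "Initrode"}

correct = sum(predictions[k] == v for k, v in ground_truth.items())
accuracy = correct / len(ground_truth)
```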