Comment by rahimnathwani

9 months ago

Hi Jerry,

How well does LlamaParse work on foreign-language documents?

I have a pipeline for Arabic-language docs that uses Azure for OCR and GPT-4o-mini to extract structured information. Would it be worth trying LlamaParse to replace part of the pipeline, or the whole thing?

Yes! We have foreign-language support for better OCR on scans. Here are some more details. Docs: https://docs.cloud.llamaindex.ai/llamaparse/features/parsing... Notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...

  • What is disable_ocr=True for? Is it for documents that already have a text layer, that you don't want to OCR again?

    • yeah, disable_ocr is for documents where you don't need to OCR a scanned image; it'll just parse out the text.

      It's faster if True.
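To make the `disable_ocr` trade-off concrete, here is a minimal sketch. It assumes the `llama_parse` Python package's `LlamaParse` class accepts `result_type`, `language`, and `disable_ocr` keyword arguments and exposes `load_data()`, and that your API key is already configured. The helpers `has_text_layer`, `build_parse_kwargs`, and `parse_document`, the 20-character threshold, and the `"ar"` language code are hypothetical choices for illustration, not part of LlamaParse. The idea is to check locally whether a file already has a text layer and only ask for OCR when it doesn't.

```python
# Hypothetical helpers (not part of LlamaParse): decide per document whether
# to pass disable_ocr=True. The text-layer heuristic is an assumption.


def has_text_layer(page_texts: list[str], min_chars: int = 20) -> bool:
    """Return True if every page already yields at least `min_chars` characters.

    `page_texts` is assumed to come from a cheap local per-page text
    extraction run before parsing (e.g. with a PDF library).
    """
    if not page_texts:
        return False
    return all(len(text.strip()) >= min_chars for text in page_texts)


def build_parse_kwargs(page_texts: list[str], language: str = "ar") -> dict:
    """Build keyword arguments for LlamaParse (parameter names assumed).

    - language: OCR language code used for scanned pages ("ar" assumed for Arabic).
    - disable_ocr: skip OCR when the document already has a text layer,
      which the thread notes is faster.
    """
    return {
        "result_type": "markdown",
        "language": language,
        "disable_ocr": has_text_layer(page_texts),
    }


def parse_document(path: str, page_texts: list[str]):
    """Parse one file with LlamaParse (needs the llama_parse package and an API key)."""
    # Imported lazily so the helpers above can be used without llama_parse installed.
    from llama_parse import LlamaParse

    parser = LlamaParse(**build_parse_kwargs(page_texts))
    return parser.load_data(path)
```

For example, a scanned Arabic PDF whose pages extract as empty strings would keep OCR on with `language="ar"`, while a born-digital PDF would skip OCR. Requiring text on every page is deliberately conservative: a mixed document with even one scanned page still goes through OCR.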