Comment by m3kw9

1 year ago

You don’t really feed images to LLMs; rather, you feed them to a vision model within the multimodal LLM.

1 comment

m3kw9

Yup, important clarification! However, the language portion of the model also works with what the vision model extracts, and it is prone to hallucinations.
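
To make the split concrete, here is a toy sketch of one common arrangement: a vision encoder turns the image into feature vectors, a projection maps those into the language model's embedding space, and the language part then processes image and text embeddings as one sequence. Everything here (the function names, the patch size, the weights, the stand-in text embedding) is hypothetical and chosen for illustration; it is not any particular model's API or architecture.

```python
# Toy sketch only: all names and sizes are hypothetical, not a real model's API.
from typing import List

Vector = List[float]


def vision_encoder(image: List[List[float]], patch: int = 2) -> List[Vector]:
    """Split a 2D grayscale image into patch x patch tiles and flatten each tile."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def project(features: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Linear map from the vision feature size to the language embedding size."""
    return [
        [sum(f * w for f, w in zip(feat, row)) for row in weights]
        for feat in features
    ]


def embed_text(tokens: List[str], dim: int) -> List[Vector]:
    """Stand-in text embedding: a deterministic vector per token."""
    return [[float((len(tok) + k) % 5) for k in range(dim)] for tok in tokens]


def build_lm_input(
    image: List[List[float]], prompt_tokens: List[str], weights: List[Vector]
) -> List[Vector]:
    """Build the single sequence of vectors the language part would consume."""
    image_embeddings = project(vision_encoder(image), weights)
    text_embeddings = embed_text(prompt_tokens, dim=len(weights))
    # The language part never sees pixels, only these vectors.
    return image_embeddings + text_embeddings
```

In this sketch the language part only ever receives the projected vectors, never the pixels, so whatever it generates about the image depends on what survived the extraction step. That is at least one place where ungrounded output could creep in.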