m3kw9 15 days ago
You don’t really feed images to LLMs; you feed them to a vision model within the multimodal LLM.

ritvikpandey21 15 days ago
Yup, important clarification! The language portion of the model also takes part in the extraction, though, and is prone to hallucinations.