2-3-7-43-1807 15 days ago
I don't understand. What do LLMs have to do with OCR?

esafak 15 days ago
Some, like gpt-4o, are multimodal.

2-3-7-43-1807 15 days ago
The LLM isn't multimodal. An LLM can only process textual tokens. What would those tokens be for pictures? The LLM gets fed a textual representation of what was optically recognized by another process. That's my understanding.

esafak 15 days ago
gpt-4o is multimodal. The "o" in it stands for "omni". https://news.ycombinator.com/item?id=40608269