Comment by tjsk
8 hours ago
Did you consider working around those by using vision models instead of DOM parsing? Was cost/latency the bottleneck? It seems like the agentic future you describe would need more vision-based parsing.
8 hours ago
I believe we will at some point. It's all a question of the right need coming up. Text OCR has gotten really good, and if you think of it from a UI perspective, the only real contract is that a screen will show text that's representative of the information entered. The DOM is useful, but it's a contract that can change!