Comment by tjsk
12 hours ago
Did you consider working around those using the vision models vs DOM parsing? Was cost/latency the bottleneck? Seems like the agentic future you describe would need more vision based parsing
12 hours ago
Did you consider working around those using the vision models vs DOM parsing? Was cost/latency the bottleneck? Seems like the agentic future you describe would need more vision based parsing
I believe we will at some point. All question of the right need coming up. Text OCR has gotten really good, and if you think of it from a UI perspective, the only real contract is that a screen will show text that's representative of the information entered. The DOM is useful but is a changeable contract!