Comment by davecyen

16 hours ago

Very cool - is it possible to simulate this on a live production site (i.e. instead of Halluminate Flights, just test the agent live on Expedia)? Even though you don't have access to the backend json, presumably you could verify the right values were entered in the frontend/UI?

yup, though without access to the code it's much harder to pull the state of the components. it becomes more like a web scraping problem: brittle and much hackier than intentionally exposing component state like we can do in the sim.
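to make the contrast concrete, here is a minimal sketch of checking an agent's work against exposed component state. the state shape, field names, and `check_state` helper are hypothetical and only for illustration, not the sim's actual schema:

```python
import json

def check_state(state_json, expected):
    """Compare exposed UI state against expected values.

    Returns a dict of {field: (actual, expected)} for every mismatch;
    an empty dict means the agent entered everything correctly.
    """
    state = json.loads(state_json)
    return {
        field: (state.get(field), want)
        for field, want in expected.items()
        if state.get(field) != want
    }

# Hypothetical state dump, as a sim might expose it after the agent acts.
exposed = json.dumps({"origin": "BLQ", "destination": "DYU", "depart": "2026-05-05"})
expected = {"origin": "BLQ", "destination": "DYU", "depart": "2026-05-05"}
mismatches = check_state(exposed, expected)
```

on a production site there is no such dump to read, so the same check would have to parse rendered DOM text, which breaks whenever the markup changes.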

more important, though, are the use cases that depend on the data. the data on real google flights/expedia is constantly changing, so it's impossible to build datasets with a stable ground truth. e.g. the answer to a task like "Find the cheapest round-trip flight option from Bologna (BLQ) to Dushanbe (DYU) if I leave on 2026-05-05 and come back on 2026-05-15. Return the total price and the flight numbers for all flights." isn't stable. on our site, we control the data, so that answer is stable (deterministically random). so controlling the whole clone rather than running on the prod site unlocks richer and more repeatable tasks/testing.
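for illustration, a minimal sketch of what "deterministically random" data could look like, assuming a PRNG seeded from the query. the function names, seed string, flight-number prefix, and price range are all hypothetical, not the actual Halluminate implementation:

```python
import hashlib
import random

def generate_flights(origin, dest, date, n=5, site_seed="demo-seed"):
    """Return the same pseudo-random flights for the same query every time."""
    # Derive a stable integer seed from the query. hashlib is used because
    # Python's built-in hash() for strings is salted per process.
    key = f"{site_seed}|{origin}|{dest}|{date}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    return [
        {
            "flight_number": f"HX{rng.randint(100, 999)}",
            "price_usd": rng.randint(200, 1500),
        }
        for _ in range(n)
    ]

def cheapest(flights):
    """Pick the lowest-priced flight; this is the stable ground truth."""
    return min(flights, key=lambda f: f["price_usd"])

outbound = generate_flights("BLQ", "DYU", "2026-05-05")
answer = cheapest(outbound)
```

because the seed depends only on the query, the cheapest flight for a given route and date is the same on every run and every machine, so it can be stored once as the expected answer for the task.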

lastly, our site runs exactly the same locally as deployed and has zero internet dependencies, so it can be run offline directly on the cluster without being affected by network latency or failures.