Comment by maxothex

21 hours ago

What I'm most curious about is how this translates to messy, real-world codebases without well-defined metrics. Most production software isn't chip design or kernel optimization - it's business logic with unclear success criteria. The infrastructure story is impressive, but I'd love to see how they handle domains where the evaluation function itself is ambiguous.