← Back to context

Comment by abalone

6 days ago

I like to make a rough analogy with autonomous vehicles. There's a leveling system from 1 (old school cruise control) to 5 (full automation):

* We achieved Level 2 autonomy first, which requires you to fully supervise and retain control of the vehicle and expect mistakes at any moment. So kind of neat but also can get you in big trouble if you don't supervise properly. Some people like it, some people don't see it as a net gain given the oversight required.

^ This is where Tesla "FSD beta" is at, and probably where LLM codegen tools are at today.

* After many years we have achieved a degree of Level 4 autonomy on well-trained routes albeit with occasional human intervention. This is where Waymo is at in certain cities. Level 4 means autonomy within specific but broad circumstances like a given area and weather conditions. While it is still somewhat early days it looks like we can generally trust these to operate safely and ask for help when they are not confident. Humans are not out of the loop.[1]

^ This is probably what where we can expect codegen to grow after many more years of training and refinement in specific domains. I.e. a lot of what CloudFlare engineers did with their prompt engineering tweaking was of this nature. Think of them as the employees driving the training vehicles around San Francisco for the past decade. And similarly, "L4 codegen" needs to prioritize code safety which in part means ensuring humans can understand situations and step in to guide and debug when the tool gets stuck.

* We are still nowhere close to Level 5 "drive anywhere and under any conditions a human can." And IMHO it's not clear we ever will based purely on the technology and methods that got us to L4. There are other brain mechanisms at work that need to be modeled.

[1] https://www.cnbc.com/2023/11/06/cruise-confirms-robotaxis-re...

That's a good analogy. OAuth libraries and integrations are like a highly-mapped California city. Just because you can drive a Waymo or coding agent there, doesn't mean you can drive it through the Rockies.

  • Note that even with OAuth it took, as of today, security engineers many iterations of review and prompt tweaking to get this result. We’re still in the “mapping” phase.