Comment by fsloth
6 days ago
By using the program? Mind you, this works only for _personal_ tools where it’s intuitively obvious when something is wrong.
For example
”Please create a viewer for geojson where i can select individual feature polygons and then have button ’export’ that exports the selected features to a new geojson”
1. You run it
2. It shows the json and visualizes selections
3. The exported subset looks good
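For the export step, a minimal sketch in Python of what "export the selected features to a new geojson" could boil down to. The function name `export_selected` and the idea of tracking selections as list indices are assumptions for illustration, not what any particular generated tool does.

```python
import copy
import json


def export_selected(collection, selected_indices):
    """Return a new FeatureCollection holding only the selected features.

    `collection` is a parsed GeoJSON FeatureCollection (a dict).
    `selected_indices` are positions in collection["features"]
    (a hypothetical way for the viewer to track selections).
    """
    features = collection["features"]
    for i in selected_indices:
        if not 0 <= i < len(features):
            raise IndexError(f"no feature at index {i}")
    # Deduplicate and keep the original feature order; deep-copy so the
    # export cannot be changed by later edits to the source collection.
    chosen = [copy.deepcopy(features[i]) for i in sorted(set(selected_indices))]
    return {"type": "FeatureCollection", "features": chosen}


def square(name, x):
    """Build a tiny closed square polygon feature for the demo below."""
    ring = [[x, 0], [x + 1, 0], [x + 1, 1], [x, 1], [x, 0]]
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


sample = {
    "type": "FeatureCollection",
    "features": [square("a", 0), square("b", 2), square("c", 4)],
}
subset = export_selected(sample, [2, 0])
exported_text = json.dumps(subset, indent=2)
```

Checking that "the exported subset looks good" then amounts to loading `exported_text` back and comparing the feature names against what was clicked.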
I have no idea how anyone could keep the call graph of even a minimal GUI application in their head. If you can, then congratulations, not all of us can!
Great, I used my program and everything seems to be working as expected.
Not great, somebody else used my program and they got root on my server...
Practice.
Lots and lots of practice.
Write it down. Do things the hard way. Build the diagrams by hand and make sure you know what's going on. Trace programs. Pull out the debugger! Pull out the profiler!
If you do those things, you too will gain that skill. Obviously you can't do this for a giant program but it is all about the resolution of your call graph anyways.
If you are junior, this is the most important time to put in that work. You will get far more from it than you lose. If you're further along, well, the second-best time to plant a tree is today.
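As a starting point for the "trace programs" advice, here is a rough sketch in Python that records caller -> callee edges with the standard `sys.setprofile` hook. The helper `trace_calls` and the toy functions `parse` and `total` are made up for illustration; a real debugger or profiler will tell you far more.

```python
import sys
from collections import defaultdict


def trace_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) edges for Python-level calls."""
    edges = defaultdict(int)

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; calls into
        # C builtins arrive as "c_call" and are ignored here.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
            edges[(caller, frame.f_code.co_name)] += 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Restore whatever profiler was active before we started.
        sys.setprofile(previous)
    return result, dict(edges)


def parse(text):
    return list(map(int, text.split(",")))


def total(text):
    return sum(parse(text))


result, edges = trace_calls(total, "1,2,3")
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee} ({count}x)")
```

Drawing those edges by hand for a small program is the kind of exercise meant above; the tool only checks your diagram.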
”not great, somebody else used my program and they got root on my server...”
In general, security-sensitive software is the worst possible place to use LLMs, judging by public case studies and anecdata, for exactly this reason.
”Do it the hard way”
Yes, that’s generally the way I do it as well when I need to reliably understand something, but it takes hours.
The cadence with LLM-driven experiments is usually under an hour. That’s the biggest boon for me - I get a new tool and can focus on the actual work I’m delivering, with some step now taking slightly less time.
For example, I’m happy using vim without ever having read the code or debugged it, much less having observed its call graph. I’m similarly content using LLM-generated utilities without much oversight. I would never push code like that to production, of course.