Comment by onlyrealcuzzo
1 day ago
#1 -> part of scaling is you can't review every single line of code.
LLMs don't really scale if you're still the bottleneck, or they only scale as far as your ability to review every line of code - that's not that much scaling...
So I try to review only certain parts, like making sure they aren't changing tests to let architecturally broken code slip through (they regularly try, even when given explicit instructions not to). Or, if I'm watching them make changes from my phone, I step in when I see they are clearly doing the exact opposite of what they're supposed to be doing (which happens regularly when I'm watching).
#2 -> if commits are small, GitHub's setup is good enough that you can review code on your phone.
#3 -> if they're huge, I can just review on my laptop at lunch or something.
Theoretically, all of this can be solved easily with orchestration and should require minimal oversight.
If you're using LLMs to write code and you're carefully reviewing every line with a jade-handled magnifying glass, you're not really scaling - at least to the degree I'm interested in.
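For illustration only, the "don't let them quietly change the tests" check could be automated along these lines. This is a minimal sketch, not anyone's actual setup: the function names and the directory/filename conventions (`tests/`, `test_*.py`, `*_test.*`) are assumptions and would need adjusting per repo.

```python
from pathlib import PurePosixPath

# Hypothetical conventions for what counts as a test file; adjust per repo.
TEST_DIR_NAMES = {"test", "tests", "__tests__"}


def is_test_path(path: str) -> bool:
    """Return True if the path looks like a test file under the assumed conventions."""
    p = PurePosixPath(path)
    # Any parent directory named like a test directory.
    in_test_dir = any(part in TEST_DIR_NAMES for part in p.parts[:-1])
    # Filenames such as test_app.py or foo_test.go.
    looks_like_test = p.name.startswith("test_") or p.stem.endswith("_test")
    return in_test_dir or looks_like_test


def flag_test_changes(changed_paths):
    """Return the changed paths that touch tests, so a human can look at those first."""
    return [p for p in changed_paths if is_test_path(p)]


if __name__ == "__main__":
    changed = ["src/app.py", "tests/test_app.py", "pkg/foo_test.go"]
    print(flag_test_changes(changed))
```

The list of changed paths would come from whatever the orchestration layer already has (for example the output of `git diff --name-only`); anything flagged goes to a human instead of being merged automatically.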
> LLMs don't really scale if you're still the bottleneck
This only works if there are no consequences when your code breaks. In the eyes of other humans you're responsible for what you commit. No amount of "scaling" will change that.
> This only works if there are no consequences when your code breaks. In the eyes of other humans you're responsible for what you commit. No amount of "scaling" will change that.
You're only responsible for what you merge to master, not everything you commit to a feature branch no one is looking at...
If you have the testing and the infrastructure in place such that you can't ship broken code, then you just need to make sure your invariants are upheld - not that every single line is beautiful and perfect.
Further, I am working on a set of metrics that seems pretty good at identifying sloppy architecture. There's decent prior art on many of the components of what "sloppy" architecture actually is, and on ways to visualize it.
If you can rely on the consensus of several different models, plus your own judgement, with the design and the testing in place to verify it's implemented correctly...
Then, 1) you don't need to code. 2) you only need to review 1/10th or less of the code written. That scales. Reading every line of code line by line doesn't really scale. LLMs aren't very fast at implementation outside of green-field projects, so you can oftentimes implement something by hand faster than they will. Reviewing can take just as much time as implementing...
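As a rough sketch of what "consensus of several models, plus your own judgement" might look like as a gate: each model gives a verdict on a change, and a human only looks at the change if enough models flag it or if it touches tests. Everything here is hypothetical (the verdict strings, the `quorum` parameter, the model names); it illustrates the idea and does not describe a real tool.

```python
from collections import Counter


def consensus(verdicts, quorum=2):
    """Combine per-model verdicts ("ok" or "flag") into a single verdict.

    verdicts: mapping of hypothetical model name -> "ok" or "flag".
    The change is flagged if at least `quorum` models flag it.
    """
    counts = Counter(verdicts.values())
    return "flag" if counts["flag"] >= quorum else "ok"


def needs_human_review(verdicts, touches_tests, quorum=2):
    """Send a change to a human if it touches tests or the models flag it."""
    return touches_tests or consensus(verdicts, quorum) == "flag"


if __name__ == "__main__":
    verdicts = {"model_a": "ok", "model_b": "flag", "model_c": "flag"}
    print(needs_human_review(verdicts, touches_tests=False))
```

Changes that pass the gate still have to get through the test suite; the gate only decides where the human's limited review time goes.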
So you're making a programming language... and you don't want to read code. Have I got the gist of it?