Comment by jugg1es
7 hours ago
I think veteran engineers have always known that the real problems with velocity are more organizational than technical. The inability of the business to define a focused, productive roadmap has always been the central problem in software engineering. Constantly jumping to the next shiny thing that yields almost no ROI, while never allowing systemic tech debt to be addressed, has crippled many companies I have worked at in the long term.
> The inability for the business to define a focused, productive roadmap has always been the problem in software engineering.
Agreed, and I also agree that most developers come to this realization with time and experience. When you have a clear understanding of business rationale, scope, inputs, and desired outputs, the data models, system design and the code fall out almost naturally. Or at least are much more obvious.
- systemic tech debt is now addressable at scale with LLMs. Future models will be good enough to sustain this; if people don't believe that, I would challenge them to explain why. First consider whether you understand scaling laws like Chinchilla and how RL with verification works fundamentally
- I completely agree with you about fundamentally the limitation being the business able to coherently articulate itself and its strategy
- BUT the benefit now is you can basically prototype for free. Before we had to be extremely careful with engineer headcount investment. Now we can try many more things under the same time constraints.
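The Chinchilla-style scaling law mentioned above can be sketched numerically. This is a minimal sketch: the constants are the fitted values reported by Hoffmann et al. (2022), and treating them as exact, or as transferring to today's coding models, is an assumption.

```python
# Chinchilla parametric loss fit (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N = model parameters and D = training tokens.
# The constants below are the published fit; exactness is an assumption.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model of n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Scaling both parameters and data keeps lowering predicted loss, which is
# the basis of the "we know what happens at scale" argument in this thread.
small = chinchilla_loss(1e9, 20e9)     # ~1B params, 20B tokens
large = chinchilla_loss(70e9, 1.4e12)  # ~70B params, 1.4T tokens (Chinchilla)
assert large < small
```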
> BUT the benefit now is you can basically prototype for free.
But... so can your competitors. And that changes the value proposition.
How do you mean?
> systemic tech debt is now addressable at scale with LLMs.
Is there any reason to believe this? I've only seen the evidence of the contrary so far.
My experience with AI coding assistants is that they generally:
1. Don't have an opinion.
2. Are trained on code written using practices that increase technical debt.
3. Lack broader perspective, focusing on the concrete, superficial, and immediate.
I think I need to elaborate on the first point and explain how it's relevant to the question. I'll start with an example. We have an AI reviewer, and we recently migrated a bunch of the company's repositories from Bitbucket to GitLab. This also prompted a bunch of CI changes. Some Python projects I'm involved with, but don't have much authority over, switched to complicated builds involving pyproject.toml (often including dynamic generation of this cursed file), as well as integration with a bunch of novel (but poor-quality) Python infrastructure tools used for building distributable artifacts.
In the projects where I do have authority, I removed most of the third-party integration. None of them use pyproject.toml, setup.cfg, or any similar configuration for a third-party build tool. The project code contains bespoke code to build the artifacts.
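For illustration, the "bespoke build code" approach can be as small as a standard-library script. Everything below (names, layout, what counts as an artifact) is a hypothetical stand-in, not the actual project code being described:

```python
# Hypothetical sketch of bespoke build code: package a source tree into a
# distributable tarball using only the standard library, with no
# pyproject.toml and no third-party build tools.
import pathlib
import tarfile

def build_artifact(src_dir: str, out_path: str) -> pathlib.Path:
    """Bundle every .py file under src_dir into a gzipped tarball."""
    out = pathlib.Path(out_path)
    with tarfile.open(out, "w:gz") as tar:
        for path in sorted(pathlib.Path(src_dir).rglob("*.py")):
            # Store paths relative to the source root inside the archive.
            tar.add(path, arcname=path.relative_to(src_dir))
    return out
```

The trade-off is exactly the one at issue in the thread: this is fully transparent and dependency-free, but it is the opposite philosophy from declarative pyproject.toml builds, and the two approaches will collide if both live in the same organization.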
These two approaches are clearly at odds. A living, breathing person would believe one or the other to be the right approach. The AI reviewer had no problem with this situation. It made some pedantic comments about style and some fantasy impossible error cases, but completely ignored the fact that, moving forward, these two approaches are bound to collide. While it appears to have an opinion about the style of quotation marks, it has none about strategic decisions.
My guess as to why this is the case is that such situations are genuinely rarely addressed in code review. Most productive PRs, from which an AI could learn, are designed around small, well-defined features in a pre-agreed context. The context itself is never discussed in PRs because it's impractical (it would usually require too large a change, so developers don't even bring up the issue).
And this is where the real, large, glacier-style deposits of tech debt live. It's the issues developers are afraid to mention, understanding that they will never be given the authority and resources to deal with them.
You are not wrong about anything you're saying, but like I said, this misses the forest for the trees. I'm talking about the next ~2 years. There is a common idea that we don't understand this technology or what will happen performance-wise. We know a lot more about what's going to happen than people think, because none of this is new. We've known about neural nets since the 1940s; we know how RL works on a fundamental level, and it has been an active and beautiful field of research for at least 30-40 years; and we know what happens when you combine RL with verifiable rewards and throw a lot of compute at it.
One big misconception is that these models are trained to mimic humans and are therefore limited by the quality of the human training data. This is not true, and the fact that it is not true is basically the entire reason you see so much bullishness and premature adoption of agentic coding tools.
Coding agents use human traces as a starting point. Technically you don't have to do this at all, but that's an academic point; practically (today), you can't avoid it. The early training stages with human traces (and verified synthetic traces from your previous model) get you to a point where RL is stable and efficient, and RL pushes you the rest of the way. It's synthetic data that really powers this, via rejection sampling: you generate a bunch of traces, figure out which ones pass verification, and keep those as training examples.
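The rejection-sampling loop described here can be sketched in a few lines. The `generate` and `verify` callables are placeholders for a model sampler and a test/verifier harness; the toy demo at the bottom is purely illustrative:

```python
import random
from typing import Callable

def rejection_sample(
    generate: Callable[[str], str],      # stand-in: sample one trace from the model
    verify: Callable[[str, str], bool],  # stand-in: run tests / a verifier on the trace
    tasks: list[str],
    samples_per_task: int = 4,
) -> list[tuple[str, str]]:
    """Keep only (task, trace) pairs that pass verification, as new training data."""
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            trace = generate(task)
            if verify(task, trace):
                kept.append((task, trace))
    return kept

# Toy demo: "traces" are random digits, and verification accepts even ones.
data = rejection_sample(
    generate=lambda task: str(random.randrange(10)),
    verify=lambda task, trace: int(trace) % 2 == 0,
    tasks=["t1", "t2"],
)
assert all(int(trace) % 2 == 0 for _, trace in data)
```

The point of the sketch is that everything surviving the filter is verified by construction, which is why the quality ceiling comes from the verifier rather than from the human training data.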
So, because:
- we know how this works on a fundamental level, and have for some time
- human training data is a bootstrap, not a fundamental limitation
- you are absolutely right about your observations, yet look at where we are today versus, say, Claude Sonnet 3.x. It's an entire world away in about a year
- we have imperfect benchmarks, all with various weaknesses, yet all telling the same compelling story. Plus you have adoption numbers and walled-garden data that are the proof in the pudding
The onus is on the people who say "this is plateauing" or "this has some fundamental limitation that we will not get past fairly quickly" to explain why.
> [O]rganizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
— Melvin E. Conway, 1967
Any competent engineer should understand that engineering is just the assembly-line side of product development. Deciding when to release which feature, bug fixes, etc., and the development and management of the product in general, has always been the real challenge, and much of the strategy involved relies on feedback loops that AI cannot speed up. Though at the same time, I do feel that leaders on the business side often scapegoat engineers' speed as an excuse instead of taking responsibility for poor decisions on their end.
I get what you're trying to say, but this is actually a bad picture to defend. Product and engineering should go hand in hand, each side informing the other. Engineers who actually give a shit about a product will tell product folks about possibilities they haven't even considered; product people who care about engineering won't propose utterly stupid things. And I for one can spot when a product is well designed but poorly made, as well as when a product is perfectly crafted yet useless. The sweet spot is both. Even with the speed multiplier of AI, taking pride in the craft and being actually good at it as an engineer makes a night-and-day difference in the final result.
> I think veteran engineers have always known that the real problems with velocity have always been more organizational than technical.
I don't think this comment is fair or grounded. There are plenty of process bottlenecks created by developers. Unfortunately, I have a hefty share of war stories where a tech lead's inability to draft a coherent, clear design resulted in project delays and in systems riddled with the accidental complexity required to patch the solution into working.
Developers are a part of the process and they are participants of both the good parts and the bad parts. If business requirements are not clear, it's the developer's job to work with product owners to arrive at said clarity.
> Unfortunately I have a hefty share of war stories where a tech lead's inability to draft a coherent and clear design resulted in project delays and systems riddled with accidental complexity required to patch the solution enough to work
This is also an organizational problem (bad hiring/personnel management). If you put an incompetent individual at the helm of a project, resources (especially time) will be spent horrendously and you will have more problems down the line. That's true for all types of organizations and projects.
yes, most places I have worked were hobbled by the organizations being completely idiotic.
which is why engineers want to be left alone to code, historically. Better to be left alone than dealing with insane bureaucracy. But even better than that is working with good bureaucracy. Just, once you know it's insane, there's not really anything that you can personally do about it, so you check out and try to hold onto a semblance of sanity in the realm you have control over, which is the code.
> there's not really anything that you can personally do about it
Small companies/startups don't have insane bureaucracy, and they're hiring.
And now they're all but forcing us to produce machine-made tech debt at an industrial scale. The AI craze isn't going to produce the boon some people think it will. And the solution? More AI, unfortunately.
> And the solution? More AI, unfortunately.
I think the solution to using AI in coding is more testing, which unlocks even more AI.
The solution truly is more AI, yes.
> AI craze isn't going to produce the boon some people think it will.
What’s the boon you don’t think it will produce?
No, it's not more AI. The solution is designing, and sticking to, a development process that is more resilient to errors than the current one. This isn't a novel idea: code reviews weren't always part of the process, and neither were VCS, bug trackers, etc.
The way AI is set up today, it tries to replicate the (hopefully) good existing practices, possibly faster. The real change comes from inventing better practices, something AI isn't capable of, at least not the kind of AI being sold to programmers today.
It's part of the problem, but AI can also crush this on pure lines of code and functionality alone. It can put out 100,000 lines of somewhat decent code in a day; that usually takes a team months or years of manual coding.
There is a reason kLOC and FP were rightly shunned as metrics years ago. The same clown show seems to be resurging with "tokens". There is, in my opinion, no real formula or metric you can define for "good" or "bad" code. Tickets and ceremonial activities, however, abstract all that into an N-ary status value that seems easier to judge by.
More lines of code don't help when you need to add constraints to a system without violating the existing ones.
In fact, they make it harder.