Comment by raw_anon_1111
20 days ago
No one who has any knowledge or who has ever used an LLM expects determinism.
And there are no computer professionals who haven’t heard about hallucinations.
Reviewing whether the code meets requirements through manual and automated tests - and that’s all I cared about when I had a team of 8 under me - is the same regardless. I wasn’t checking whether John used a for loop or while loop in between my customer meetings and meetings with the CTO. I definitely wasn’t checking the SOQL (not a typo) of the Salesforce consultants we hired. I was testing inputs and outputs and UX.
Having a team of 8 people producing code is manageable. Having an AI with 8 agents that write code all day long is not the same volume it can generate more code in a day that one person can review in a week. What you say is that, product teams will prompt what they want to a framework, the framework will take care of spec analysis, development, reviews, compliance with spec. Product teams with QA will make sure the delivery is functionally correct. No humans need to make sure of anything code related. What we don’t know yet is, does AI will still produce solid code trough the years because it’s all statistical analysis and with the volume of millions of loc, refactoring needed, data migrations etc what will happen ?
For context, I just started using coding agents - codex CLI and Claude code in October. Once I saw that you had to be billed by use, I’m not using my own money for it when it’s for a company.
Two things changed - Codex CLI now lets you use it with your $20 a month subscription and I have never run into quota issues with it and my employer signed up for the enterprise vs of Claude and we each have an $800 a month allowances
My argument though is “why should I care about the code?” for the most part. If I were outsourcing a project or delegating it to a team lead, I would be asking high level architectural, security and scalability questions.
AI generated the code, AI maintains the code. I am concerned about abstractions and architecture.
You shouldn’t have to maintain or refactor “millions of lines of code”, if your code is well modularized with clean interfaces, making a change for $x7 may mean making a change for $x1…$x6. But you still should be working locally in one module at the time. You should do the same for the benefit of coders. Heck my little 5 week project has three independently deployable repos in a root folder. My root Agents file just has a summary of how all three relate via a clean interface.
In the project I am working on now, besides “does it meet the requirements”, I care about security, scalability, concurrency, user experience for the end user, user experience for the operations folks when they need to make config changes, and user experience for any developers who have to make changes long after I’m off this project. I haven’t looked at a single line of code - besides the CloudFormation templates. But I can answer any architectural question about any of it. The architecture and abstractions were designed by me and dictated to the agents
On this particular project, on the coding level, there is absolutely nothing that application code like this can do that could be insecure except hypothetically embed AWS credentials into the code. But it can’t do that either since it doesn’t have access to it [1].
In this case security posture comes from the architecture - S3 block public access, well scoped IAM roles, not running “in a VPC”. Things I am checking in the infrastructure as code and I was very specific about.
The user experience has to come from design and checking manually.
I mentioned earlier that my first stab it scaled poorly. This was caused by my design and I suspected it would beforehand. But building the first version was so fast because of AI tools, I felt no pain in going with my more architecturally complicated plan B and throwing the first version away. I wouldn’t have known that by looking at the code. The code was fine it was the underlying AWS service. I could only know that by throwing 100K documents at it instead of 1000.
I designed a concurrent locking mechanism that had a subtle flaw. Throwing the code into ChatGPT into thinking mode, it immediately found it. I might have been better off just to tell the coding agents “design a locking mechanism for $x” instead of detailing it.
Even maintainability was helped because I knew I or anyone else who touched it was probably going to be using an LLM. From the get go I threw the initial contract, the discovery sessions transcripts, the design diagrams, the review of the design diagrams, my project plan and breakdown into ChatGPT and told it to render a detailed markdown file of everything - that was the beginning of my AGENTS.md file.
I asked both Codex and Claude to log everything I was doing and my decisions into separate markdown files.
Any new developer could come into my repo, fire up Claude and it wouldn’t just know what was coded, it would have full context of the project from the initial contract through to the delivery
[1] code running on AWS never explicitly has to worry about AWS credentials , the SDKs can find the information by themselves by using the credentials of the IAM role attached to the EC2 instance, Lambda, Docker container, etc.
Even locally you should be getting temporary credentials that are assigned to environment variables that the SDK retrieved automatically.
There are so many types of requirements though. Security is one, performance is another. No one has cared about while/for for a long time.
Okay - and the person ultimately leading the team is still responsibility for it whether you are delegating to more junior developers or AI. You’re still reviewing someone else’s code based on your specs