Comment by mort96

1 day ago

We do not need vibe-coded critical infrastructure.

As I see it, the focus should not be on the coding but on the testing, and particularly the security evaluation. For critical infrastructure especially, I would want a testing approach so reliable that it wouldn't matter who, or what, wrote the code.

  • I don't think that will ever be possible.

    At some point, security becomes: the program does the thing the human asked it to do, but the human didn't realize they didn't actually want that.

    No amount of testing can fix logic bugs due to bad specification.

    • AI as advanced fuzz testing is ridiculously helpful, though. Hardly any bug you can find in this sort of advanced system is a specification logic bug; it's low-level security stuff: finding ways to DoS a local process, working around OS-level security restrictions, etc.


    • Well, yes, agreed - that is the essential domain complexity.

      But my argument is that we can work to minimize the time we spend on verifying the code-level accidental complexity.


  • I have been thinking about that lately. Isn't testing and security evaluation a much harder problem than designing and carefully implementing new features? I think vibe coding automates the easiest step in software development while making the more challenging and expensive steps harder. How are we supposed to debug complex problems in critical infrastructure if no one understands the code? It is possible that future agents will be able to do that, but it feels to me that we are not there yet.

  • I disagree. Thorough testing provides some level of confidence that the code is correct, but there's immense value in having infrastructure which some people understand because they wrote it. No amount of process around your vibe slop can provide that.

    • That's just the status quo, which isn't really holding up in the modern era, IMO.

      I'm sure we'll have vibed infrastructure and slow infrastructure, and one of them will burn down more frequently. Only time will tell who survives the onslaught and who gets dropped, but I personally won't be making any bets on slow infrastructure.

    • I somewhat agree, but even then I would argue that the proper level at which this understanding should reside is that of architecture and data-flow invariants, rather than the code itself. And those can actually be enforced quite well as tests against human-authored diagrammatic specs.

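One commenter above suggests enforcing architecture and data-flow invariants as tests against human-authored specs. A minimal sketch of that idea, using Python's `ast` module to check a layered-import rule; the layer names and layer map here are hypothetical stand-ins for the human-authored spec:

```python
# Hedged sketch: enforce a layered-architecture invariant as a test.
# The LAYERS map is the human-authored spec; module names are hypothetical.
import ast

# Spec: each layer may import only from layers at or below its own level.
LAYERS = {"app": 2, "domain": 1, "infra": 0}

def imported_top_level(source: str) -> set[str]:
    """Return the top-level package names a module imports."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def violations(module_layer: str, source: str) -> set[str]:
    """Imports that reach *up* the layer stack, violating the spec."""
    return {
        name for name in imported_top_level(source)
        if name in LAYERS and LAYERS[name] > LAYERS[module_layer]
    }
```

A real version would walk the repository and run under the test suite, so any diff (human- or model-authored) that breaks the diagrammed data flow fails CI regardless of who wrote the code.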

>> ...give him unlimited model access

>We do not need vibe-coded critical infrastructure.

I think when you have virtually unlimited compute, you can lock down test writing and code review to a degree that isn't possible with normal vibe-coding setups and budgets.

That said, for truly critical things, I could see a final human review step for a given piece of generated code, followed by a hard lock. That workflow is going to be popular, if it isn't already.

  • The availability, or lack thereof, of compute has absolutely nothing to do with my opinion. More vibe-coded tests don't fix the problem.

    • It might when an individual function has 50 different models reviewing it, potentially multiple times each.

      Perhaps part of a complex review chain for said function that's a few hundred LLM invocations total.

      So long as there's a human reviewing it at the end and it gets locked, I'd argue it ultimately doesn't matter how the code was initially created.

      There are a lot of reasons it would matter before it gets to that point, but those have more to do with system-design concerns. Of course, you could also argue safety is an ongoing process that partially derives from system design, and you wouldn't be wrong.

      It occurred to me there's some recent prior art here:

      https://news.ycombinator.com/item?id=47721953

      It's probably fair to say the Linux kernel is critical infra, or at least a component piece in a lot of it.

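The review chain described above (many model reviewers, then a mandatory human review and a hard lock) can be sketched as a simple pipeline. Everything here is a hypothetical stand-in: in practice each reviewer would be an LLM invocation, and the "lock" would live in version control or CI policy rather than a dict:

```python
# Hedged sketch of a review chain: many automated reviewers, then a
# mandatory human gate before the code is locked. All names hypothetical.
from dataclasses import dataclass, field
from typing import Callable

Reviewer = Callable[[str], bool]  # returns True if the code passes review

@dataclass
class ReviewChain:
    model_reviewers: list[Reviewer]
    human_reviewer: Reviewer
    locked: dict[str, str] = field(default_factory=dict)

    def submit(self, name: str, code: str) -> bool:
        if name in self.locked:
            return False  # hard lock: locked code is never resubmitted
        if not all(review(code) for review in self.model_reviewers):
            return False  # any model rejection blocks the chain
        if not self.human_reviewer(code):
            return False  # final human review is mandatory
        self.locked[name] = code
        return True

# Usage: trivially permissive reviewers, just to show the flow.
chain = ReviewChain(model_reviewers=[lambda c: "eval(" not in c] * 3,
                    human_reviewer=lambda c: True)
```

The point of the sketch is the ordering: no amount of model consensus skips the human step, and nothing gets into `locked` without passing both.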

If you're trusting core contributors without AI, I don't see why you wouldn't trust them with it.

Hiring a few core devs to work on it should be a rounding error to Anthropic and a huge flex if they are actually able to deliver.

  • It's extremely tempting to write stuff and not bother to understand it, similar to the way most of us don't decompile our binaries and look at the assembly when we write C/C++.

    So, should I trust an LLM as much as a C compiler?

They're getting really good at proofs and theorems, right?

  • Proofs/theorems and memory safety vulnerabilities are a special case because there's an easy way to verify whether the model is bullshitting or not.

    That's not true for coding in general. The best you can do is have unreasonably good test coverage, but the vast majority of code doesn't have that.
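The "unreasonably good test coverage" the parent mentions can be approximated with randomized property tests: the property plays the role of the cheap verifier that proofs and memory safety get for free. A stdlib-only sketch, with a toy run-length codec standing in for real code:

```python
# Hedged sketch: randomized property testing as an approximation of
# "unreasonably good test coverage". The codec is a toy example.
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated characters into (char, count) pairs."""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

def check_roundtrip(trials: int = 1000) -> None:
    """Property: decode(encode(s)) == s for random inputs."""
    rng = random.Random(0)  # seeded so failures are reproducible
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randrange(20)))
        assert run_length_decode(run_length_encode(s)) == s, s

check_roundtrip()
```

Libraries like Hypothesis do this with shrinking and smarter input generation, but the idea is the same: you verify a stated property over many inputs, rather than trusting that whoever (or whatever) wrote the code got it right.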

Well, if the big players want to tell me their models are nearly AGI, they need to put up or shut up. I don't want a stochastically downloaded C compiler. I want tech that improves something.