Comment by IshKebab

9 hours ago

I never claimed to be doing a formal proof. If someone said "traffic was bad this morning" would you say "have you done a scientific study on the average journey times across the year and for different locations to know that it was actually bad"?

> LLMs produce worse results than humans

We aren't talking about whether LLMs are better than humans.

Also we're obviously talking about Rust code that compiles. Code that doesn't compile is 100% secure!

I didn't claim that you were doing formal proofs. You can still make bad rhetoric, formal or not. You can say "The sky is blue, therefore C is a memory-safe language," and that's trivially seen to be faulty reasoning. For many people bad deduction is easier to spot than bad induction, but both are rhetorically catastrophic. You are drawing a similarly unsound conclusion to the ridiculous example above; it's not a valid statistical generalization. Formal or not, the rhetoric is faulty.

> would you say "have you done a scientific study on the average journey times across the year and for different locations to know that it was actually bad"?

In response to a similarly suspiciously faulty inductive claim? Yeah, absolutely.

> We aren't talking about whether LLMs are better than humans.

The point I'm making here is specifically in response to the idea that it would "be surprising" if LLMs produced substantially worse code in Rust than they did in C. The paper I posted is merely a touchpoint demonstrating substantial deviation in results in an adjacent context. Rust offers a smaller surface area for certain classes of vulnerabilities under certain conditions, but that's not isomorphic with the kind of behavior LLMs exhibit. We don't have:

- Guarantees LLMs will restrict themselves to operating in safe Rust

- Guarantees these specific vulnerabilities are statistically significant in comparative LLM output

- Guarantees the vulnerability severity will be lower in Rust

Where I think you might be misunderstanding me is that this isn't a statement of empirical epistemological negativism. I'm underlining that this context is way too complex to be attempting prediction. I think it should be studied, and I hope it turns out that LLMs can write good, high-quality safe Rust reliably. But specifically advocating for it on gut assumptions? No. We are advocating for safety here.

Because of how chaotic this context is, we can't reasonably assume anything here without explicit data to back it up. It's no better than trying to predict the weather on gut feeling. Hence why I asked for specific data to back the claim up. Even safe Rust isn't safe from security vulnerabilities stemming from architectural inadequacies and panics. It may well be the case that in reasonably comparable contexts, LLMs produce security vulnerabilities in real Rust codebases at the same rate they create similar vulnerabilities in C. It might also be the case that they produce low-severity issues in C at a similar statistical rate to high-severity issues in Rust. For instance, buffer overflows manifesting in 30% of sampled C codebases that result in unexploitable segfaults, versus architectural deficiencies manifesting in 30% of safe Rust codebases that allow exfiltration of everything in your database without RCE. Under those conditions, I don't think it's reasonable to say Rust is the better choice.
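To make that point concrete, here's a minimal sketch (function names and logic are hypothetical, purely for illustration): both functions below are 100% safe Rust, yet one can be crashed by untrusted input (a panic, i.e. a DoS vector) and the other carries an architectural authorization flaw, with no `unsafe` anywhere.

```rust
// Safe Rust with no `unsafe` blocks: memory safe, but not vulnerability-free.

fn parse_record(input: &str) -> &str {
    // Panics on any input shorter than 8 bytes: memory safe,
    // but a remotely triggerable crash if `input` is untrusted.
    &input[4..8]
}

fn is_admin(user: &str) -> bool {
    // Architectural flaw: substring check instead of exact match,
    // so a username like "not-admin" passes. No memory unsafety involved.
    user.contains("admin")
}

fn main() {
    // Authorization bypass in fully "safe" code:
    assert!(is_admin("not-admin"));

    // Panic from a short input, caught here just to demonstrate it fires:
    let result = std::panic::catch_unwind(|| parse_record("abc"));
    assert!(result.is_err());

    println!("no unsafe code, yet an auth bypass and a DoS panic");
}
```

The borrow checker rules out the buffer overflow, but it has nothing to say about either of these.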

Again, it's not a critique in some epistemological negativist sense. It's a critique that you are underestimating how chaotic this context actually is, and the knock-on effects of that. Nothing should surprise you.