Comment by sandeepkd

2 days ago

Somehow this reminded me of the historical efforts of some government bounty collections for mouse tails which were discontinued due to fraud (such as hunters breeding mice to collect the reward). There is a reason why/how devs and QA keep each other in check. Guess in case of LLM writing code, one has to use different models for dev and security checks.

On other hand, in real world, the developers learn from mistakes and avoid them in the future. However there is no feedback loop with enterprises using LLM with the agreement that the LLM would not use the enterprise code for training purposes

> the developers learn from mistakes and avoid them in the future

No. Humans learn from mistakes and try to avoid them in the future, but there is a whole pile of other stuff in the bag of neurons between our ears that prevent us from avoiding repetition of errors.

I have seen extremely talented engineers write trivial to avoid memory corruption bugs because they were thinking about the problem they were trying to solve, and not the pitfalls they could fall into. I would argue that the vast majority of software defects in released code are written by people that know better, but the bug introduced was orthogonal to the problem they were trying to solve, or was for an edge case that was not considered in the requirements.

Unless you are writing a software component specifically to be resilient against memory corruption, preventing memory corruption issues aren't top of mind when writing code, and that is ok since humans, like the machines we build, have a limit to the amount of context/content/problem space that we can hold and evaluate at once.

Separately, you don't necessarily need to use different models to generate code vs conduct security checks, but you should be using different prompts, steering, specs, skills and agents for the two tasks because of how the model and agents interpret the instructions given.

  • > write trivial to avoid memory corruption bugs because they were thinking about [something else] [...] defects [...] written by people that know better, but the bug introduced was orthogonal to [their focus]

    For whatever reason, hadn't associated the inattentional blindness of bug writing with the invisible gorilla experiment and car crashes - selective attention fails. People looking right at the gorilla strolling into production while chest thumping, but not seeing it, for a focus on passing basketballs. That's quite an image. Tnx.

    • I've noticed even people who do offensive security for a living frequently leave gaping holes in their own code. If you're not actively primed to scan the landscape for the gorilla, you will often miss it even if you're a gorilla inquisitor.

    • Thank you in turn for making the issue much more salient to me by explicitly connecting it to the gorilla/basketball experiment. This is definitely going into my "clippings".

    • And it so happens that the flood of code coming out of tools like Claude is almost certain to bring a whole zoo invisible gorillas along for the bananas. The kinds of bugs that people already struggle to catch because of inattentional blindness flow like a torrent in code generated by these tools.

      Did you notice the missing word in the previous paragraph?

  • I think a similar thing comes into play when you ask a developer to write tests for the feature they just implemented. They’re going to have selective blindness for the edge cases (or requirements) that they failed to consider during implementation, unless they’re good at context switching into a testing mindset. And that’s something that benefits from training.

  • The problem is you as a person are not incentivized to introduce bugs in your code. If I am a company that provide provides an LLM/agent, and I know that the more bugs you have the more money I’m going to make, then I am not exactly incentivized to make my LLM/Agent better at preventing bugs. I don’t even have to explicitly make it introduce them. The incentive structure is simply out of whack.

    • Isn't it more likely the opposite - individial devs are likely to try to fudge metrics about how many vulnerabilities they find in their own code.

      Whereas with LLMs, they’re really good about providing objective metrics about the bugs they found, especially as a subsequent LLM security scan does not know whether the same LLM wrote code earlier, the opposite of human devs.

      And is the idea that organizations and/or benchmarks won't keep track of vulnerability rates for code from different LLMs?

      (And individual devs get paid more the more bugs that they introduced they “find”, and they have more job security with an “maintainable” code base than a “finished” one.)

    • Depends on how the billing works.

      For users on fixed monthly pay accounts they'll be incentivised to do the exact opposite, as their income is fixed and the cost goes up for more tokens.

      If the available evidence (third-party cloud pricing of open models) is correct and they make a profit on tokens but lose it on training, they will be incentivised for as many tokens as possible on pay-as-you-go API calls. If it isn't correct and they actually lose money even per token, they're also going to be incentivised to reduce output here.

    • That’s like saying screw manufacturers are incentivized to give you crappy screws because it means you will buy more.

      No. You will switch to a competitor that does a better job or charges less or both.

      This is why monopolies are such a big problem. Because under a monopoly you are right.

      3 replies →

Are you thinking of the cobra effect (aka https://en.wikipedia.org/wiki/Perverse_incentive) where people in India started breeding cobras to get the reward?

  • Plenty of examples abound:

    https://en.wikipedia.org/wiki/Great_Hanoi_Rat_Massacre

    > Today, the events are often used as an example of a perverse incentive, commonly referred to as the cobra effect. The modern discoverer of this event, American historian Michael G. Vann argues that the cobra example from the British Raj cannot be proven, but that the rats in the Vietnam case can be proven, so the term should be changed to the Rat Effect.

Reminds me of the contracts we sign with off-shore development companies to write the software at one rate and then fix bugs at a higher rate. Won’t be long till tokens spent on security review agents cost more than the tokens to create the bugs in the first place.

Great analogy. The problem is the incentive structure. Anthropic would nothing nothing more than for all of us to write big sprawling slop codebases so we can spend endless tokens reading, rereading, fixing, refixing forever.

You don't need different models, just different contexts (optimally with different personas).