
Comment by rienbdj

6 days ago

The commits are revealing.

Look at this one:

> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!

> prompt: I noticed you are storing a "backup" of the encryption key as `encryptionKeyJwk`. Doesn't this backup defeat the end-to-end encryption, because the key is available in the grant record without needing any token to unwrap it?

I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.
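For anyone who doesn't work on this kind of thing, the shape of the problem is roughly the following. This is a hypothetical sketch, not the actual repo code; apart from `encryptionKeyJwk` the names are made up:

```typescript
// Hypothetical sketch of the flaw the commit describes; not the repo's actual code.
// Intended design: grant data is encrypted, and the key can only be unwrapped by
// presenting a valid token.
interface GrantRecord {
  grantId: string;
  encryptedData: string;         // grant data, encrypted with a per-grant key
  wrappedEncryptionKey: string;  // that key, encrypted under a key derived from the token
  encryptionKeyJwk: JsonWebKey;  // the "backup": the same key in plaintext JWK form,
                                 // so anyone who can read the record can decrypt the data
                                 // without ever holding a token, defeating the E2E property
}
// The fix in the commit: drop the plaintext backup and keep only the wrapped copy.
```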

That is how LLMs should be used today: an expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch. Just the other day I was working on a prototype and let Claude write the code for an auth flow. Everything was good until the last step, where it was just sending the user id as a string along with the valid token. So if you had a valid token you could just pass in any user id and become that user. Still saved me a lot of time vs doing it from scratch.
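Roughly the shape of the bug, reconstructed from memory (an Express-style TypeScript sketch; verifyToken and loadUser are stand-in names, not my actual code):

```typescript
import express from "express";

// Hypothetical helpers standing in for whatever the prototype actually used.
declare function verifyToken(token: string): Promise<{ sub: string } | null>;
declare function loadUser(id: string): Promise<unknown>;

const app = express();

app.get("/me", async (req, res) => {
  const token = req.headers.authorization?.replace("Bearer ", "");
  const claims = token ? await verifyToken(token) : null;
  if (!claims) return res.status(401).end();

  // BUG: the user id comes from the client, so any holder of a valid token
  // can pass someone else's id and become that user.
  const user = await loadUser(req.query.userId as string);

  // Fix: take the identity from the verified token instead:
  // const user = await loadUser(claims.sub);

  res.json(user);
});
```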

  • At least for me, I'm fairly sure that I'm better at not adding security flaws to my code (which I'm already not perfect at!) than I am at spotting them in code that I didn't write, unfortunately.

    • They're different mindsets. Some folks are better editors, inspectors, auditors, etc, whereas some are better builders, creators, and drafters.

      So what you're saying makes sense. And I'm definitely on the other side of that fence.

      4 replies →

  • > Still saves a lot of time vs typing everything from scratch

    No it doesn't. Typing speed is never the bottleneck for an expert.

    As an offline database of Google-tier knowledge, LLMs are useful. Current LLM tech is half-baked, though; we need:

    a) Cheap commodity hardware for running your own models locally. (And by "locally" I mean separate dedicated devices, not something that fights over your desktop's or laptop's resources.)

    b) Standard, bulletproof ways to fine-tune models on your own data. (Inference is mostly already there with things like llama.cpp; fine-tuning isn't.)

    • > No it doesn't. Typing speed is never the bottleneck for an expert

      How could that possibly be true!? Seems like it'd be the same as suggesting being constrained to analog writing utensils wouldn't bottleneck the process of publishing a book or research paper. At the very least such a statement implies that people with ADHD can't be experts.

      10 replies →

  • > Still saves a lot of time vs typing everything from scratch.

    In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.

    • Depends on what you're doing. For example, when you're writing something like React components and using something like Tailwind for styling, I find the speedup is close to 10X.

      13 replies →

  • For me, it’s not the typing - it’s the understanding. If I’m typing code, I have a mental model already or am building one as I type, whereas if I have an LLM generate the code then it’s “somebody else’s code” and I have to take the time to understand it anyway in order to usefully review it. Given that’s the case, I find it’s often quicker for me to just key the code myself, and come away with a better intuition for how it works at the end.

  • I tend to disagree, but I don't know what my disagreement means for the future of being able to use AI when writing software. This workers-oauth-provider project is 1200 lines of code. An expert should be able to write that on the scale of an hour.

    The main value I've gotten out of AI writing software comes from the two extremes, not from the middle ground you present. Vibe coding can be great and seriously productive; but if I have to check it or manually maintain it in nearly any capacity more complicated than changing one string, productivity plummets. Conversely, delegating highly complex, isolated function-writing to an AI can also be super productive, because it can (sometimes) showcase intelligence beyond mine and arrive at solutions which would take me 10x longer; but by definition I am not the right person to check its code output, outside of maybe writing some unit tests for it (a third thing AI tends to be quite good at).

    • > This workers-oauth-provider project is 1200 lines of code. An expert should be able to write that on the scale of an hour.

      Are you being serious here?

      Let's do the math.

      1200 lines in an hour would be one line every three seconds, with no breaks.

      And your figure of 1200 lines apparently omits whitespace and comments. The actual code is 2626 lines. Let's say we ignore blank lines; then it's 2251 lines, so one line per ~1.6 seconds.

      The best typists type like 2 words per second, so unless the average line of code has 3 words on it, a human literally couldn't type that fast -- even if they knew exactly what to type.

      Of course, people writing code don't just type non-stop. We spend most of our time thinking, plus time testing and debugging. (The test is 2195 lines, BTW, not included in the above figures.) Literal typing of code is a tiny fraction of a developer's time.

      I'd say your estimate is wrong by at least one, but realistically more likely two orders of magnitude.

      2 replies →

    • > An expert should be able to write that on the scale of an hour.

      An expert in OAuth, perhaps. Not your typical expert dev, who doesn't specialize in auth but rather in whatever he's using the auth for. Navigating those sorts of standards is extremely time-consuming.

      3 replies →

  • I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.

    An expert reasons, plans ahead, thinks and reasons a little bit more before even thinking about writing code.

    If you are measuring productivity by lines of code per hour then you don't understand what being a dev is.

    • > I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.

      They didn't suggest that at all; they merely suggested that the component of the expert's work that would otherwise be spent typing can be saved, while the rest of their utility comes from intense scrutiny, problem solving, decision making about what to build and why, and everything else that comes from experience and domain understanding.

      5 replies →

    • Yeah, and you still do that now. Let's say you spend 30% of your time coding and the rest planning. Well, now you've got even more time for planning.

  • > Still saves a lot of time vs typing everything from scratch

    Probably very language-specific. I use a lot of Ruby; typing takes no time, it's so terse. Instead I get to spend 95% of my time pondering my problems (or prompting the LLM)...

    • With a proper IDE you don't type much even in Java/.NET; it's all autocomplete anyway. "Too verbose" complaints are mostly from Notepad lovers, and from those who never needed to read somebody else's code.

  • > ... Still saves a lot of time vs typing everything from scratch ...

    How? The prompts still have to be typed, right? And then the output examined in earnest.

    • In the latest project I've been working on, the prompts are a few sentences (and technically I dictate them instead of typing them), and the LLM generates a few hundred lines of code.

    • Not if you don't want to. Speech-to-text is pretty good these days, and even e.g. aider has a /voice command thanks to OpenAI's Whisper.

Revealing against what?

If you look at the README, it is all completely revealed... so I would argue there is nothing to "reveal" in the first place.

> I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

While I think this is a cool (public) experiment by Claude, asking an LLM to write security-sensitive code seems crazy at this point. Ad absurdum: Can you imagine asking Claude to implement new functionality in OpenSSL libs!?

Which is exactly why AI coding assistants work with your expertise rather than replace it. Most people I see failing at AI-assisted development are either non-technical people expecting the AI to solve it all, or technical people playing gotcha with the machine rather than collaborating with it.

There is also one commit quite early in the repo where the dev has to tell Claude to store only the hashes of secrets.
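The pattern it had to be prompted toward is the standard one: persist only a hash of each secret and compare hashes at validation time. A rough sketch using Web Crypto (hypothetical code, not the repo's actual implementation):

```typescript
// Rough sketch of the standard "store only hashes" pattern; not the repo's actual code.
async function hashSecret(secret: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(secret));
  return btoa(String.fromCharCode(...new Uint8Array(digest)));
}

// Registration: show the plaintext secret to the client once and persist only its hash.
// Validation: hash whatever the client presents and compare against the stored hash,
// so a leaked database doesn't hand out usable credentials.
// (A constant-time comparison is preferable in practice.)
async function secretMatches(presented: string, storedHash: string): Promise<boolean> {
  return (await hashSecret(presented)) === storedHash;
}
```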

Yeah I was disappointed in that one.

I hate to say it, but I have reviewed a lot of human code in my time, and I've definitely caught many humans making mistakes of similar magnitude. :/

  • I just wanted to say thanks so much for publishing this, and especially for your comments here - I found them really helpful and insightful. I think it's interesting (though not unexpected) that many of the other comments here show what a Rorschach test this is. I think that's kind of unfortunate, because your experience clearly showed some of the benefits and limitations/pitfalls of coding like this in an objective manner.

    I am curious: did you find the work of reviewing Claude's output more mentally tiring/draining than writing it yourself? Like some other folks mentioned, I generally find reviewing code more mentally tiring than writing it, but I get a lot of personal satisfaction from mentoring junior developers and collaborating with my (human) colleagues (most of them, anyway...). Since I don't get that feeling when reviewing AI code, I find it more draining. I'm curious how you felt reviewing this code.

    • I find reviewing AI code less mentally tiring than reviewing human code.

      This was a surprise to me! Until I tried it, I dreaded the idea.

      I think it is because of the shorter feedback loop. I look at what the AI writes as it is writing it, and can ask for changes which it applies immediately. Reviewing human code typically has hours or days of round-trip time.

      Also with the AI code I can just take over if it's not doing the right thing. Humans don't like it when I start pushing commits directly to their PR.

      There's also the fact that the AI I'm prompting is, obviously, working on my priorities, whereas humans are often working on other priorities, but I can't just decline to review someone's code because it's not what I'm personally interested in at that moment.

      When things go well, reviewing the AI's work is less draining than writing it myself, because it's basically doing the busy work while I'm still in control of high-level direction and architecture. I like that. But things don't always go well. Sometimes the AI goes in totally the wrong direction, and I have to prompt it too many times to do what I want, in which case it's not saving me time. But again, I can always just cancel the session and start doing it myself... humans don't like it when I tell them to drop a PR and let me do it.

      Personally, I don't generally get excited about mentoring and collaborating. I wish I did, and I recognize it's an important part of my job which I have to do either way, but I just don't. I get excited primarily about ideas and architecture and not so much about people.

      1 reply →

  • The most interesting aspect of this is that it likely learned this pattern from human-written code!

    • It's not a 100% bad idea. If you lose the encryption key, you lose the data. Data loss is bad! So better keep a backup of the key somewhere. I can see how it got there.

      Defeats the purpose in this case though.

This seems like a true but pointless observation? If you're producing security-sensitive code then experts need to be involved, whether that's me unwisely getting a junior to do something, or receiving a PR from my cat, or using an LLM.

Removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing. Just letting an LLM generate code for a human to review and then send out for more review is really pretty boring, and already very effective for simple to medium-hard things.

  • > …removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforce…

    This is completely expected behavior from them. Departments with well-paid experts will be among the first they'll want to cut, in every field. Experts cost money.

    We're a long, long, long way off from a bot that can go into random houses and fix under-the-sink plumbing, or diagnose and then fix an electrical socket. However, those who do most of their work on a computer are pretty close to the point where their departments can be cut.

    In every industry, in every field, those will be the jobs cut first. Move fast and break things.

  • I think it's a critically important observation.

    I thought this experience was so helpful because it gave an objective, evidence-based sample of both the pros and cons of AI-assisted coding, whereas so many of the loudest voices on this topic are one-sided ("AI is useless" or "developers will be obsolete in a year"). You say "removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing", but the fact is that many people with the power to push AI onto their workers are going to be more receptive to actual data and evidence than to developers just complaining that AI is stupid.

But AI bros will be running around telling everyone that Claude invented OAuth for Cloudflare all on its own and then open-sourced it.

It's a junior developer whose code you have to check over in full. To some people that is useful. But you're still going to have to train junior developers so they can turn into senior developers.

  • I don't like the jr dev analogy. It has neither the same weaknesses nor the same strengths.

    It's more like the genius coworker who has an overassertive ego and sometimes shows up drunk, but who, if you know how to work with them and see past their flaws, can be a real asset.

    • I like your analogy, but it also explains why I find working with AI-assisted coding so mentally tiresome.

      It's like with some auto-driving systems - I describe it as having a slightly inebriated teenager at the wheel. I can't just relax and read a book, because then I'd die. So I have to be more mentally alert than if I were just driving myself, because everything could be going smoothly and relaxed, yet at any moment the driving system could decide to drive into a tree.

  • I don't really agree; a junior developer, if they're curious enough, wouldn't just write insecure code. They would do self-study and learn best practices before writing it, including not storing plaintext passwords and the like.