
Comment by tstrimple

2 days ago

> Every good Brenda is precisely good because she checks her own work before shipping it. AI does not do this.

A confident statement that's trivial to disprove. I use Claude Code to build and deploy services on my NAS. I can ask it to spin up a new container on my subdomain and make it available internal-only or externally as well. It knows it has access to my Cloudflare API key. It knows I'm running rootless podman and it knows my file storage conventions. It will create the DNS records for a cloudflared tunnel, or just set up DNS on my pihole for internal-only resolution. It will check that podman actually launched the container and will then make an HTTP request to the site to verify that it's up. It will reach for network tools to test both the public and private interfaces. It will check the podman logs for any errors or warnings. If it detects errors, it will attempt to resolve them, and it's typically successful for the types of services I'm hosting.
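
For what it's worth, the checks it runs are nothing exotic. Here's a rough sketch in Python of the same verify-after-deploy loop (the container name, URL, and log heuristics are placeholders for illustration, not my actual setup):

```python
# Sketch of a post-deploy verification pass: is the container running,
# does the site answer over HTTP, and do the logs show errors/warnings?
import json
import subprocess
import urllib.request

CONTAINER = "jellyfin"                    # hypothetical container name
URL = "https://watch.example.com/"        # hypothetical public endpoint


def container_running(name: str) -> bool:
    """Ask podman whether the named container is currently up."""
    out = subprocess.run(
        ["podman", "ps", "--filter", f"name={name}", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return bool(json.loads(out or "[]"))


def http_ok(url: str) -> bool:
    """Probe the service; any response below 400 counts as 'up'."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status < 400
    except OSError:
        return False


def recent_log_problems(name: str) -> list[str]:
    """Scan the tail of the container logs for errors or warnings."""
    logs = subprocess.run(
        ["podman", "logs", "--tail", "200", name],
        capture_output=True, text=True,
    )
    lines = (logs.stdout + logs.stderr).splitlines()
    return [l for l in lines if "error" in l.lower() or "warn" in l.lower()]


if __name__ == "__main__":
    print("container running:", container_running(CONTAINER))
    print("http reachable:  ", http_ok(URL))
    for line in recent_log_problems(CONTAINER):
        print("log:", line)
```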

Instructions like "Setup Jellyfin in a container on the NAS and integrate it with the rest of the *arr stack. I'd like it to be available internally and externally on watch.<domain>.com" have worked extremely well for me. It delivers working, integrated services reliably and checks that what it deployed is actually working, all without my explicit prompting.

You’ve switched contexts completely with your strawman: you’ve pivoted Brenda in finance to a technical/software-engineering task. You’ve pointed the conversation at a use case AI happens to be good at, writing code and solving those kinds of problems. The world at large is much more complex than helping you be a 10x engineer. To live up to the hype, it has to do this reliably for every vertical in a majority of situations, and it’s not even close to being there.

Also, context-equivalent counterexamples abound. Just read HN or any tech forum and it takes no time to find people talking about the hallucinations and garbage that AI sometimes generates. The whole vibe-coding trend is built on “make this app” followed by hundreds of “fix this” and “fix that” prompts, because it doesn’t get much right on the first attempt.

  • You're moving the goalposts. You claimed "AI" cannot verify results, and that's trivially false. Claude Code verifies results on a regular basis. You don't have a clue what you're talking about and are just pushing ignorant FUD.

    • It can't do that reliably, which is what I'm saying. I'm not doubting you built one use case where it has. When I feed Copilot a PDF contract and ask what the monthly minimum I can charge this client is, it tells me $1000. I ask it a dozen other questions and it changes its answer, but never to the correct value. Then, when I ask it to cite where it found that figure, it points me to a paragraph that clearly says $1500, spelled out clear as day, not entangled in a bunch of legalese or anything else. How is that reliable for a Brenda in finance? (This is a real case I tried out.)

In the above scenario, if Claude accidentally wipes out your Jellyfin movies, will Claude deal with the consequences (i.e. an unhappy family and friends), or will you?

That exemption from accountability is a massive factor that renders any comparison meaningless.

In a business scenario, a model provider that could assume financial and legal liability for mistakes (as humans need to do) would be massively more expensive.