Comment by fragmede

7 months ago

Interesting. I asked Claude and ChatGPT-4o similar things and got quite a bit of variance. Using Aider and giving it your prompt, "Output a single HTML page with included JavaScript and CSS that fetches the latest levels of the urine tank on the ISS and displays it appropriately - it should be mobile friendly" and adding "use the same api as the swift code" worked in one shot. However, Claude could not one-shot it if I just asked for a "web page", and it took a couple more prompts to get it working. ChatGPT-4o kinda failed at the task. It hallucinated a URL to load lightstream.js from but didn't realize it, and I had to, gasp, debug the problem myself. I also tried with Copilot in VSCode, since that's now free, and got similar results.

With such variance, though, it's now much easier for me to see why the question of whether LLMs are any good at coding is so contentious every time it comes up on HN. If, even for such a small, well-defined task, seemingly small prompt changes produce such different behavior, I can understand why some people see it as the second coming and others think LLM-assisted programming is all hot air.

1 comment

sumedh 7 months ago

> With such variance though

I agree. I have noticed that some prompts which work perfectly fine on Claude directly do not work when used in the Windsurf IDE, even though it also uses Claude.

LLMs work fine for small scripts, but when it comes to a large codebase I just cannot trust them.