
Comment by andnand

5 days ago

What's your workflow? I've been playing with Claude Code for personal use, usually new projects for experimentation. We have Copilot licenses through work, so I've been playing around with VS Code agent mode for the last week, usually using 3.5 Sonnet, 3.7 Sonnet, or o4-mini. This is in a large Go project. It's been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong, but I feel like I've tried all the "best practices" currently: contexts, switching models for planning and coding, rules, better prompting. Nothing's worked so far.

Switch to using Sonnet 4 (it's available in VS Code Insiders, for me at least). I'm not 100% sure, but a GitHub org admin (and/or you) might need to enable this model in the GitHub web interface.

Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique them.
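
As a rough sketch (contents purely illustrative, assuming a Go repo like the one you describe), such a file, e.g. `.github/copilot-instructions.md` per [0] or the `CLAUDE.md` that `/init` generates per [1], might contain something like:

    # Project instructions

    - This is a large Go codebase. Build with `go build ./...` and run `go test ./...` before calling a task done.
    - Follow the existing package layout; ask before adding new top-level packages or dependencies.
    - Prefer table-driven tests and keep exported identifiers documented.
    - Never hand-edit generated files (anything with a `// Code generated` header).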

Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
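
For example, an opening prompt along these lines (wording is just illustrative):

    Read the code involved in <the feature/bug>, then propose 2-3 implementation
    approaches with their trade-offs. Do not write any code yet.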

Apologies if I'm giving you info you're already aware of.

[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...

[1] Claude Code `/init`

  • This is exactly what I was looking for. Thanks! I'm trying to give these tools a fair shot before I judge them. I've had success with detailed prompts and letting the agent jump straight in when working on small/new projects. I'll give more planning prompts a shot.

    Do you change models between planning and implementation? I've seen that recommended but it's been hard to judge if that's made a difference.

    • Glad I could help!

      Sometimes I do planning in stronger models like Gemini 2.5 Pro (started giving o3 a shot at this the past couple of days) with all the relevant files in context, but oftentimes I default to Sonnet 4 for everything.

      A common pattern is to have the agent write down plans into markdown files (which you can also iterate on) when you get beyond a certain task size. This helps with more complex tasks. For large plans, split them into individual, implementation-phase-specific markdown files.
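
      For instance (layout purely illustrative), a larger task might end up with something like:

          plans/add-rate-limiting/
            00-overview.md       <- goals, constraints, open questions
            01-phase-storage.md  <- checklist for the storage changes
            02-phase-api.md      <- checklist for the API changes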

      Maybe these projects can provide some assistance and/or inspiration:

      - https://www.task-master.dev/

      - https://github.com/Helmi/claude-simone

I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs; it actually seems like an autonomous, intelligent agent.

But I can run commands on my local Linux box that generate boilerplate in seconds. Why do I need to subscribe to access GPU farms for that? Then the agent gets stuck on some simple bug and goes back and forth saying "yes, I figured it out and solved it now" while it keeps changing between two broken states.

The rabid prose, the Fly.io post deriding detractors... To me it seems like the same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?

It can be useful. Does it merit 100-billion-dollar outlays and datacenter-cum-nuclear-power-plant projects? I hardly think so.

  • What commands/progs on your local Linux box? Would love to be able to quantify how inaccurate the LLMs are compared to what people already use for their boilerplate stuff.

    I've found the agents incredibly hit and miss. Mostly miss. The likes of Claude Code occasionally does something surprising that actually works (usually, when you research the code it gave you, there's a public example it's copied wholesale, especially for niche stuff), but the rest of the time you spend hours wrestling it into submission over something you could do in minutes, all whilst it haemorrhages context sporadically. I even tried adding an additional vector database to the likes of Claude Code to try to get around this, but it's honestly been a waste of time in my experience.

    Is it "useless"? For me, yes, probably. I can't find any valid use for an LLM so far in terms of creating new things. What's already been done before? Sure. But why an LLM in that case?

    The strangest thing I've seen so far is Claude Code wanting to write a plugin that copies values out of a WordPress metadata column (triggered by a watcher every five minutes) so it could read them later, instead of just reading the value when relevant. It could not be wrangled into behaving on this, and I gave up.

    Took me two minutes to do the whole thing by hand, and it worked first try (of course it did: it's PHP, not complicated compared to Verilog and DSP, where its output is spectacularly bad).

    It does very odd things in terms of secrets and Cloudflare Workers too.

    The solutions it gives are frequently nonsensical or incomplete, mix syntax from various languages (which it sometimes catches itself on before giving you the artifact), and are almost always inefficient, taking pointless steps to accomplish a simple task.

    Giving Claude Code tutorials, docs, and repos of code is usually a shitshow too. I asked their customer support for a refund weeks ago and have heard nothing. All hype and no substance.

    I can see how someone without much dev experience might be impressed by its output, especially if they're only asking it to do incredibly simplistic stuff for which there are plenty of examples and plenty of public discourse on troubleshooting bad code. But once you get into wanting to do new things, I just don't see how anyone could think this is ever going to be viable.

    I mucked around with autonomous infrastructure via Claude Code too, and just found that it did absolutely bizarre things that made no sense in terms of managing containers relative to logs, suggesting configurations et al. Better off with dumb scripts with your env vars, secrets et al.

Make sure it writes a requirements and design doc for the change it's going to make, and review those. And ask it to ask you questions about where there's ambiguity, and to record those responses.

When it has a work plan, track it as a checklist that it fills out as it works.
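
A workplan checklist might look something like this (format is just a sketch):

    ## Workplan: <change>
    - [x] Requirements doc written and reviewed
    - [x] Design doc written and reviewed
    - [ ] Implement the change
    - [ ] Add/update tests
    - [ ] Update docs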

You can also start your conversations by asking it to summarize the codebase.

My experiments with Copilot and Claude Desktop via MCP on the same codebase suggest that Copilot trims the context much more than Desktop does. Using the same model, the outputs are just less informed.