I stumbled into Agentic Coding in VS Code Nightlies with Copilot using Claude Sonnet 4 and I've been silly productive. Even when half my day is meetings, you wouldn't be able to tell from my git history.
My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.
Mypy passed. Good to go.
I'm currently trying to get my team to really understand the power here. There are a lot of skeptics, and the AI still isn't perfect. People who are against the AI era will latch onto that as validation, but that's exactly the opposite of the correct reaction. If anything it's validation for the optimists, because as a friend of mine says:
"Today is the worst day you will have with this technology for the rest of your life."
> AI discourse would be more effective if we could all see the actual work one another is doing with it
Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like the exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied, and many more details. Currently almost every comment is "Well, I used Sonnet last week and it worked great" without any specifics. Not to mention discussions around local models missing basic stuff like what quantization (if any) you used and what hardware you're running it on. People just write "Wow, fast model" or something like that, and call it a day.
Although I understand why: every comment would be huge if everyone always added sufficient context. I don't know the solution to this, but it does frustrate me.
I don’t know about fixing Python types, but fixing TypeScript types can be very time-consuming. A LOT of programming work is like this: not solving anything interesting or difficult, just time-consuming drudgery.
These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.
I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.
For me comments are for discussions, not essays - from my perspective you went straight into snark about the parent's coding abilities, which kinda kills any hope of a conversation.
I trust it more with Rust than Python tbh, because with Python you need to make sure it runs every code path as the static analysis isn't as good as clippy + rust-analyzer.
I agree, I've had more luck with various models writing Rust than Python, but only when they have tools available so that one way or another they can run `cargo check` and see the nice errors; otherwise it's pretty equal between the two.
I think the excellent error messages in Rust help humans as much as they do LLMs, but some of the weaker models get misdirected by the "helpful" tips, like when an error message suggests "Why don't you try .clone() here?" and the actual way to address the issue was something else.
That's true, typed languages seem to handle the slop better. One thing I've noticed specifically with Rust, though, is that agents tend to overcomplicate things. They start digging into the gnarlier bits of the language much quicker than they probably need to.
What's your workflow? I've been playing with Claude Code for personal use, usually new projects for experimentation. We have Copilot licenses through work, so I've been playing around with VS Code agent mode for the last week, usually using 3.5 Sonnet, 3.7 Sonnet or o4-mini, in a large Go project. It's been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong, but I feel like I've tried all the current "best practices": contexts, switching models for planning and coding, rules, better prompting. Nothing's worked so far.
Switch to using Sonnet 4 (it's available in VS Code Insiders for me at least). I'm not 100% sure but a Github org admin and/or you might need to enable this model in the Github web interface.
Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique it.
Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
Apologies if I'm giving you info you're already aware of.
[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...
[1] Claude Code `/init`
I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs, actually seems like an autonomous intelligent agent.
But I can run commands on my local linux box that generate boilerplate in seconds. Why do I need to subscribe to access gpu farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured out and solved it now" and it keeps changing between two broken states.
The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?
It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.
make sure it writes a requirements and design doc for the change it's gonna make, and review those. and ask it to ask you questions about where there's ambiguity, and to record those responses.
when it has a work plan, track the work plan as a checklist that it fills out as it works.
you can also start your conversations by asking it to summarize the code base
My experiments with copilot and Claude desktop via mcp on the same codebase suggest that copilot is trimming the context much more than desktop. Using the same model the outputs are just less informed.
> Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Forgive my ignorance, but is this just a file you're adding to the context of every agent turn, or is this a formal convention in the VS Code Copilot agent? And I'm curious if there are any resources you used to determine the structure of that document, or if it was just a refinement over time based on mistakes the AI was repeating?
I just finished writing one. It is essentially the onboarding doc for your project.
It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.
In hindsight, it is the doc I should have already written.
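For anyone who hasn't seen one, here is a rough, hypothetical sketch of such a file — every command and path below is made up, it only shows the shape of the thing (VS Code's Copilot picks this up from `.github/copilot-instructions.md`, Claude Code from `CLAUDE.md`):

```markdown
# Agent / onboarding notes

## Build & test
- `make test` runs the unit tests; run it before declaring a task done.
- Lint with `make lint`; CI rejects anything that doesn't pass.

## Layout
- HTTP handlers live in `internal/api`, business logic in `internal/core`.
- Never edit generated code under `gen/` by hand.

## Things you will get hung up on
- Integration tests need the local database container (`make db-up`).
- Prefer the in-repo helpers over adding new third-party dependencies; ask first.
```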
I find it excellent news that all the techniques that make agentic coding more efficient also make human coding more efficient. There was a worry that code would become big mud balls that only AI understand, but it looks like the opposite. Clear code is important for AI productivity, so it now matters even more, because the difference in productivity is immediately and objectively measurable. Before AIs, whether code was well factored was largely a matter of opinion. Now you can say: look how much better Claude works on codebase A vs codebase B, and present your case with numbers.
> There was a worry that code would become big mud balls
That's always been a worry with programming (see basically all Rich Hickey talks), and is still a problem since people prefer "moving fast today" instead of "not having 10 tons of technical debt tomorrow"
LLMs make it even easier for people to spend the entire day producing boilerplate without stopping for a second to rethink why they are producing so much boilerplate. If the pain goes away, why fix it?
Literally less than an hour ago, I reviewed a bunch of LLM-generated boilerplate. I then told the agent to show me a plan to refactor it. I suggested some tweaks, and then it implemented the plan and then tested that it didn't break anything.
It isn't much different than dealing with an extremely precocious junior engineer.
Given how easy it is to refactor now, it certainly makes economic sense to delay it.
The type of person that would do that would have done the same thing without LLMs. LLMs don’t change anything except now they can just create their big ball of mud faster.
The pain of shitty code doesn’t go away. They can ship your crappy MVP faster, but technical debt doesn’t magically go away.
This is an awesome opportunity for those people to start learning how to do software design instead of just “programming”. People that don’t are going to be left behind.
As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.
I understand programming for the sake of programming, chasing purity and really digging into the creative aspects of coding. But I get that same kick out of writing perfect interfaces, knowing that the messier the code underneath is, the more my beautiful interface gets to shine. But transformers are offering us a way to build faster, to create more, and to take on bigger complexity while learning deeply about new domains as we go. I think the more we lean into that, we might enter a software golden age where the potential for creativity and impact can enter a whole new level.
I was struck by this too. Good error messages, fast tools, stable ecosystems, simple code without magic, straight SQL… it’s what I always want. Maybe agents will be what raises the bar for dev experience, simply because they work so quickly that every slowdown matters.
So using agents forces (or at least nudges) you to use go and tailwind, because they are simple enough (and abundant in the training data) for the AI to use correctly.
Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
Competing with the existing alternatives will be too hard. You won't even be able to ask real humans for help on platforms like StackOverflow because they will be dead soon.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
I highly doubt it. These things excel at translation.
Even without training data, if you have an idiosyncratic-but-straightforward API or framework, they pick it up no problem just looking at the codebase. I know this from experience with my own idiosyncratic C# framework that no training data has ever seen, that the LLM is excellent at writing code against.
I think something like Rust lifetimes would have a harder time getting off the ground in a world where everyone expects LLM coding to work off the bat. But something like Go would have an easy time.
Even with the Rust example though, maybe the developers of something that new would have to take LLMs into consideration, in design choices, tooling choices, or documentation choices, and it would be fine.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
That's a very good question.
Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?
The most popular language/framework combination in the early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain it in the 2100s because LLMs can do it for you.
Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until actual AGIs that can macroexpand better than you.
> Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations
I don't think the premise is accurate in this specific case.
First, if anything, training data for newer libs can only increase. Presumably code reaches github in a "at least it compiles" state. So you have lots of people fight the AIs and push code that at least compiles. You can then filter for the newer libs and train on that.
Second, pre-training is already mostly solved. The pudding seems to be now in post-training. And for coding a lot of post-training is done with RL / other unsupervised techniques. You get enough signals from using generate -> check loops that you can do that reliably.
The idea that "we're running out of data" is way too overblown IMO, especially considering the last ~6mo-1y advances we've seen so far. Keep in mind that the better your "generation" pipeline becomes, the better will later models be. And the current "agentic" loop based systems are getting pretty darn good.
1. The previous gen has become bloated and complex because it widened its scope to cover every possible niche scenario and got infiltrated by 'expert' language and framework specialists who went on an architecture binge.
2. As a result a new stack is born, much simpler, back to basics, compared to the poorly aged incumbent. It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it.
3. Over time the new stack ages just as poorly as the old stack for all the same reasons. So the cycle repeats.
I do not see this changing with ai-assisted coding, as context enrichment is getting better allowing a full stack specification in post training.
> It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it
How will it ever rise on the coattails of anything if it isn't in the AI training data so no one is ever incentivized to use it to begin with?
> So using agents forces (or at least nudges) you to use go and tailwind
Not even close, and the article betrays the author's biases more than anything else. The fact that their Claude Code (with Sonnet) setup has issues with the `cargo test` cli for instance is hardly a categorical issue with AIs or cargo, let alone rust in general. Junie can't seem to use its built-in test runner tool on PHP tests either, that doesn't mean AI has a problem with PHP. I just wrote a `bin/test-php` script for it to use instead, and it figures out it has to use that (telling it so in the guidelines helps, but it still keeps trying to use its built-in tool first)
As for SO, my AI assistant doesn't close my questions as duplicates. I appreciate what SO is trying to do in terms of curation, but the approach to it has driven people away in droves.
Just yesterday I gave Claude (via Zed) a project brief and a fresh elixir phoenix project. It had 0 problems. It did opt for tailwind for the css, but phoenix already sets it up when using `mix phx.new` so that's probably why.
I don't buy that it pushes you into using Go at all. If anything I'd say they push you towards Python a lot of the time when asking it random questions with no additional context.
The elixir community is probably only a fraction of the size of Go or Python, but I've never had any issues with getting it to use it.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
If you truly believe in the potential of agentic AI, then the logical conclusion is that programming languages will become the assembly languages of the 21st century. This may or may not become the unfortunate reality.
I'd bet money that in less than six months, there'll be some buzz around a "programming language for agents".
Whether that's going to make sense, I have some doubts, but as you say: For an LLM optimist, it's the logical conclusion. Code wouldn't need to be optimised for humans to read or modify, but for models, and natural language is a bit of an unnecessary layer in that vision.
Personally I'm not an LLM optimist, so I think the popular stack will remain focused on humans. Perhaps tilting a bit more towards readability and less towards typing efficiency, but many existing programming languages, tools and frameworks already optimise for that.
My best results have been with Ruby/Rails and either vanilla Bootstrap, or something like Tabler UI, Tailwind seems to be fine as well, but I'm still not a fan of the verbosity.
With a stable enough boilerplate you can come up with outstanding results in a few hours. Truly production ready stuff for small size apps.
How are you getting results when Ruby has no type system? That seems like where half the value of LLM coding agents is (dumping in type errors and having it solve them).
With maturing synthetic data pipelines, can't they just take one base llm and fine tune it for 20 different niches, and allow user to access the niche with a string parameter in the API call? Even if a new version of a language released only yesterday, they could quickly generate enough synthetic training data to bake in the new syntax for that niche, and roll it out.
If AI really takes over coding, programming languages will be handled the same way we currently handle assembly code.
Right now languages are the interface between human and computer. When LLMs take over, their ideal programming language is probably less verbose than what we are currently using. Maybe keywords could become one token long, etc. Just some quick thoughts here :D.
> no new language/framework/library will ever be able to emerge?
Here is a YouTube video that makes the same argument: React is / will be the last JavaScript framework, because it is the dominant one right now. Even if people publish new frameworks, LLM coding assistants will not be able to assist coding using the new frameworks, so the new frameworks will not find users or popularity.
And even for React, it will be difficult to add any more new features, because LLMs only assist to write code that uses the features the LLMs know about, which are the old, established ways to write React.
> LLM coding assistants will not be able to assist coding using the new frameworks
Why not? When my coding agent discovers that they used the wrong API or used the right API wrong, it digs up the dependency source on disk (works at least with Rust and with JavaScript) and looks up the new details.
I also have it use my own private libraries the same way, and those are not in any training data guaranteed.
I guess if whatever platform/software you use doesn't have tool calling you're kind of right, but you're also missing something that's pretty commonplace today.
New frameworks can be created, but they will be different from before:
- AI-friendly syntax, AI-friendly error handling
- Before being released, we will have to spend hundreds of millions of tokens on agents reading the framework and writing documentation and working example code with it, basically creating the dataset that other AIs can reference when using the new framework.
- Create a way to have that documentation/example code easily available for AI agents (via MCP or new paradigm)
Agents no, LLMs yes. Not for generating code per se, but for answering questions. Common Lisp doesn't seem to have a strong influx of n00bs like me, and even though there's pretty excellent documentation, I find it sometimes hard to know what I'm looking for. LLMs definitely helped me a few times by answering my n00b questions I would have otherwise had to ask online.
I've been trying Claude Code with Sonnet 4.0 for a week or so now for Rust code, but it feels really underwhelming (and expensive, since it's via Bedrock right now). Every time it does something it misses half of it, despite spending a lot of time planning at the beginning of the session. What am I missing?
Same. I have a very efficient workflow with Cursor Edit/Agent mode where it pretty much one-shots every change or feature I ask it to make. Working inside a CLI is painful, are people just letting Claude Code churn for 10-15 minutes and then reviewing the diff? Are people even reviewing the code?
This sort of asynchronous flow will become more and more mainstream. chatgpt.com/codex, Google's Jules and to a degree Claude Code (even though that's local) are all following that pattern: phrase a goal, send it off to the agent, review the diff and request changes, rinse and repeat until ready for PR review.
For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring down the IDE will be coming to a close for all but the most complex coding tasks in the future.
Exact same experience. I have no clue what other people are doing. I was hunting for use cases where it could be used and it kept not working. I don't get it.
Nice to see container use mentioned (https://github.com/dagger/container-use). I work with the team that made it (a lot of ex-Docker folks including the creator of Docker.)
Running agents in parallel will be a big deal as soon as we learn (or the agents learn) how to reliably work with just one.
Even before then, if you're trying to get work done while the agent is doing its own thing or you find yourself watching over the agent's "shoulder" out of fear it'll change something you didn't ask it to change, then it's useful to run it in a containerized dev environment.
Container use is definitely early but moving quickly, and probably improved even since this post was published. We're currently focused on stability, reducing git confusion, better human<>agent interaction, and environment control.
> Java has the largest, oldest and most explicit data set for the LLM to reference
That seems to be a recommendation for coding with LLMs that don't have access to tools to look up APIs, docs and third-party source code, rather than something you'd choose for "Agentic Coding".
Once the tooling can automatically figure out what is right, what language you use matters less, as long as the source code ends up available somewhere the agent can read it when needed.
Agree much with your 2nd point though, all outputs still require careful review and what better language to use than one you know inside-out?
Why is this? Is there just an insanely large codebase of open source projects in Java (the only thing I can think of is the entire Apache suite)? Or is it because the docs are that expressive and detailed for a given OSS library?
Certain points about the language, as well as certain long-existing open source projects have been discussed ad-nauseum online. This all adds to the body of knowledge.
> Context system: Go provides a capable copy-on-write data bag that explicitly flows through the code execution path, similar to contextvars in Python or .NET's execution context. Its explicit nature greatly simplifies things for AI agents. If the agent needs to pass stuff to any call site, it knows how to do it.
I believe this is considered a bad practice: the general attitude is that the only sane use case for values in context.Context is tracing data, and all other data should be explicitly passed via arguments.
The only place I’ve encountered this pattern is in chromedp, the go wrapper for the chrome headless browser driver. Its API… isn’t good.
Most methods you use are package globals that take a context.Context as a first parameter. But you have to understand that this context is a _special_ one: you can’t pass any old context like context.Background(), you must pass a context you got from one of the factory methods.
If you want to specify a timeout, you use context.WithTimeout. Clever I guess, but that’s the only setting that works like that.
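A rough sketch of the pattern being described, assuming the usual chromedp entry points (`chromedp.NewContext`, `chromedp.Run`); the URL and timeout are illustrative:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	// The "special" context from the factory method carries the browser state;
	// plain context.Background() won't work with the package-level functions.
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// Timeouts are layered on with the standard library.
	ctx, cancel = context.WithTimeout(ctx, 15*time.Second)
	defer cancel()

	var title string
	// Package globals taking the context as the first parameter.
	if err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com"),
		chromedp.Title(&title),
	); err != nil {
		log.Fatal(err)
	}
	log.Println("page title:", title)
}
```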
I'm really not an expert in Go, but the data that I'm passing at the moment via context is the kind of data which is commonly placed there by libraries I use: database connections, config, rate limiters, cache backends etc. Does not seem particularly bad to me at least.
If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.
It's totally fine to put multiple values into a different data bag type that has explicit, typed fields. For example, the Echo framework has its own strongly typed and extensible Context interface for request scoped data: https://pkg.go.dev/github.com/labstack/echo#Context
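To make the trade-off concrete, a small sketch with illustrative names (not tied to any particular framework): the `context.Value` version type-asserts at runtime, while the explicit bag surfaces its dependencies in the signature and gets checked by the compiler.

```go
package appctx

import (
	"context"
	"database/sql"
	"log"
)

type ctxKey string

// Opaque: the dependency rides along in context.Context, comes back as `any`,
// and a missing value is only discovered at runtime.
func handleOpaque(ctx context.Context) {
	db, ok := ctx.Value(ctxKey("db")).(*sql.DB)
	if !ok {
		panic("db not in context")
	}
	_ = db
}

// Explicit: a typed data bag makes the dependencies visible in the signature,
// so forgetting to provide one is a compile error, not a runtime surprise.
type App struct {
	DB     *sql.DB
	Logger *log.Logger
}

func handleExplicit(ctx context.Context, app *App) {
	app.Logger.Println("handling request")
	_ = app.DB
	_ = ctx // the context still flows alongside, for cancellation and tracing
}
```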
Honestly, I find this approach to be useful pretty much anytime you're working with other people as well.
There are absolutely times to be extremely focused and clever with your code, but they should be rare and tightly tied to your business value.
Most code should be "blindingly obvious" whenever possible.
The limit on developers isn't "characters I can type per minute" it's "concepts I can hold in my head."
The more of those there are... The slower you will move.
Don't create more interfaces over the existing ones, don't abstract early, feel free to duplicate and copy liberally, glue stuff together obviously (even if it's more code, or feels ugly), declare the relevant stuff locally, stick with simple patterns in the docs, don't be clever.
You will write better code. Code shouldn't be pretty, it should be obvious. It should feel boring, because the hard part should be making the product not the codebase.
> This is not an advertisment for Claude Code. It's just the agent I use at the moment. What else is there? Alternatives that are similar in their user experiences are OpenCode, goose, Codex and many others. There is also Devin and Cursor's background agents but they work a bit different in that they run in the cloud.
What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?
> What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?
There is nothing at the moment that I would recommend. However I'm quite convinced that we will see this soon. First of all I quite like where SST's OpenCode is going. The upcoming UX looks really good. Secondly because having that in place, will make it quite easy to put local models in when they get better. The issue really is that there are just not enough good models for tool usage yet. Sonnet is so shockingly good because it was trained for excellent tool usage. Even Gemini does not come close yet.
Aider is almost there, in fact it's intentionally "not" there. You can set it up to do things like run test/static analysis automatically and fix errors, and work with it to get a to-do list set up so the entire project is spec'd out, then just keep prompting it with "continue...". It has a hard coded reflection limit of 3 iterations right now, but that can also be hacked to whatever you want. The only thing missing for full agentic behavior is built in self prompting behavior.
> The only thing missing for full agentic behavior is built in self prompting behavior.
Correct me if I'm wrong, but Aider still doesn't do proper tool calling? Last time I tried it, they did it the "old school" way of parsing out unix shell commands from the output text and ran it once the response finished streaming, instead of the sort of tool call/response stuff we have today.
Single-file download, fuss-free and install-less that runs on mac, windows and linux (+ docker of course.) It can run any model that talks to openai (which is nearly all of them), so it'll work with the big guys' models and of course other ones like ones you run privately or on localhost.
Unlike Claude Code, which is very good, this one runs in your browser with a local app server to do the heavy lifting. A console app could be written to use this self-same server, too, of course (but that's not priority #1) but you do get a lot of nice benefits that you get for free from a browser.
One other advantage, vis-a-vis Armin's blog post, is that this one can "peek" into terminals that you _explicitly_ start through the service.
It's presently in closed alpha, but I want to open it up to more people to use. If you're interested, you and anyone else who is interested can ping me by email -- see my profile.
>run any model that talks to openai (which is nearly all of them)
What does that mean? I've never seen any locally run model talk to OpenAI, how and why would they? Do you mean running an inference server that provides an OpenAI-compatible API?
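For what it's worth, "talks to OpenAI" here presumably means the server exposes the OpenAI-style `/v1/chat/completions` endpoint, which local servers such as Ollama or llama.cpp's server also provide. A rough Go sketch of such a request; the host, port and model name are assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	payload, err := json.Marshal(map[string]any{
		"model": "qwen2.5-coder", // hypothetical local model name
		"messages": []map[string]string{
			{"role": "user", "content": "Write a hello world in Go."},
		},
	})
	if err != nil {
		panic(err)
	}

	// Any server that accepts this request shape "talks OpenAI";
	// port 11434 is where a local Ollama would listen.
	resp, err := http.Post("http://localhost:11434/v1/chat/completions",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Choices[0].Message.Content)
}
```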
The Neovim plugin CodeCompanion is currently moving into a more agentic direction, it already supports an auto-submit loop with builtin tools and MCP integration.
Yes it's not a standalone CLI tool, but IMHO I'd rather have a full editor available at all times, especially one that's so hackable and lightweight.
Gotta say, 100/200 bucks monthly feels prohibitively expensive for even trying out something, particularly something as unproven as code-writing AI, even more particularly when other personal experiences with AI have been at the very least underwhelming, and extra particularly when the whole endeavor is so wrapped up in ethical concerns.
One month at 20 USD seems like it should be plenty to try it out on a small project or two to decide whether it is worth trying 100 bucks/month?
Or one can just wait a couple of months as people report their learnings.
Try Aider with API usage. Learn how to control context size (/clear, /add, /drop). Limit context to 25K. Use whatever model you want (Sonnet 4 or Gemini 2.5 Pro).
For simple scripts, it often costs me under $1 to build. I'm working on a bigger tool these days, and I've done lots of prompts, a decent amount of code, over 100 tests, and my running total is right now under $6.
I'd suggest learn the basics of using AI to code using Aider, and then consider whether you want to try Claude Code (which is likely more powerful, but also more expensive unless you use it all the time).
Yeah I've been using Aider mostly and just started using Codex, very similar to Claude Code, yesterday. Aider is more manual and requires more guiding but it's also an order of magnitude cheaper.
The monkey brain part of me that really doesn't trust an LLM and trusts my decades of hard-won programming experience also prefers using Aider because the usage flow generally goes:
1. Iterate with Aider on a plan
2. Tell Aider to write code
3. Review the code
4. Continue hacking myself until I want to delegate something to an LLM again.
5. Head back to Step 1.
Codex automates this flow significantly but it's also a lot more expensive. Just the little bits of guiding I offer an LLM through Aider can make the whole process a lot cheaper.
It's unclear to me whether the full agentic Claude Code/Codex style approach will win or whether Aider's more carefully guided approach will win in the marketplace of ideas, but as a pretty experienced engineer Aider seems to be the sweet spot between cost, impact, and authorial input.
https://github.com/dagger/container-use (cu) is improving daily. Happy to help get it working if you're hitting anything (we're all in dagger.io discord). Last night I tried it with Amazon Q Developer CLI chat (with claude-3.7-sonnet), which I hadn't touched before (will PR how-to to the README today). MCP integration just worked for me. Figured out where to put the agent rules for Q and how to restrict to just the tools from cu. I kicked off three instances of Q to modify my flask app project with the same prompt in parallel (don't step on the local source) and got three variants to review in short order. I merged the one I liked into the repo and tossed the rest.
I have seen multiple articles pushing for Go as the agentic language of choice; does anyone else feel like this is quite forced? I have tried agentic coding in several languages and I didn't have a particularly good or productive experience with Go.
I don't agree with the author on the Go thing. I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good. When I wake up there is always a fresh PR waiting for me.
I don't like the word "agent" because it is not a blind LLM or a small, fast script. It is a complex workflow with many checks and prompting before writing a single line of code. That's also the key to AI-powered development: context.
> I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good.
It's quite possible it's a case of holding it wrong, but I think the basic evaluation I did that led me to the conclusion that Go works particularly well isn't too bad. I just get results I feel good about more quickly than with Rust and Python. FWIW I also had really good results with PHP, on the level of Go, it's just that overall it's a stack that does not cater too well to my problem.
All of this is forced, yes. There's probably a trillion dollars riding on this all not imploding, so we're getting it shoved everywhere, all the time, incessantly.
My own experience with "Agents" (and no, I am not a luddite) has been nothing short of comical in how terrible it's been. We try it every single day at our company. We've tried all the advice. All the common wisdom. They have never, not once, produced anything of any value. All the public showcases of these brilliant "agents" have also been nothing short of spectacular failures [1]. Yet despite all this, I keep seeing these type of posts, and pretty much always it's from someone with a vested interest of some kind when you dig deep down enough. All the managerial types pushing it, you look deep enough it's always because the board or investors or whatever other parasite has a vested interest.
I know one thing is for certain, what AI will give us is more and more fucking advertisements shoved into every facet of our lives, except now it sorta talks like a human!
I had a recent discussion with another member of the Python community (OA is written by a big name in Python).
He started off saying "learning to code with AI is like learning to cook by ordering off the menu". I know he meant "an AI being the way you learn how to code", but there's another meaning that I've been thinking a lot about because my 16yo son is really into coding and I'm trying to come up with how I can help him be successful in the world at the horizon where he starts doing it professionally.
In that way, "learning how to work together with an AI to code" is a really, really interesting question. Because the world is going to look VERY different in 2-6 years.
I get your point, but I'm envisioning a different endpoint.
Let's take your factory example: Factories are just a fact of life right now, almost nobody is producing bespoke cars or phones or clothing. So given that my son is basically 100% likely to be working with an automation line, how do I get him on the track to being a machine operator or a millwright rather than doing conveyor belt work?
Elixir looks like a good choice as well; folks have recorded a session building a Phoenix web app with Claude Code and it went quite well for them: https://youtu.be/V2b6QCPgFTk
This text is not about Go, it's about agentic coding. I have used and am using this across different languages. On this project (which is a Go backend) I still have TypeScript in the frontend and I have some Python-based tasks too. The rules apply universally.
Meta: this hits differently because the author of this post created an awesome, popular Python web framework some 15 years ago. I miss those times dearly (using Python for web stuff).
'Many hallucinations' may become the new 'poorly documented' when it comes to tech stack decisions. I'm asking myself if it could slow down adoption of new tech in future, since it's harder to provide the equivalent learning material of 10 years of Stack Overflow than writing equally good documentation.
Pretty much my experience as well, although I would highly recommend Roo Code + Claude (via the API) to build entire projects, and Claude for "batch" tasks or finalization.
AI models are trained on data that can be one or two years old, and they're trained on what they saw the most. So language changes, breaking API changes, dependencies that don't work any more, name changes, etc. are going to get them super confused.
Go indeed works well because of its standard library that avoids the need for many dependencies, and its stability.
I found PHP to actually be the best target language for coding agents. For the same reasons, and also for the ton of documentation and example code available. That doesn't prevent agents from automatically using some modern PHP features, applying static analysis tools, etc.
For frontend stuff, agents will almost always pick React + Tailwind because this is what they saw the most. But Tailwind 4 is very different from Tailwind 3, and that got them super confused.
> Likewise with AI I strongly prefer more code generation over using more dependencies. I wrote about why you should write your own code before, but the more I work with agentic coding, the more I am convinced of this.
Something I like to do is get Gemini Deep Research to write a single-file manual for any uncommon dependency and include that in my docs/ directory. Helps a bunch.
I also ask it to write specialized guides on narrow topics (e.g., testing async SQLAlchemy using pytest-async and pytest-postgresql).
A suggestion: maybe the dark mode toggle should be at the top of the page rather than at the end. I personally would've loved that, and I do it with some of my HTML blogs, but maybe that's personal preference. I think I agree that Go is pretty cool, but the knowledge base for Python feels bigger, and I sometimes just use uv with Python and Gemini Pro right in the browser to create one-off scripts. Pretty cool!
No, I didn't try to claim that. I seem to see the influence in many people's writing and verbosity though. It could be as simple as a counter reaction: If an LLM is allowed to be verbose, so are humans. It could also be that people who use LLMs a lot subconsciously adopt the style.
I infer that you are the author of the post. Take it as a compliment, I think you have written many good pre-LLM articles.
Three or four weeks ago I was posting how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around on them for some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here.
I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway. Over and over it suggested things that literally do not exist, and the only reason I could tell was that I spent a good amount of time in the actual documentation. This has been my experience roughly 80% of the time when trying to use an LLM. I would like to know what is the magical prompt engineering technique that makes it stop confidently hallucinating about literally everything.
I mirror the GP's sentiment. My initial attempts using a chat like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." 15 minutes later, not only had I installed it, my script was done. There was one bug I had to fix myself. It was production quality code, too.
I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production grade script, with a very nice interface and excellent logging.
Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.
There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task - but it had always been low in my priority list. How many ideas do you have in your head that you'll never get around to because of lack of time. Well, now you can do 5x more of those than you would have without AI.
I'm having a very good experience with ChatGPT at the moment. I'm mostly using it for little tasks where I don't remember the exact library functions. Examples:
"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"
"Python question: how do I serialize a C struct over a TCP socket with asyncio?"
"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)
ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI assisted coding, at least as a replacement for Google and StackOverflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...
EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer resp. where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
Sure, this was exactly how I felt three weeks ago, and I could have written that comment myself. The agentic approach where it works out it made something up by looking at the errors the type-check generates is what makes the difference.
Did you use search grounding? o3 or o4-mini-high with search grounding (which will usually come on by default for questions like this) are usually the best option.
this is kind of a weird position to take. you're the captain, you're the person reviewing the code the LLM (agent or not) generates, you're the one asking for the code you want, you're in charge of deciding how much effort to put in to things, and especially which things are most worth your effort.
all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation for what I want, point it at the existing code and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
anyway, obviously do whatever you want, but deriding something you've not looked in to isn't a hugely thoughtful process for adapting to a changing world.
Can someone recommend some sources for vibe coding, e.g. how to prompt it properly and what tools to use? Does anyone have experience with anything other than small projects built from scratch?
I had the same question because all my experience with this contradicts the hype.
I watched Ronacher's demo from yesterday, https://www.youtube.com/watch?v=sQYXZCUvpIc, and this is it, a well-regarded engineer working on a serious open source project. There's no wizard behind the curtain, it's the thing I've been asking the promoters for.
And you should make your own judgment, but I'm just not impressed.
It seems to me the machine takes longer, creates a plan that "is shit," and then has to be fixed by a person who has a perfect understanding of the problem.
I'm loving LLMs as research tools, pulling details out of bad documentation, fixing my types and dumb SQL syntax errors, and searching my own codebase in natural language.
But if I have to do all the reasoning myself no matter what, setting a robot free to make linguistically probable changes really feels like a net negative.
Given the hype and repercussions of success or failure of what LLMs can hypothetically do, I feel like the only way forward for reasonable understanding of the situation is for people to post live streams of what they're raving about.
Or at the very least source links with version control history.
My reading of the status quo is that people who use it for toy or greenfield projects written from scratch are having a blast. Until the project reaches a certain complexity in size and function when it starts to break down.
People working on existing projects in turn are scratching their heads because it's just not quite working or providing much of a productivity boost. I belong to this camp.
On the note of language choice, i've been experimenting with Claude Code recently and thought the other day how happy i am to be using Rust with it and how afraid i'd be in Python, JS, etc.
I've noticed Claude Code introduces quite a few errors and then walks through the compile errors to fix things up. Refactors/etc also become quite easy with this workflow from CC.
I'm sure it does well in dynamic languages, but given how much the LLM leans into these compile errors i get the feeling it would simply miss more things if there was none/less.
So far though my #1 concern is finding ways to constraining the LLM. It produces slop really, really quick and when it works more slowly i can avoid some of the review process. Eg i find stubbing out methods and defining the code path i want, in code, rather than trying to explain it to the LLM to be productive.
Still in my infancy of learning this tool though. It feels powerful, but also terrifying in hands of lazy folks just pushing through slop.
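To illustrate the stubbing idea, a minimal sketch in Go (rather than Rust, and with made-up names): the human pins down the signatures and the code path, and the agent is asked to fill in only the bodies marked TODO.

```go
package importer

import "context"

type Record struct {
	ID   string
	Body []byte
}

// ImportFile fixes the code path I want: read, validate, then upsert.
// The agent is asked to implement the three helpers without changing it.
func ImportFile(ctx context.Context, path string) error {
	recs, err := readRecords(path)
	if err != nil {
		return err
	}
	if err := validate(recs); err != nil {
		return err
	}
	return upsert(ctx, recs)
}

func readRecords(path string) ([]Record, error)       { panic("TODO: agent") }
func validate(recs []Record) error                    { panic("TODO: agent") }
func upsert(ctx context.Context, recs []Record) error { panic("TODO: agent") }
```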
> Last night I had a file with 38 mypy errors
Fixing type checker errors should be one of the least time-consuming things you do. Was this previously consuming a lot of your time?
A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the cloudflare post).
> you wouldn't be able to tell from my git history.
I can easily tell from git history which commits were heavily AI generated
> Even when half my day is meetings, you wouldn't be able to tell from my git history.
Your employer, if it is not you, will now expect this level of output.
> "Today is the worst day you will have with this technology for the rest of your life."
Why do we trust corporations to keep making things better all of a sudden?
The most jarring effect of this hype cycle is that it all appears to refer to some imaginary set of corporate entities.
"There was a worry that code would become big mud balls that only AI understand, but it looks like the opposite."
For now...
As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.
I understand programming for the sake of programming, chasing purity and really digging into the creative aspects of coding. But I get that same kick out of writing perfect interfaces, knowing that the messier the code underneath is, the more my beautiful interface gets to shine. But transformers are offering us a way to build faster, to create more, and to take on bigger complexity while learning deeply about new domains as we go. I think the more we lean into that, we might enter a software golden age where the potential for creativity and impact can enter a whole new level.
11 replies →
I was struck by this too. Good error messages, fast tools, stable ecosystems, simple code without magic, straight SQL… it’s what I always want. Maybe agents will be what raises the bar for dev experience, simply because they work so quickly that every slowdown matters.
So using agents forces (or at least nudges) you to use go and tailwind, because they are simple enough (and abundant in the training data) for the AI to use correctly.
Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
Competing with the existing alternatives will be too hard. You won't even be able to ask real humans for help on platforms like StackOverflow because they will be dead soon.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
I highly doubt it. These things excel at translation.
Even without training data, if you have an idiosyncratic-but-straightforward API or framework, they pick it up no problem just looking at the codebase. I know this from experience with my own idiosyncratic C# framework that no training data has ever seen, that the LLM is excellent at writing code against.
I think something like Rust lifetimes would have a harder time getting off the ground in a world where everyone expects LLM coding to work off the bat. But something like Go would have an easy time.
Even with the Rust example though, maybe the developers of something that new would have to take LLMs into consideration, in design choices, tooling choices, or documentation choices, and it would be fine.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
That's a very good question.
Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?
The most popular language/framework combination of the early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain it in the 2100s because LLMs can do it for you.
Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until actual AGIs arrive that can macroexpand better than you.
> Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations
I don't think the premise is accurate in this specific case.
First, if anything, training data for newer libs can only increase. Presumably code reaches GitHub in an "at least it compiles" state. So you have lots of people fighting the AIs and pushing code that at least compiles. You can then filter for the newer libs and train on that.
Second, pre-training is already mostly solved. The pudding now seems to be in post-training. And for coding, a lot of post-training is done with RL and other unsupervised techniques. You get enough signal from generate -> check loops that you can do that reliably.
The idea that "we're running out of data" is way too overblown IMO, especially considering the advances we've seen over the last ~6-12 months. Keep in mind that the better your "generation" pipeline becomes, the better later models will be. And the current "agentic" loop-based systems are getting pretty darn good.
4 replies →
A traditional digital stack's lifecycle is:
1. The previous gen has become bloated and complex because it widened its scope to cover every possible niche scenario and got infiltrated by 'expert' language and framework specialists who went on an architecture binge.
2. As a result, a new stack is born, much simpler and back to basics compared to the poorly aged incumbent. It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it.
3. Over time the new stack ages just as poorly as the old stack for all the same reasons. So the cycle repeats.
I do not see this changing with AI-assisted coding, as context enrichment is getting better, allowing a full stack specification in post-training.
> It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it
How will it ever rise on the coattails of anything if it isn't in the AI training data so no one is ever incentivized to use it to begin with?
1 reply →
> So using agents forces (or at least nudges) you to use go and tailwind
Not even close, and the article betrays the author's biases more than anything else. The fact that their Claude Code (with Sonnet) setup has issues with the `cargo test` cli for instance is hardly a categorical issue with AIs or cargo, let alone rust in general. Junie can't seem to use its built-in test runner tool on PHP tests either, that doesn't mean AI has a problem with PHP. I just wrote a `bin/test-php` script for it to use instead, and it figures out it has to use that (telling it so in the guidelines helps, but it still keeps trying to use its built-in tool first)
As for SO, my AI assistant doesn't close my questions as duplicates. I appreciate what SO is trying to do in terms of curation, but the approach to it has driven people away in droves.
I tried Junie in PyCharm and it had big problems with running tests or even using the virtual environment set up in PyCharm for that project.
You'd expect more from the company that is developing both the IDE and the AI agent...
1 reply →
Just yesterday I gave Claude (via Zed) a project brief and a fresh elixir phoenix project. It had 0 problems. It did opt for tailwind for the css, but phoenix already sets it up when using `mix phx.new` so that's probably why.
I don't buy that it pushes you into using Go at all. If anything I'd say they push you towards Python a lot of the time when asking it random questions with no additional context.
The elixir community is probably only a fraction of the size of Go or Python, but I've never had any issues with getting it to use it.
I’m wondering whether we may see programming languages that are either unreadable to humans or at least designed towards use by LLMs.
Yes, and an efficient tokenizer designed only for that language. As the ratio of synthetic data to human data grows this will become more plausible.
LLM as a frontend to LLVM IR maybe.
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
If you truly believe in the potential of agentic AI, then the logical conclusion is that programming languages will become the assembly languages of the 21st century. This may or may not become the unfortunate reality.
I'd bet money that in less than six months, there'll be some buzz around a "programming language for agents".
Whether that's going to make sense, I have some doubts, but as you say: For an LLM optimist, it's the logical conclusion. Code wouldn't need to be optimised for humans to read or modify, but for models, and natural language is a bit of an unnecessary layer in that vision.
Personally I'm not an LLM optimist, so I think the popular stack will remain focused on humans. Perhaps tilting a bit more towards readability and less towards typing efficiency, but many existing programming languages, tools and frameworks already optimise for that.
My best results have been with Ruby/Rails and either vanilla Bootstrap, or something like Tabler UI, Tailwind seems to be fine as well, but I'm still not a fan of the verbosity.
With a stable enough boilerplate you can come up with outstanding results in a few hours. Truly production ready stuff for small size apps.
How are you getting results when Ruby has no type system? That seems to be where half the value of LLM coding agents is (dumping in type errors and having them solve them).
4 replies →
As an example, XML is suddenly cool again, because LLMs love it.
With maturing synthetic data pipelines, can't they just take one base LLM and fine-tune it for 20 different niches, and allow the user to access a niche with a string parameter in the API call? Even if a new version of a language was released only yesterday, they could quickly generate enough synthetic training data to bake in the new syntax for that niche, and roll it out.
If AI really takes over coding, programming languages will be handled the same way we currently handle assembly code.
Right now languages are the interface between human and computer. If LLMs take over, their ideal programming language will probably be less verbose than what we are currently using. Maybe keywords could become one token long, etc. Just some quick thoughts here :D.
> no new language/framework/library will ever be able to emerge?
Here is a YouTube video that makes the same argument. React is / will be the last JavaScript framework, because it is the dominant one right now. Even if people publish new frameworks, LLM coding assistants will not be able to assist coding using the new frameworks, so the new frameworks will not find users or popularity.
And even for React, it will be difficult to add any more new features, because LLMs only assist to write code that uses the features the LLMs know about, which are the old, established ways to write React.
https://www.youtube.com/watch?v=P1FLEnKZTAE
> LLM coding assistants will not be able to assist coding using the new frameworks
Why not? When my coding agent discovers that it used the wrong API, or used the right API wrong, it digs up the dependency source on disk (works at least with Rust and with JavaScript) and looks up the new details.
I also have it use my own private libraries the same way, and those are guaranteed not to be in any training data.
I guess if whatever platform/software you use doesn't have tool calling you're kind of right, but you'd also be missing something that's pretty commonplace today.
My theory is that it will not be the case.
New frameworks can be created, but they will be different from before:
- AI-friendly syntax, AI-friendly error handling
- Before being released, we will have to spend hundreds of millions of tokens on agents reading the framework and writing documentation and working example code with it, basically creating the dataset that other AIs can reference when using the new framework.
- Create a way to have that documentation/example code easily available for AI agents (via MCP or new paradigm)
Speaking of which, anyone had success using these tools for coding Common Lisp?
Agents no, LLMs yes. Not for generating code per se, but for answering questions. Common Lisp doesn't seem to have a strong influx of n00bs like me, and even though there's pretty excellent documentation, I find it sometimes hard to know what I'm looking for. LLMs definitely helped me a few times by answering my n00b questions I would have otherwise had to ask online.
Joe Marshall had a couple of posts about... No: https://funcall.blogspot.com/2025/05/vibe-coding-common-lisp...
1 reply →
Not CL specifically but works well with Clojure and fits better than non-lisp languages (imo) once you give the LLM direct access to the repl
I've been trying Claude Code with Sonnet 4.0 for a week or so now for Rust code, but it feels really underwhelming (and expensive, since it's via Bedrock right now). Every time it does something it misses half of it, despite spending a lot of time planning at the beginning of the session. What am I missing?
Same. I have a very efficient workflow with Cursor Edit/Agent mode where it pretty much one-shots every change or feature I ask it to make. Working inside a CLI is painful, are people just letting Claude Code churn for 10-15 minutes and then reviewing the diff? Are people even reviewing the code?
This sort of asynchronous flow will become more and more mainstream. chatgpt.com/codex, Google's Jules and to a degree Claude Code (even though that's local) are all following that pattern: phrase a goal, send it off to the agent, review the diff and request changes, rinse and repeat until ready for PR review.
For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring down the IDE will be coming to a close for all but the most complex coding tasks in the future.
> Are people even reviewing the code?
No, because it's boring. That's why we don't have airplane pilots just watch a machine that's fully on autopilot.
Exact same experience. I have no clue what other people are doing. I was hunting for use cases where it could be used and it kept not working. I don't get it.
Is it only Rust that you've had this experience with or is it a general thing?
5 replies →
it shouldn't be expensive - you can pay for Pro ($20/month) or Max ($100 or $200/month) to get what would cost >> $1000/month in API costs.
Can you use Claude Code with Pro? I was trying to figure this out and I thought you couldn't (unless you enter an API key and pay for tokens).
3 replies →
Yep, I know, but I have free AWS credits sooo
Nice to see container use mentioned (https://github.com/dagger/container-use). I work with the team that made it (a lot of ex-Docker folks including the creator of Docker.)
Running agents in parallel will be a big deal as soon as we learn (or the agents learn) how to reliably work with just one.
Even before then, if you're trying to get work done while the agent is doing its own thing or you find yourself watching over the agent's "shoulder" out of fear it'll change something you didn't ask it to change, then it's useful to run it in a containerized dev environment.
Container use is definitely early but moving quickly, and probably improved even since this post was published. We're currently focused on stability, reducing git confusion, better human<>agent interaction, and environment control.
My take on choice of language:
1) Java has the largest, oldest and most explicit data set for the LLM to reference, so it's likely to be the most thorough, if not the most correct.
2) Go with the language YOU know best because you'll be able to spot when the LLM is incorrect, flawed in its 'reasoning', hallucinating etc.
> Java has the largest, oldest and most explicit data set for the LLM to reference
That seems to be a recommendation for coding with LLMs that don't have access to tools to look up APIs, docs and 3rd party source-code, rather than something you'd chose for "Agentic Coding".
Once the tooling can automatically figure out what is right, what language you use matters less, as long as the source code ends up available somewhere the agent can read it when needed.
Agree much with your 2nd point though, all outputs still require careful review and what better language to use than one you know inside-out?
I have been learning Go, Swift, and Rust with the help of LLM/ Agents.
Basically, the terser/safer syntax and the runtime/compilation errors form a great, tight feedback loop for the agent to fix stuff by itself.
Why is this? Is there just an insanely large codebase of open source projects in Java (the only thing I can think of is the entire Apache suite)? Or is it because the docs are that expressive and detailed for a given OSS library?
Java's API docs are very complete and explicit.
Certain points about the language, as well as certain long-existing open source projects have been discussed ad-nauseum online. This all adds to the body of knowledge.
I always assumed the LLMs had the most python code to reference, as they seem to default to Python most often if you don't specify
> Context system: Go provides a capable copy-on-write data bag that explicitly flows through the code execution path, similar to contextvars in Python or .NET's execution context. Its explicit nature greatly simplifies things for AI agents. If the agent needs to pass stuff to any call site, it knows how to do it.
I believe this is considered a bad practice: the general attitude is that the only sane use case for values in context.Context is tracing data, and all other data should be explicitly passed via arguments.
Agreed on all points.
The only place I’ve encountered this pattern is in chromedp, the go wrapper for the chrome headless browser driver. Its API… isn’t good.
Most methods you use are package globals that take a context.Context as a first parameter. But you have to understand that this context is a _special_ one: you can’t pass any old context like context.Background(), you must pass a context you got from one of the factory methods.
If you want to specify a timeout, you use context.WithTimeout. Clever I guess, but that’s the only setting that works like that.
It’s essentially a void*.
I'm really not an expert in Go, but the data that I'm passing at the moment via context is the type of data which is commonly placed there by libraries I use: database connections, config, rate limiters, cache backends, etc. It does not seem particularly bad, to me at least.
If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.
It's totally fine to put multiple values into a different data bag type that has explicit, typed fields. For example, the Echo framework has its own strongly typed and extensible Context interface for request scoped data: https://pkg.go.dev/github.com/labstack/echo#Context
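To make the tradeoff concrete, here is a minimal, hypothetical Go sketch (the names are made up) contrasting the opaque context.Value approach with an explicit, typed data bag:

```go
package main

import (
	"context"
	"database/sql"
)

// Opaque: values come back as `any`, so every read needs a type
// assertion and nothing is checked at compile time.
type ctxKey string

func dbFromContext(ctx context.Context) (*sql.DB, bool) {
	db, ok := ctx.Value(ctxKey("db")).(*sql.DB)
	return db, ok
}

// Explicit: a small, typed "data bag" passed as a normal argument.
type Deps struct {
	DB     *sql.DB
	Config map[string]string
}

func handle(ctx context.Context, deps *Deps) error {
	// ctx still carries cancellation, deadlines and tracing;
	// application data travels through deps instead.
	_, _ = dbFromContext(ctx) // only checked at runtime
	_ = deps.DB               // checked by the compiler
	return nil
}

func main() {
	_ = handle(context.Background(), &Deps{})
}
```

With the context version, a missing value only shows up at runtime as a failed type assertion; with the struct, forgetting to pass the dependency at all is a compile error.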
4 replies →
"Write the simplest code you can, so the dumb AI can understand it" isn't the massive sell I was expecting.
I wonder how that interacts with his previous post?
https://lucumr.pocoo.org/2025/2/20/ugly-code/
Honestly, I find this approach to be useful pretty much anytime you're working with other people as well.
There are absolutely times to be extremely focused and clever with your code, but they should be rare and tightly tied to your business value.
Most code should be "blindingly obvious" whenever possible.
The limit on developers isn't "characters I can type per minute" it's "concepts I can hold in my head."
The more of those there are... The slower you will move.
Don't create more interfaces over the existing ones, don't abstract early, feel free to duplicate and copy liberally, glue stuff together obviously (even if it's more code, or feels ugly), declare the relevant stuff locally, stick with simple patterns in the docs, don't be clever.
You will write better code. Code shouldn't be pretty, it should be obvious. It should feel boring, because the hard part should be making the product not the codebase.
> This is not an advertisement for Claude Code. It's just the agent I use at the moment. What else is there? Alternatives that are similar in their user experience are OpenCode, goose, Codex and many others. There is also Devin and Cursor's background agents, but they work a bit differently in that they run in the cloud.
What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?
> What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?
There is nothing at the moment that I would recommend. However, I'm quite convinced that we will see this soon. First of all, I quite like where SST's OpenCode is going; the upcoming UX looks really good. Secondly, having that in place will make it quite easy to plug in local models when they get better. The issue really is that there are just not enough good models for tool usage yet. Sonnet is so shockingly good because it was trained for excellent tool usage. Even Gemini does not come close yet.
This is all just a question of time though.
Have you tried aider, and if so, how is it lacking compared to Claude Code in your opinion?
6 replies →
Aider is almost there, in fact it's intentionally "not" there. You can set it up to do things like run test/static analysis automatically and fix errors, and work with it to get a to-do list set up so the entire project is spec'd out, then just keep prompting it with "continue...". It has a hard coded reflection limit of 3 iterations right now, but that can also be hacked to whatever you want. The only thing missing for full agentic behavior is built in self prompting behavior.
> The only thing missing for full agentic behavior is built in self prompting behavior.
Correct me if I'm wrong, but Aider still doesn't do proper tool calling? Last time I tried it, it did things the "old school" way of parsing Unix shell commands out of the output text and running them once the response finished streaming, instead of the sort of tool call/response stuff we have today.
3 replies →
Shameful plug: my upcoming app perhaps?
Single-file download, fuss-free and install-less that runs on mac, windows and linux (+ docker of course.) It can run any model that talks to openai (which is nearly all of them), so it'll work with the big guys' models and of course other ones like ones you run privately or on localhost.
Unlike Claude Code, which is very good, this one runs in your browser with a local app server to do the heavy lifting. A console app could be written to use this self-same server, too, of course (but that's not priority #1) but you do get a lot of nice benefits that you get for free from a browser.
One other advantage, vis-a-vis Armin's blog post, is that this one can "peek" into terminals that you _explicitly_ start through the service.
It's presently in closed alpha, but I want to open it up to more people to use. If you're interested, you and anyone else who is interested can ping me by email -- see my profile.
>run any model that talks to openai (which is nearly all of them)
What does that mean? I've never seen any locally run model talk to OpenAI, how and why would they? Do you mean running an inference server that provides an OpenAI-compatible API?
2 replies →
I see a new alternative (or attempt at one) come out every few days, so it shouldn't be long before we have "the one" alternative.
https://www.app.build/ was just launched by the Neon -- err, Databricks -- team and looks promising.
The Neovim plugin CodeCompanion is currently moving into a more agentic direction, it already supports an auto-submit loop with builtin tools and MCP integration.
Yes it's not a standalone CLI tool, but IMHO I'd rather have a full editor available at all times, especially one that's so hackable and lightweight.
I’m also interested to hear ideas for this.
Gotta say, 100/200 bucks monthly feels prohibitively expensive for even trying out something, particularly something as unproven as code-writing AI, even more particularly when other personal experiences with AI have been at the very least underwhelming, and extra particularly when the whole endeavor is so wrapped up in ethical concerns.
You can use Claude Code either pay-as-you-go with an API key, or subscribe to the $20 Pro subscription.
One month at 20 USD seems like it should be plenty to try it out on a small project or two to decide whether it is worth trying the 100 bucks/month tier? Or one can just wait a couple of months as people report their learnings.
Try Aider with API usage. Learn how to control context size (/clear, /add, /drop). Limit context to 25K. Use whatever model you want (Sonnet 4 or Gemini 2.5 Pro).
For simple scripts, it often costs me under $1 to build. I'm working on a bigger tool these days, and I've done lots of prompts, a decent amount of code, over 100 tests, and my running total is right now under $6.
I'd suggest learning the basics of using AI to code with Aider, and then consider whether you want to try Claude Code (which is likely more powerful, but also more expensive unless you use it all the time).
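For example, a session might look roughly like this (file names are placeholders; /add, /drop and /clear are Aider's in-chat commands):

```
$ aider --model sonnet src/parser.py
> /add tests/test_parser.py
> rewrite parse_header() to handle the v2 format without changing the public API
> /drop tests/test_parser.py
> /clear
```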
Yeah I've been using Aider mostly and just started using Codex, very similar to Claude Code, yesterday. Aider is more manual and requires more guiding but it's also an order of magnitude cheaper.
The monkey brain part of me that really doesn't trust an LLM and trusts my decades of hard-won programming experience also prefers using Aider because the usage flow generally goes:
1. Iterate with Aider on a plan
2. Tell Aider to write code
3. Review the code
4. Continue hacking myself until I want to delegate something to an LLM again.
5. Head back to Step 1.
Codex automates this flow significantly but it's also a lot more expensive. Just the little bits of guiding I offer an LLM through Aider can make the whole process a lot cheaper.
It's unclear to me whether the full agentic Claude Code/Codex style approach will win or whether Aider's more carefully guided approach will win in the marketplace of ideas, but as a pretty experienced engineer Aider seems to be the sweet spot between cost, impact, and authorial input.
1 reply →
If you want it really cheap use Deepseek. Its off-peak rate is during the US workday and is like 50 cents per million tokens.
I’ve been using it with Aider and am struggling to even spend my first $5.
https://github.com/dagger/container-use (cu) is improving daily. Happy to help get it working if you're hitting anything (we're all in dagger.io discord). Last night I tried it with Amazon Q Developer CLI chat (with claude-3.7-sonnet), which I hadn't touched before (will PR how-to to the README today). MCP integration just worked for me. Figured out where to put the agent rules for Q and how to restrict to just the tools from cu. I kicked off three instances of Q to modify my flask app project with the same prompt in parallel (don't step on the local source) and got three variants to review in short order. I merged the one I liked into the repo and tossed the rest.
I have seen multiple articles pushing for Go as the agentic language of choice; does anyone else feel like this is quite forced? I have tried agentic coding in several languages and I didn't have a particularly good or productive experience with Go.
I don't agree with the author on the Go thing. I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good. When I wake up there is always a fresh PR waiting for me.
I don't like the word "agent" because it is not a blind LLM or a small, fast script. It is a complex workflow with many checks and much prompting before writing a single line of code. That's also the key to AI-powered development: context.
> I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good.
It's quite possible it's a case of holding it wrong, but I think the basic evaluation that led me to conclude that Go works particularly well isn't too bad. I just get results I feel good about quicker than with Rust and Python. FWIW I also had really good results with PHP, on the level of Go, it's just overall a stack that does not cater too well to my problem.
All of this is forced, yes. There's probably a trillion dollars riding on this all not imploding, so we're getting it shoved everywhere, all the time, incessantly.
My own experience with "agents" (and no, I am not a luddite) has been nothing short of comical in how terrible it's been. We try it every single day at our company. We've tried all the advice, all the common wisdom. They have never, not once, produced anything of any value. All the public showcases of these brilliant "agents" have also been nothing short of spectacular failures [1]. Yet despite all this, I keep seeing these types of posts, and pretty much always they're from someone with a vested interest of some kind when you dig deep enough. As for the managerial types pushing it: look deep enough and it's always because the board or investors or whatever other parasite has a vested interest.
I know one thing is for certain, what AI will give us is more and more fucking advertisements shoved into every facet of our lives, except now it sorta talks like a human!
[1] https://news.ycombinator.com/item?id=44050152
it's especially forced if you are making frontend or mobile apps
Or ML apps, since Go doesn't have native GPU or CUDA support. You can do it via CGO but it's far messier than pure Go.
I had a recent discussion with another member of the Python community (OA is written by a big name in Python).
He started off saying "learning to code with AI is like learning to cook by ordering off the menu". I know he meant "an AI being the way you learn how to code", but there's another meaning I've been thinking a lot about, because my 16yo son is really into coding and I'm trying to figure out how I can help him be successful in the world at the horizon where he starts doing it professionally.
In that way, "learning how to work together with an AI to code" is a really, really interesting question. Because the world is going to look VERY different in 2-6 years.
The thread in question: https://bsky.app/profile/alsweigart.bsky.social/post/3lr6guv...
I think this discussion boxes new students into the mediocre category right from the start.
Do we really want to tell Fabrice Bellard that he isn't productive enough?
If you want to train people to become fungible factory workers on the other hand, train them to work on the conveyor belt.
I get your point, but I'm envisioning a different endpoint.
Let's take your factory example: Factories are just a fact of life right now, almost nobody is producing bespoke cars or phones or clothing. So given that my son is basically 100% likely to be working with an automation line, how do I get him on the track to being a machine operator or a millwright rather than doing conveyor belt work?
Elixir looks like a good choice as well; folks have recorded a session building a Phoenix web app with Claude Code and it went quite well for them: https://youtu.be/V2b6QCPgFTk
In the same vein, the closing keynote at this year's ElixirConf EU featured agents building web apps: https://www.youtube.com/watch?v=ojL_VHc4gLk
When I read "Avoid inheritance" in a text about Go, I can't help but get the impression that the text also comes from Claude.
This text is not about Go, it's about agentic coding. I have used and am using this across different languages. On this project (which is a Go backend) I still have TypeScript in the frontend, and I have some Python-based tasks too. The rules apply universally.
Meta: this hits differently because the author of this post created an awesome, popular Python web framework some 15 years ago. I miss those times dearly (using Python for web stuff).
'Many hallucinations' may become the new 'poorly documented' when it comes to tech stack decisions. I'm asking myself if it could slow down the adoption of new tech in the future, since it's harder to provide the equivalent learning material of 10 years of Stack Overflow than to write equally good documentation.
Pretty much my experience as well, although I would highly recommend Roo Code + Claude (via the API) to build entire projects, and Claude for "batch" tasks or finalization.
AI models are trained on data that can be 1 or 2 years old, and they're trained on what they saw the most. So language changes, breaking API changes, dependencies that don't work anymore, name changes, etc. are going to get them super confused.
Go indeed works well because of its standard library that avoids the need for many dependencies, and its stability.
I found PHP to actually be the best target language for coding agents. For the same reasons, and also for the ton of documentation and example code available. That doesn't prevent agents from automatically using some modern PHP features, applying static analysis tools, etc.
For frontend stuff, agents will almost always pick React + Tailwind because this is what they saw the most. But Tailwind 4 is very different from Tailwind 3, and that got them super confused.
> Likewise with AI I strongly prefer more code generation over using more dependencies. I wrote about why you should write your own code before, but the more I work with agentic coding, the more I am convinced of this.
This is an interesting statement!
Something I like to do is get Gemini Deepresearch to write a single file manual for any uncommon dependency and include that in my docs/ directory. Helps a bunch.
I also ask it to write specialized guides on narrow topics (e.g., testing async SQLAlchemy using pytest-async and pytest-postgresql).
Packages have communities though.
A suggestion: maybe the dark mode toggle should be at the top of the page rather than at the end; I personally would've loved that, and I do it with some of my HTML blogs, but maybe that's personal preference. Anyway, I think I agree that Go is pretty cool, but the knowledge base for Python feels bigger, and I sometimes just use uv with Python and Gemini Pro right within the browser to create one-off cool scripts. Pretty cool!
Well, the author's previous blog posts were shorter (e.g., https://lucumr.pocoo.org/2022/1/30/unsafe-rust/) and more succinct.
I've no idea what he is saying here. It is all about vaguely defined processes and tools, and people increasingly adopt an LLM writing style.
> and people increasingly adopt an LLM writing style.
If you are insinuating that this is written by an LLM: it is not.
No, I didn't try to claim that. I seem to see the influence in many people's writing and verbosity though. It could be as simple as a counter reaction: If an LLM is allowed to be verbose, so are humans. It could also be that people who use LLMs a lot subconsciously adopt the style.
I infer that you are the author of the post. Take it as a compliment, I think you have written many good pre-LLM articles.
Randomly, my advice: don't sleep on this.
Three or four weeks ago I was posting how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around on them for some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here.
Try it and see if you're still a hold-out.
I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway. Over and over it suggested things that literally do not exist, and the only reason I could tell was that I spent a good amount of time in the actual documentation. This has been my experience roughly 80% of the time when trying to use an LLM. I would like to know what is the magical prompt engineering technique that makes it stop confidently hallucinating about literally everything.
> I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway.
If you mean the ChatGPT interface, I suspect you're headed in the wrong direction.
Try Aider, with API interface. You can use whatever model you like (as you're paying per token). See my other comment:
https://news.ycombinator.com/item?id=44259900
I mirror the GP's sentiment. My initial attempts using a chat like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." 15 minutes later, not only had I installed it, my script was done. There was one bug I had to fix myself. It was production quality code, too.
I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production grade script, with a very nice interface and excellent logging.
Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.
There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task - but it had always been low in my priority list. How many ideas do you have in your head that you'll never get around to because of lack of time. Well, now you can do 5x more of those than you would have without AI.
1 reply →
I'm having a very good experience with ChatGPT at the moment. I'm mostly using it for little tasks where I don't remember the exact library functions. Examples:
"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"
"Python question: how do I serialize a C struct over a TCP socket with asyncio?"
"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)
ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI assisted coding, at least as a replacement for Google and StackOverflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...
EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer or where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
3 replies →
Sure, this was exactly how I felt three weeks ago, and I could have written that comment myself. The agentic approach where it works out it made something up by looking at the errors the type-check generates is what makes the difference.
Which model did you use?
I find using o3 or o4-mini and prompting "use your search tool" works great for having it perform research tasks like this.
I don't trust GPT-4o to run searches.
Did you use search grounding? O3 or o4-mini-high with search grounding (which will usually come on by default with questions like this) are usually the best option.
Did you try giving it the docs to read?
I will definitely sleep on agents. Normal LLM use, fine, but I am not giving up reasoning.
> Normal LLM use, fine, but I am not giving up reasoning.
Ouch! Reminds me of:
- I'm never going to use cell phones. I care about voice quality (me decades ago)
- I'm never going to use VoIP. I care about voice quality (everyone but me 2 decades ago).
- I'm never going to use a calculator. I am not going to give up on reasoning.
- I'm never going to let my kids play with <random other ethnicity>. I care about good manners.
https://en.wikipedia.org/wiki/False_dilemma
3 replies →
this is kind of a weird position to take. you're the captain, you're the person reviewing the code the LLM (agent or not) generates, you're the one asking for the code you want, you're in charge of deciding how much effort to put in to things, and especially which things are most worth your effort.
all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation for what I want, point it at the existing code and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.
2 replies →
What's your definition of "agents" there?
Can someone recommend some sources for vibe coding, e.g. how to prompt it properly and what tools to use? Does anyone have experience with anything other than small projects from scratch?
I had the same question because all my experience with this contradicts the hype.
I watched Ronacher's demo from yesterday, https://www.youtube.com/watch?v=sQYXZCUvpIc, and this is it, a well-regarded engineer working on a serious open source project. There's no wizard behind the curtain, it's the thing I've been asking the promoters for.
And you should make your own judgment, but I'm just not impressed.
It seems to me the machine takes longer, creates a plan that "is shit," and then has to be fixed by a person who has a perfect understanding of the problem.
I'm loving LLMs as research tools, pulling details out of bad documentation, fixing my types and dumb SQL syntax errors, and searching my own codebase in natural language.
But if I have to do all the reasoning myself no matter what, setting a robot free to make linguistically probable changes really feels like a net negative.
Thanks for the link.
Given the hype and repercussions of success or failure of what LLMs can hypothetically do, I feel like the only way forward for reasonable understanding of the situation is for people to post live streams of what they're raving about.
Or at the very least source links with version control history.
Agents now are best for writing code, not yet for software engineering.
2 replies →
My reading of the status quo is that people who use it for toy or greenfield projects written from scratch are having a blast. Until the project reaches a certain complexity in size and function when it starts to break down.
People working on existing projects in turn are scratching their heads because it's just not quite working or providing much of a productivity boost. I belong to this camp.
I wrote this a few months ago. The advice still holds but it only has a short section about coding agents so it's less relevant today than it was when I wrote it: https://simonwillison.net/2025/Mar/11/using-llms-for-code/
On the note of language choice, I've been experimenting with Claude Code recently and thought the other day how happy I am to be using Rust with it and how afraid I'd be in Python, JS, etc.
I've noticed Claude Code introduces quite a few errors and then walks through the compile errors to fix things up. Refactors/etc also become quite easy with this workflow from CC.
I'm sure it does well in dynamic languages, but given how much the LLM leans on these compile errors, I get the feeling it would simply miss more things if there were none or fewer.
So far, though, my #1 concern is finding ways to constrain the LLM. It produces slop really, really quickly, and when it works more slowly I can avoid some of the review overhead. E.g. I find stubbing out methods and defining the code path I want in code, rather than trying to explain it to the LLM, to be productive.
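To show what I mean by stubbing: here's a tiny hypothetical sketch (in Go, but the idea is the same in Rust) where the signatures and the control flow are written by hand and the agent is only asked to fill in the bodies.

```go
package payments

import "errors"

var errNotImplemented = errors.New("not implemented")

type Invoice struct {
	ID    string
	Total int64 // cents
}

// fetchInvoice should load the invoice from storage. TODO(agent): implement.
func fetchInvoice(id string) (*Invoice, error) {
	return nil, errNotImplemented
}

// applyDiscount should apply promo rules to the invoice. TODO(agent): implement.
func applyDiscount(inv *Invoice, code string) error {
	return errNotImplemented
}

// Charge is the code path I actually care about; the agent has to keep this shape.
func Charge(id, promo string) (*Invoice, error) {
	inv, err := fetchInvoice(id)
	if err != nil {
		return nil, err
	}
	if promo != "" {
		if err := applyDiscount(inv, promo); err != nil {
			return nil, err
		}
	}
	return inv, nil
}
```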
Still in my infancy of learning this tool though. It feels powerful, but also terrifying in hands of lazy folks just pushing through slop.
Anthropic just released an AI fluency course:
https://www.youtube.com/watch?v=JpGtOfSgR-c
This is the best set of videos on the topic I've seen.
neovim has AI integration! (and a pretty damn good one if I say so myself (I wrote it))
https://github.com/dlants/magenta.nvim
Prompt engineering is a deep art.