Comment by bensyverson
9 hours ago
I built an agent with Go for the exact reasons laid out in the article, but did consider Rust. I would prefer it to be Rust actually. But the #1 reason I chose Go is token efficiency. My intuitive sense was that the LLM would have to spend a lot of time reasoning about lifetimes, interpreting and fixing compiler warnings, etc.
I've built tools with both Go and Rust as LLM experiments, and it is a real advantage for Go that the test/compile cycle is much faster.
I've been successful with each, and I think there are positives and negatives to both. I just wanted to mention the one that stands out as making Go relatively more pleasant to work with.
"LLM would have to spend a lot of time reasoning about lifetimes"
Let's set aside for now the fact that Go is a garbage-collected language while Rust is not...
Would you rather let the LLM reason about lifetimes, or debug subtle errors yourself at runtime, as happens with C++?
People who are familiar with the C++ safety discussion understand that lifetimes are like types -- they are part of the code and are just as important as the real logic. You cannot be ambiguous about lifetimes yet be crystal clear about the program's intended behavior.
For many (most) types of objects, lifetimes can be a runtime property just fine. For a list, for example, in Rust/C/C++ you have to make an explicit decision about how long it should be "alive". A managed language's assumption that an object lives for as long as it is reachable is completely correct here, and it has the benefit of adapting fluidly to future code changes, which lowers maintenance costs.
Of course there are types where this is not true (file handles, connections, etc.), and managed languages usually don't have features as good as C++/Rust's RAII for dealing with them.
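To make the RAII point concrete, here is a minimal Rust sketch. `Connection` and `open_count` are made-up names standing in for a real file handle or socket, and the counter exists only so the cleanup is observable. The cleanup runs deterministically at scope exit rather than whenever a collector gets around to it.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for a file handle or network connection.
struct Connection {
    open_count: Rc<Cell<usize>>,
}

impl Connection {
    // "Opening" the resource bumps the shared counter.
    fn open(open_count: Rc<Cell<usize>>) -> Self {
        open_count.set(open_count.get() + 1);
        Connection { open_count }
    }
}

impl Drop for Connection {
    // Runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.open_count.set(self.open_count.get() - 1);
    }
}

// Returns how many connections are open while `_conn` is still alive.
fn open_during_scope(counter: &Rc<Cell<usize>>) -> usize {
    let _conn = Connection::open(Rc::clone(counter));
    counter.get()
} // `_conn` is dropped here, which releases the resource

fn main() {
    let counter = Rc::new(Cell::new(0));
    let during = open_during_scope(&counter);
    println!("during: {}, after: {}", during, counter.get());
}
```

In a managed language the equivalent usually needs an explicit construct (`defer` in Go, `try`-with-resources in Java, `with` in Python), and forgetting it is not a compile error.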
Fair point, and it depends on whether you're building code to last a decade, or creating a quick proof of concept.
It's not a waste of time though. Those warnings and clippy lints are there to improve the quality of the code and to find bugs.
As a human I can just decide to write quality code (or not!), but LLMs don't understand when they're being lazy or stupid and so need to have that knowledge imposed on them by an external reviewer. Static analysis is cheap, and more importantly it's automatic. The alternative is to spend more time doing code review, but that's a bottleneck.
I've never actually seen it get a compiler issue arising from lifetimes, so it seems to one-shot that stuff just fine. Although my work is typically middle of the road, non-HFT trading applications, not super low-level.
It certainly had to iterate on lifetimes prior to Claude 4.5, at least for me. Prior to Claude 4.0 it was pretty bad at Rust.
Most LLMs sucked at Rust at the beginning because there's much less Rust code available on the broad internet.
I suspect the providers started training on it specifically because it showed up proportionally much more in actual LLM usage (obviously much less than mainstream languages like Python or JavaScript, but I wouldn't be surprised if there were more LLM queries about Rust than about C, for demographic reasons).
Nowadays even small Qwens are decent at it in one-shot prompts, or at least much better than GPT-4 was.
That matches real-world Rust use. I've worked with Rust since 2017 on multiple projects, and the number of times I've had to write a lifetime annotation has been very limited.
It's actually rare to have to borrow something and keep the borrow in another object (which is where lifetime annotations come in). Most of the time (at least 95%, I'd say) you borrow something and then drop the borrow, or move the thing.
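A small sketch of the two cases, with hypothetical names (`total_len`, `Longest`, `longest`) chosen just for illustration. The first function borrows and releases, so no annotation appears. The second stores the borrow inside a struct, which is the rarer case where `'a` has to be written out.

```rust
// Common case: borrow, use, release. No lifetime annotations needed.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Rarer case: a struct that keeps a borrow must name the lifetime,
// so the compiler can check it never outlives the data it points into.
struct Longest<'a> {
    word: &'a str,
}

// The returned `Longest` borrows from `words` and cannot outlive it.
fn longest<'a>(words: &'a [String]) -> Option<Longest<'a>> {
    words
        .iter()
        .max_by_key(|w| w.len())
        .map(|w| Longest { word: w.as_str() })
}

fn main() {
    let words = vec!["go".to_string(), "rust".to_string(), "zig".to_string()];
    println!("{}", total_len(&words));
    if let Some(l) = longest(&words) {
        println!("{}", l.word);
    }
}
```

Even here the annotation on `longest` could be left to lifetime elision. It is spelled out only to show where the borrow comes from.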
Yes, I basically do everything the lazy/thoughtless way for a first pass. I find in 99% of cases that's already performant enough and matches the intended data flow, but if you ever want to optimize it, you can. The same is also true with the types: you can bash out a prototype very quickly and then tighten them up later, using Clippy to easily find all the shortcuts you took.
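A rough illustration of that workflow, using made-up function names: the first version takes ownership and lets callers clone, and the tightened version borrows a slice instead. Both produce the same result, so the change can wait until it matters.

```rust
// First pass: take ownership and let callers clone. Simple, often fast enough.
fn shout_owned(names: Vec<String>) -> Vec<String> {
    names.iter().map(|n| n.to_uppercase()).collect()
}

// Tightened later: borrow a slice, so callers no longer need to clone.
fn shout_borrowed(names: &[String]) -> Vec<String> {
    names.iter().map(|n| n.to_uppercase()).collect()
}

fn main() {
    let names = vec!["ada".to_string(), "grace".to_string()];
    let a = shout_owned(names.clone()); // prototype: clone to keep using `names`
    let b = shout_borrowed(&names); // tightened: no clone
    assert_eq!(a, b);
    println!("{:?}", b);
}
```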
LLMs don't "reason".
Why is this a meaningful distinction to you? What does "reason" mean here? Can we construct a test that cleanly splits what humans do from what LLMs do?
Sure, things like counting the ‘r’s in strawberry, for example (till they are retrained not to make that mistake).
Take it up with OpenAI's API designers—it's their term