Comment by mohsen1

6 days ago

Very impressive that they could do this so quickly, because I have been working on a similar project (porting TypeScript to Rust) for 5 months. But I guess I don't have access to Mythos and unlimited tokens. I'm also close to a 100% pass rate: 99.6% at the time of writing.

https://tsz.dev

Rust is perfect for writing all of your code with an LLM. Its strict type system makes it less likely to make very dumb mistakes that other languages might allow.

Also want to note that writing code with an LLM doesn't remove the need for a vision of the design and the tradeoffs you make as you build a project. So Jarred and his team are exactly the right kind of people to leverage LLMs to write huge amounts of code.

> Rust is perfect for writing all of your code with an LLM. Its strict type system makes it less likely to make very dumb mistakes that other languages might allow.

I question this. Yes, strong enforcement of invariants at compile time helps the LLM generate functional code, since it gets rapid feedback and can retrace its steps, as opposed to generating buggy code that fails at runtime in edge cases.

On the other hand, Rust is a complex language prone to refactoring avalanches, where a small change in a component forces refactoring of distant code. If the initial architecture is bad or lacking, growing the codebase incrementally, as LLMs typically do, will tend towards spaghettification. So I fear a program that compiles and even runs OK, but is no longer human-readable or maintainable.

  • > Rust is a complex language prone to refactoring avalanches

    This may be so, but LLMs are great at slogging through such tedious repercussions.

    I would say if the language prevents sloppy intermediate states, that actually makes it more amenable to AI; if you just half-ass a refactor into a conceptually inconsistent state, it’s possible for bad tests to fail to catch it in Python, say. But if many such incomplete states are just forbidden, then the compiler errors provide a clean objective function that the LLM can keep iterating on.
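    A minimal sketch of one such mechanism (hypothetical types, not from any real codebase): Rust's exhaustive `match` forbids the half-finished state, because adding a variant turns every unhandled match site into a compile error until the refactor is actually done.

    ```rust
    // Exhaustive matching: the compiler rejects a half-finished refactor.
    #[derive(Clone, Copy)]
    enum Shape {
        Circle(f64),
        Square(f64),
        // Adding `Triangle(f64, f64)` later makes every match below a
        // compile error until it's handled -- a clean objective function
        // for the LLM, instead of a silently stale code path that only
        // a good test suite might catch.
    }

    fn area(s: Shape) -> f64 {
        match s {
            Shape::Circle(r) => std::f64::consts::PI * r * r,
            Shape::Square(side) => side * side,
        }
    }
    ```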

    • This is true in my experience as well. I'd even say it's the most common failure mode of current AI! It "fixes" some problem locally and declares victory, but it doesn't fully address the consequences of the change everywhere, and then the codebase is inconsistent.

      1 reply →

  • > On the other hand, Rust is a complex language prone to refactoring avalanches, where a small change in a component forces refactoring distant code.

    Are you saying this out of personal experience or just hypothesizing? I am working on a large, complex rust project with Claude Code and do not experience this at all.

    • It can happen like this:

      - write sleek operator-overloading-based code for simple mathematical operations on your custom pet algebra

      - decide that you want to turn it into an autograd library [0]

      - realise that you now need either `RefCell` for interior mutability, or arenas to save the computation graph and local gradients

      - realise that `RefCell` puts borrow checks on the runtime path and can panic if you get aliasing wrong

      - realise that plain arenas cannot use your sleek operator-overloaded expressions, since `a + b` has no access to the arena, so you need to rewrite them as `tape.sum(node_a, node_b)`

      - cry

      This was my introduction to why you kinda need to know what you will end up building with Rust, or suffer the cascade refactors. In Python, for example, this issue mostly wouldn't happen, since objects are already reference-like, so the tape/graph can stay implicit and you just chug along.

      I still prefer Rust, just know that these refactor cascades will happen. But they are mechanically doable, because you just need to 'break' one type and let an LLM correct the fallout errors surfaced by the compiler until you reach a consistent new ownership model, and I suppose this is common enough that LLMs have seen it done hundreds of times, haha.

      [0] https://github.com/karpathy/micrograd
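      A minimal sketch of the API shift described in the steps above (types and names are illustrative, not from micrograd or any real autograd library): the overloaded `+` works on self-contained values, while the arena version has to thread a `tape` through every operation, and `RefCell` moves borrow checking to runtime.

      ```rust
      use std::cell::RefCell;
      use std::ops::Add;

      // Before: sleek operator overloading on self-contained values.
      #[derive(Clone, Copy, PartialEq, Debug)]
      struct Value(f64);

      impl Add for Value {
          type Output = Value;
          fn add(self, rhs: Value) -> Value {
              Value(self.0 + rhs.0)
          }
      }

      // After: nodes live in an arena (the "tape"), so `a + b` can no
      // longer allocate the result node -- every op becomes a method call.
      #[derive(Clone, Copy)]
      struct NodeId(usize);

      struct Tape {
          values: Vec<f64>,
      }

      impl Tape {
          fn new() -> Self {
              Tape { values: Vec::new() }
          }
          fn leaf(&mut self, v: f64) -> NodeId {
              self.values.push(v);
              NodeId(self.values.len() - 1)
          }
          fn sum(&mut self, a: NodeId, b: NodeId) -> NodeId {
              let v = self.values[a.0] + self.values[b.0];
              self.leaf(v)
          }
          fn value(&self, id: NodeId) -> f64 {
              self.values[id.0]
          }
      }

      // And the RefCell alternative: aliasing mistakes become runtime
      // errors (a panic with borrow_mut, an Err with try_borrow_mut).
      fn refcell_aliasing_is_a_runtime_error() -> bool {
          let cell = RefCell::new(0i32);
          let _held = cell.borrow_mut();
          cell.try_borrow_mut().is_err()
      }
      ```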

      2 replies →

    • I also work on a large complex rust project (>1M LOC) with extensive use of Claude Code. It is very consistent with my experience. Claude frequently subverts the obvious intent of the system - whether that's expressed in comments or types - in the pursuit of "making the build green", as it so often puts it. It, like many junior engineers, has completely failed to internalize the lesson that type errors are useful information and not a bad thing to make go away as soon as possible. It is remarkably capable, but you cannot trust it to have good taste.

  • It's very easy to just instruct the LLM to build using isolated crates, to maintain boundaries, focus on "ports and adapters", etc, and not run into this - in my experience.

    I haven't had any issues with this getting out of hand on >10KLOC vibed rust codebases.

      Of the languages I know, Rust is the only one where I can look at multi-threaded code and understand it. Having this stuff checked by the compiler is a huge advantage.

      1 reply →
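      A small illustration of what "checked by the compiler" means here (a generic sketch, not from any particular project): shared ownership and locking are spelled out in the types, and swapping in a non-thread-safe type like `Rc<RefCell<_>>` simply would not compile.

      ```rust
      use std::sync::{Arc, Mutex};
      use std::thread;

      // A shared counter: `Arc` = shared ownership across threads,
      // `Mutex` = exclusive access. Both are visible in the signature.
      fn parallel_count(n_threads: usize, per_thread: usize) -> usize {
          let counter = Arc::new(Mutex::new(0usize));
          let handles: Vec<_> = (0..n_threads)
              .map(|_| {
                  let counter = Arc::clone(&counter);
                  thread::spawn(move || {
                      for _ in 0..per_thread {
                          *counter.lock().unwrap() += 1;
                      }
                  })
              })
              .collect();
          for h in handles {
              h.join().unwrap();
          }
          // Using Rc<Cell<usize>> here instead would be a compile error:
          // Rc is not `Send`, so it cannot cross the thread::spawn boundary.
          let n = *counter.lock().unwrap();
          n
      }
      ```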

    • > I haven't had any issues with this getting out of hand on >10KLOC vibed rust codebases.

      This rewrite is >750k lines of Rust

      1 reply →

  • Sure, but in most mainstream languages, if the initial architecture is bad, a huge cascading refactor is equally hard; the result is just a lot less likely to work at the end, so you don't do it at all and end up in the same spaghetti mess.

    The lesson here is that right now LLMs are a lot better at "fill in the implementation for this API I defined" than "design everything from scratch" if you care at all about whether it becomes a mess of spaghetti. Maybe someday they'll be better at it, but at least today, you have to choose between going full vibes and not caring about the code, or you need to be involved in the design, and either way it's not clear that Rust is a significantly worse choice based on anything other than your own experience.

When Microsoft rewrote it in Go, there was a comment from one of the leads that they chose it over Rust because of the similarity in paradigms (garbage collection, etc.), and that using Rust would've been more difficult, requiring a lot of "hoop jumping". Now that you've done it... thoughts?

  • Yes indeed. More than 1 million lines of code (including tests) involves jumping through lots of hoops, but with LLMs it's not as painful: you can just ask them to do the hard things.

    Example of a Claude Code session that came out without results after 2 hours of "crunching": https://github.com/mohsen1/tsz/pull/4868 (Edit: I force-pushed to the PR to solve the problem; you can see the initial refusal message in the first version of the PR description)

    Funny thing is, the last percent of the tests has been so hard to work on that Opus 4.7 routinely bails and says "it's too involved or complicated", so I had to add prompts specifically asking it not to bail.

    • You should try GPT, I’d be really interested to hear if it works better. (Exclusively using GPT for systems work at $DAYJOB, but compare with opus every couple weeks and GPT consistently gives me better results)

      4 replies →

    • That might be Opus 4.7 behaviour, because I've also been getting that all the time in the past few weeks. Also a complex codebase, but likely an order of magnitude simpler than yours.

  • They mentioned that they wanted to port their compiler over to retain existing behavior (vs. a rewrite), and Rust has a hard time with their cyclic data structures.

  • Is GC useful for a static type checker? Or did they make a new runtime?

    • The point is that having a GC will affect your data structure and algorithm design, so it's easier to automatically transform JS or TS to Go than to Rust, because you're mostly reducing things down to one problem (translation) rather than multiple intertwined problems.

    • The TypeScript compiler is a CLI tool and runs for short periods of time. GC pauses and memory leaks should be the least of its issues.

Same, but for multi-threaded Postgres[0]. 96% of the pg regression tests pass after 1 month and 823K LOC. Eight Codex accounts at $200/mo is what I could use up with no Mythos.

I've seen the benefits of Rust for this too. And I'm making the bet that my pg experience will help me make good design choices around many of the things people have been having trouble with in pg for a long time[1]. Excited to see AI make it practical to improve complex pieces of software in ways that historically weren't.

[0] https://github.com/malisper/pgrust [1] https://malisper.me/the-four-horsemen-behind-thousands-of-po...

  • Very cool! If you have extra tokens lying around, ask the agent to try to break things and open GitHub issues. This is what I do for tsz, and beyond the conformance tests I can see it finding very good bugs.

  • 96% of tests passing sounds impressive, but I remember that C compiler that had similar (or better) stats yet was still hilariously broken, because the test suite didn't cover many "obvious" things that a human wouldn't get wrong even without the tests.

    • There are a few big differences between the Anthropic C compiler and pgrust. The C compiler was built mostly autonomously, as a clean-room implementation. OTOH I'm steering Codex and using the Postgres source code as a reference. That's leading to the implementation being based more on how pg does things than anything else. I compiled it to wasm, so you can try it here[0]. You'll see it's much more faithful to Postgres than a C compiler that doesn't handle type checks.

      [0] https://pgrust.com/

  • > PostgreSQL, rewritten from scratch in Rust.

    You use the test suite and LLMs are trained on Postgres.

    Are you at Freshpaint? A company that "helps healthcare marketing teams grow in a world where privacy is the baseline, but performance is the goal."

    Nice promises! Surely the marketing teams will respect privacy!

  • wow!

    curious about your workflow for running all these accounts. different harnesses in parallel? manually switching in codex? 5.5pro only?

    what works for you?

Rust is amazing, but the way I want to build Rust software breaks down on large projects with LLMs. Maintaining clean boundaries, or even just establishing them, stops being a flow state and turns into painful reviews that push me into procrastination mode.

I’ve struggled to get Opus to not write the weirdest possible Rust, ignoring all idioms and so on. Any tips?

  • I found that turning on every possible clippy lint and telling it that it has to run clippy as well as tests before it can claim it's finished helped a decent amount. Of course, if you have a decent-sized codebase of Rust you're happy with, that helps immensely, since it will tend to follow instructions to stick to existing patterns.
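    One way to wire that up (a minimal sketch; the lint groups chosen are examples, adjust to taste) is crate-level lint attributes at the top of src/lib.rs or src/main.rs:

    ```rust
    // Crate-level lint configuration (top of src/lib.rs or src/main.rs).
    // `pedantic` and `nursery` are opt-in Clippy lint groups; denying
    // warnings forces the agent to address them before claiming "done".
    #![warn(clippy::pedantic, clippy::nursery)]
    #![deny(warnings)]
    ```

    Pairing this with an instruction to run `cargo clippy --all-targets` before declaring success extends the same pressure to test code.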

  • Be absolutely ruthless with technical debt. Opus is perfectly capable of producing idiomatic code in any mainstream language you please, but will seize on any opportunity to justify writing basically-python instead because that's "consistent" with the "convention". Deprive it of that excuse.

    • Yeah that’s basically what I mean! I have no issues wrangling it myself, but now I’m curious how those who are managing “fleets” of agents while shipping four features a day are doing it. They’re not, I’d assume?

  • Give it coding guidelines. It'll largely try to do what you ask.

    Left to itself, it often follows human developers who conceive of their goal as "get the program working, the end justifies the means." Which makes sense because there are a lot of systems like that in the training corpus.

>Rust is perfect for writing all of your code with an LLM. Its strict type system makes it less likely to make very dumb mistakes that other languages might allow.

100%. I've been telling this to everyone who will listen for 2 years. LLMs are infinitely more productive with Swift code like

let engineCycleCount: Int = 5

vs

let eC = 5

They still make mistakes, but forcing _explicit_ typing in a strongly typed language leads to far fewer of them, plus the compiler catches >90% of what you'd try to catch with a billion RSpecs in trash languages like Ruby.

Wow, amazing work.

Pretty impressive that it is faster than the Go version already.

  • Thank you!

    It's much faster in single-file benchmarks (3 to 5x).

    https://tsz.dev/benchmarks/micro

    I have optimizations planned for large projects that I'm still fleshing out.

    • Regarding the architecture documentation you have up on tsz.dev, one thing that jumped out at me was the use of per-node typed side pools. A semi-recent talk[0] benchmarked this and found it to be a deoptimisation: the speaker couldn't explain it, but an audience member suggested it is likely because an AST is not generally very type-homogeneous in its visit order. After a CallExpr node, the next node to visit is probably not another CallExpr but more likely an Identifier, etc., so storing the node "extra data" in separate pools makes it more likely to be cold in cache rather than hot.

      In Nova JavaScript engine[1] I've done exactly as you've done and split objects into typed side pools (I call them "(typed) heap vectors") but in a JavaScript engine my _hypothesis_ is that the visitation patterns are much more amenable to this: an Array, Set, or Map is more likely to be homogeneous than heterogeneous, and therefore a loop over the contents is likely going to hit the same side pool for each entry.

      [0]: https://www.youtube.com/watch?v=s_1OG9GwyOw [1]: https://trynova.dev/
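      For the curious, a minimal sketch of the side-pool layout being discussed (field names are invented, not tsz's or Nova's actual ones): each node carries a kind tag plus an index into a per-kind pool holding that kind's extra data.

      ```rust
      // "Typed side pools": per-kind extra data lives in separate vectors,
      // and each node stores an index into the pool matching its kind.
      #[derive(Clone, Copy, PartialEq, Debug)]
      enum NodeKind {
          Ident,
          Call,
      }

      #[derive(Clone, Copy)]
      struct Node {
          kind: NodeKind,
          extra: u32, // index into the pool for `kind`
      }

      struct IdentData {
          name: String,
      }

      struct CallData {
          callee: u32, // node index of the callee (illustrative)
          arg_count: u32,
      }

      #[derive(Default)]
      struct Ast {
          nodes: Vec<Node>,
          idents: Vec<IdentData>, // pool for NodeKind::Ident
          calls: Vec<CallData>,   // pool for NodeKind::Call
      }

      impl Ast {
          fn add_ident(&mut self, name: &str) -> u32 {
              self.idents.push(IdentData { name: name.into() });
              self.nodes.push(Node {
                  kind: NodeKind::Ident,
                  extra: (self.idents.len() - 1) as u32,
              });
              (self.nodes.len() - 1) as u32
          }
          fn add_call(&mut self, callee: u32, arg_count: u32) -> u32 {
              self.calls.push(CallData { callee, arg_count });
              self.nodes.push(Node {
                  kind: NodeKind::Call,
                  extra: (self.calls.len() - 1) as u32,
              });
              (self.nodes.len() - 1) as u32
          }
          fn ident_name(&self, node: u32) -> Option<&str> {
              let n = self.nodes[node as usize];
              match n.kind {
                  NodeKind::Ident => Some(&self.idents[n.extra as usize].name),
                  _ => None,
              }
          }
      }
      ```

      Whether this layout wins is exactly the cache-locality question above: a heterogeneous visit order keeps hopping between pools, while homogeneous runs of the same kind stay hot.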

      1 reply →

Zig is much more type-aligned to Bun than TypeScript is. And there's a common interface in the C FFI, so you could imagine porting it modularly and keeping the test suite in Zig.

>Rust is perfect for writing all of your code with an LLM.

Rust is a terrible language for using LLMs to write code if Rust's low latency isn't needed, because of its extreme compile times. LLMs code faster than humans so a far bigger fraction of the time is spent waiting for the compiler, and a reasonably sized project will take literally 10x longer to compile in Rust than in e.g. Zig or Go.

  • In my experience with Claude Code, it writes most of the code, including tests, without invoking the compiler until the very end (almost like a spelling checker). Rarely are there any compilation problems, and when there are, it’s often a token issue like a missing brace. I hypothesize this is possible because of the robust invariants of the language itself, and its strong types, such that the LLM can encode deeper meaning in fewer tokens.

    Also remember, `cargo check` is quite fast, and wholly sufficient for confirming that the code type-checks.

Shouldn't typed code in a functional style be kind of the perfect endgame for LLMs? You can parallelize generation at any granularity, easily ring-fence changes, and reproduce everything, and the types give clues to the LLM.

Interesting, but why not then use an even stricter language? Say Idris, ATS, Lean or F* ?

  • Not OP. For this particular use case, I think performance is a primary concern.

    But if you mean in general, I also totally feel that languages that let you represent more invariants statically are a better fit for LLMs. I'd love to see LLM experimentation with dependent types and managed effects.

[flagged]

  • > How do we know it is true?

    The branch is open.

    You can check it out and run the tests if you don’t believe it.

  • Zig isn’t on the blacklist so much because of the culture it carries from its maintainers, but because the ecosystem is no longer easily composed with other GitHub projects/GitHub Actions.

  • > We are dealing with a company of habitual liars and promoters.

    Any sources to back this up?