
Comment by sunrunner

17 hours ago

Ignoring things like whether the Rust that was output could be deemed qualitatively good, whether the resulting line count is appropriate, how much the codebase was ready or primed for this kind of exercise going in, and so on: is it fair to say that a 622-line artefact created up front is a relatively small cost for a potential increase in consistency or quality of output when the output is ~1M LoC? It seems like there's a multiplicative power here given how much output there is. Or is that missing a lot of nuance?
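
For scale, taking the two figures above at face value (a back-of-the-envelope check, not something from the original post):

```python
# Back-of-the-envelope, using the figures quoted above:
rules_loc, output_loc = 622, 1_000_000
print(f"rules are {rules_loc / output_loc:.3%} of the output")        # ~0.062%
print(f"~{output_loc // rules_loc} output lines per line of rules")   # ~1607
```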

I'd also be interested generally in how much tacit knowledge was needed to come up with these rules and how much iteration on this file was needed; for example, how many of the rules here came from a failure case hit while iterating on the translation.

This is effectively a very expensive and resource-intensive machine translation. As such, there is no increase in consistency or quality of output.

  • The translation is a starting point to enable follow-on work to take advantage of Rust's features.

  • How would you have achieved this “machine translation” without an LLM?

    It seems to me it would very likely have been more expensive and more resource-intensive - if realistically possible at all, short of implementing a general Zig-to-Rust translator first.

    • "Short of..." indeed. You already know the answer, although it doesn't need to be general; it only needs to work on a single codebase.

      A recent and highly relevant example is the migration of the TypeScript compiler to Go. They did not use an LLM to translate the code. Instead, they used LLM assistance to write a deterministic TypeScript-to-Go translator and then used that to translate the code. I have far more confidence in this approach than in letting the LLMs rip on the translation itself. (A toy sketch of the deterministic approach follows this thread.)


    • Even before the advent of LLMs, I have personally (and largely successfully) translated several production systems from one language to another. I've learned it's best to start with a mechanical translation, literally bug for bug, leaving shit exactly as I found it (just in another language).

      I've done Perl to Java, Java to Kotlin, Python to Ruby, Ruby to Java, C to Swift, you name it.

      It's only when you change behavior during the rewrite that it becomes an intractable problem. If you ship a 1:1 translation, THEN you can start going through the list of "bugs" you found along the way. Tread carefully when it comes to this, however, as I can almost guarantee that within your non-trivial codebase there will be some code that implicitly _depends_ on a "bug" to function at all. This is where shit hits the fan.

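To make "deterministic translator" concrete: it is ordinary code that maps source constructs to target constructs one reviewable rule at a time. Below is a toy sketch in Python (3.10+) translating a tiny expression subset into Rust-flavoured output; this is purely illustrative and says nothing about how the actual TypeScript-to-Go tool works:

```python
# A toy deterministic source-to-source translator: walk a parsed AST and apply
# a fixed rule per node type. The same input always yields the same output.
import ast

def translate(node: ast.AST) -> str:
    match node:
        case ast.Module(body=[ast.Expr(value=e)]):
            return translate(e)
        case ast.BinOp(left=l, op=ast.Add(), right=r):
            return f"({translate(l)} + {translate(r)})"
        case ast.BinOp(left=l, op=ast.Mult(), right=r):
            return f"({translate(l)} * {translate(r)})"
        case ast.Constant(value=int() as v):
            return f"{v}i64"
        case ast.Name(id=name):
            return name
        case _:
            # Deterministic failure beats a silent guess: unsupported syntax
            # is surfaced immediately rather than papered over.
            raise NotImplementedError(ast.dump(node))

print(translate(ast.parse("a + 2 * b")))  # -> (a + (2i64 * b))
```

The properties the commenters value fall out of this shape: identical input always yields identical output, unsupported syntax fails loudly instead of being guessed around, and fixing one rule fixes every occurrence on the next run.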

> I'd also be interested generally in how much tacit knowledge was needed to come up with these rules and how much iteration on this file was needed; for example, how many of the rules here came from a failure case hit while iterating on the translation.

I think that's the point the original poster was making. There's basically zero chance this file was just spat out from memory in an afternoon. It was obviously the result of a LOT of pre-planning and back-and-forth checking over the artifacts that Claude was incorrectly generating for one reason or another. So yeah, an extremely iterative process.

With rules as fine-grained as these, there were almost certainly many instances where hundreds of files are generated -> one particular file doesn't translate <X> correctly -> add a rule for <X> -> regenerate everything again -> crap, that rule broke a different file because <Y> -> add a rule for <X if Y>, another for <X not Y> -> regenerate everything again[0] -> repeat. The token costs must have been out of this world.

0: now I'm sure people will say "why would you regenerate a file that generated correctly once? Just mark it off the list and move on." Well, when essentially 99.9999% of your codebase is generated artifacts, the tiny fraction that is actually human-understandable is now the spec, the source of truth for everything. It HAS to be able to essentially redo the entire process if you expect any level of maintainability going forward.
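
That hypothesised loop could look something like the sketch below; every name and detail here is guesswork, with `translate_repo` and `check` standing in for whatever harness was actually used:

```python
# Hypothetical shape of the iterate-on-the-rules loop described above.
from pathlib import Path

RULES = Path("translation-rules.md")  # the human-maintained "source of truth"

def translate_repo(rules: str) -> dict[str, str]:
    """Regenerate every output file from scratch under the current rules."""
    raise NotImplementedError  # stands in for the expensive LLM pass

def check(path: str, source: str) -> str | None:
    """Return a failure description (compile error, behavior diff) or None."""
    raise NotImplementedError  # stands in for the validation harness

def one_pass() -> list[tuple[str, str]]:
    outputs = translate_repo(RULES.read_text())
    failures = [(path, err) for path, src in outputs.items()
                if (err := check(path, src)) is not None]
    for path, err in failures:
        # A human turns each failure into a new or amended rule, then the
        # whole repo is regenerated, so the rules file stays the spec.
        print(f"rule gap in {path}: {err} -> amend {RULES} and re-run")
    return failures
```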

I would guess it was a for ... each loop. They likely wrote a bunch of skills. The for loop went through each file and generated a complementary file, then had another process integrate/validate.

I doubt the entire process took a single week; the week was probably just whatever harness they specially prepared for the work.
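
If that guess is right, the harness might look roughly like this; all names below are invented for illustration, not taken from the post:

```python
# Sketch of the speculated per-file harness: a for-each over the Zig sources,
# one complementary Rust file generated per input, then a separate
# integrate/validate step.
from pathlib import Path

def generate_rust(zig_file: Path, rules: str) -> str:
    """Stand-in for the LLM call that emits the complementary .rs file."""
    raise NotImplementedError

def validate(rs_file: Path) -> bool:
    """Stand-in for the second process: compile, lint, diff behavior."""
    raise NotImplementedError

def run(src_root: Path, out_root: Path, rules: str) -> list[Path]:
    failed = []
    for zig_file in sorted(src_root.rglob("*.zig")):  # the "for ... each" loop
        rs_file = out_root / zig_file.relative_to(src_root).with_suffix(".rs")
        rs_file.parent.mkdir(parents=True, exist_ok=True)
        rs_file.write_text(generate_rust(zig_file, rules))
        if not validate(rs_file):
            failed.append(rs_file)  # queue for another pass or a rule tweak
    return failed
```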