Comment by gm678

19 hours ago

    $ rg 'unsafe [{]' src/ | wc -l
    10428
    $ rg 'unsafe [{]' src/ -l | wc -l
    736
    
    Language        Files     Lines      Code  Comments    Blanks
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Rust             1443    929213    732281    116293     80639
    Zig              1298    711112    574563     59118     77431
    TypeScript       2604    654684    510464     82254     61966
    JavaScript       4370    364928    293211     36108     35609
    C                 111    305123    205875     79077     20171
    C++               586    262475    217111     19004     26360
    C Header          779    100979     57715     29459     13805

Cool, you can just search specifically for potentially unsafe code in Rust. How do you search for unsafe code in Zig? Or do you just have to assume it's everywhere?
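For readers curious what the two `rg` commands above actually count, here is a rough Python equivalent. It is a sketch, not how ripgrep works internally, and the function name `count_unsafe` is made up for illustration. `rg PATTERN src/ | wc -l` counts matching lines (a line holding two `unsafe {` counts once), and `rg PATTERN src/ -l | wc -l` counts matching files. Unlike `rg`, this sketch applies none of ripgrep's default filtering (`.gitignore` rules, hidden files, binary detection), so its totals can differ on a real checkout.

```python
import re
from pathlib import Path

# Same regex as the rg command above: the literal text "unsafe {".
UNSAFE_BLOCK = re.compile(r"unsafe [{]")


def count_unsafe(root):
    """Return (matching_lines, matching_files) for every file under root.

    Roughly mirrors `rg 'unsafe [{]' root | wc -l` and `rg 'unsafe [{]' root -l | wc -l`,
    except that no .gitignore, hidden-file, or binary-file filtering is applied.
    """
    matching_lines = 0
    matching_files = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Count lines containing at least one match, as rg does.
        hits = sum(1 for line in text.splitlines() if UNSAFE_BLOCK.search(line))
        matching_lines += hits
        if hits:
            matching_files += 1
    return matching_lines, matching_files


# Share of files with at least one match, using the reported figures
# (assumes all 736 matching files are among the 1443 Rust files).
print(f"{736 / 1443:.0%}")  # 51%
```

Note that these numbers count lines that open an `unsafe` block, not the amount of code inside such blocks, and the search covered all of `src/`, not only `.rs` files.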

  • If half of your code is unsafe, then unless you exercise tremendous discipline (Claude basically doesn't), you will just end up with a big ball of unsafe, peppered with hallucinations in whatever random documentation comments Claude decided to make. I doubt they enforced the confinement of unsafe to a specific architectural layer or anything like that.

    • Aren't the Rust unsafes a reflection of the Zig it was ported from? However, now that you're working with Rust, you're in a position to keep improving the code and eliminating the unsafes.

  • There is a qualitative difference between unsafe Rust and Zig as far as I know.

  • If half of your files in a million-line codebase are unsafe, that doesn't tell you much any more. Presumably the point of a Rust rewrite is that you actually make use of Rust's safety features in a coherent way.

    But given the whole "let AI rewrite this for me" stunt nature of this project, that was not going to happen, because it would require, well, actual thinking and a redesign. So now you have Zig disguised as Rust: a line-by-line port, because the semantics of idiomatic Rust don't map onto the semantics of Zig.

    • > If half of your files in a million-line codebase are unsafe, that doesn't tell you much any more.

      If half of your files in the first pass of a million-line rewrite are unsafe, then that's completely fine. Do you understand what the keyword actually means? It doesn't even mean that the code is actually unsafe, just that the compiler can't guarantee its safety, which can happen for a number of reasons, some benign.

      Who rewrites a 700K-line codebase trying to be idiomatic from the get-go? That's setting yourself up for failure, whether you're a human or a machine.

    • And? This is absolutely the correct and standard way to do mechanical rewrites: you first do a rewrite that maps directly onto the original source, so you can rely on the original's correctness guarantees and bug-for-bug compatibility and log issues as you go, and then you move to the next phase, where you begin to use idiomatic constructs.

      It's the same approach used in the COBOL-to-Java ports done in banking and insurance over the past 20 years.

  • It's worth pointing out that "unsafe" in Rust is not a very sound concept: it's not like a monad or "function colour", whereby the compiler can say "this code ultimately calls unsafe". It's more like a comment on steroids; you use an unsafe block in a function, write a comment about it, and no caller of that function has any idea that it's calling unsafe code.

    • Yes, the point of unsafe is that you promise it's safe: you promise to uphold the invariants needed to make it safe to call from anywhere. It was never supposed to "taint" all code that calls it; that would defeat its purpose. It's sound enough, it's just not trying to do that at all.

Half of the files contain the 'unsafe' keyword? That doesn't seem like a good rewrite. What is the point of a rewrite into Rust if ~half of your code is still unsafe?

  • Bun is fundamentally a boundary-heavy system, and it also rolls its own versions of many things that people typically get from libraries, where the unsafe code is hidden (no async, memory arenas, etc.). It also uses FFI heavily, which requires unsafe.

    It also looks like the top two maintainers are actively working on reducing the amount of unsafe, and it's going down quickly.

    • If the unsafe can be iteratively removed and the final code is of reasonable quality, that seems like a sane strategy. Any large migration just needs to be doable incrementally so progress can be made.

  • 1. Rewrite from Zig to Rust, staying as close to the Zig as you can.

    2. Turn it into idiomatic Rust.

    • 1. Get hired at a company where you have a solid bet on making multi-century, generational wealth (>$50,000,000).

      2. Every waking moment, do everything in your power to boost the company that might give you the ability to define the direction of technology for the rest of your life.

      3. Use the only thing you have (Bun) to help push you in this direction, and do things to help boost LLM marketing (a technology that already deeply struggles to find customers and has to rely on welfare (lucrative government contracts) to make sales).

      ---

      Honestly, I think this generation of tech workers in SF is more evil than those who worked at Google and Facebook in the early 2010s.

  • > What is the point of a rewrite

    To win a news cycle.

    For the foreseeable future, the AI market competition is not about which product can provide the most valuable utility to users. It's about which product can hold the protective aura of the social media and investment zeitgeist while competitors buckle under the strain of unfulfilled hype and over-leveraging.

    Utility, engineering, efficiency... these are all menial details for the winners to reluctantly iron out in 2035.

  • unsafe just means that you take responsibility for the safety of the code contained within. Calling into non-Rust libraries has to be wrapped in unsafe. Making syscalls has to be wrapped in unsafe.

    Bun needs to interact with FFI code. This gets wrapped in unsafe blocks.

    There are many places where a JavaScript interpreter and library would need to make unsafe calls and operations.

    It doesn't literally mean the code is unsafe. It means the code contained within is not something that can be checked by the compiler, so the writer takes responsibility for it.

    There are many low-level data munging and other benign operations that a human can demonstrate are safe, but that need to be wrapped in unsafe because they do things the compiler can't check.

    • There's actually a good example of this in the rewrite [1], in `PathString::slice`. They do an unsafe operation to return a slice that could be a use-after-free unless the caller has already guaranteed that an invariant will remain true. Following idiomatic Rust practice, Claude has added a SAFETY comment to the unsafe block to explain why it's safe: "caller guarantees the borrowed memory outlives this".

      Now, normally, you'd communicate this contract to your API users by marking the type's constructor (PathString::init) as "unsafe" and documenting the contract. Unfortunately, in this case the invariant does not exist: it appears to have been fabricated out of thin air by the LLM [2]. So not only does this particular codebase have UB problems caused by unsafe code, but the SAFETY comments on that unsafe code are also, well, lies.

      [1] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...

      [2] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...

    • > unsafe just means that you take responsibility for the safety of the code contained within.

      In this case it means you delegated the responsibility to a notably flaky heuristic.

  • Someone correct me if I'm wrong, but it's unlikely they wrote this initial Rust version and will leave it unchanged. What's there now is a step in a long process, not the final destination.

  • Rust has a ton of other features besides safety, like exhaustive checking of enum variants and the ability to avoid null by using Option and Result.

    • Zig has these modern language features too, fwiw.

      I think the goal was to do a massive rewrite for Anthropic (they acquired Bun) and show that rewriting projects from one language to another with Claude can reduce security vulnerabilities, to help with the hype for an IPO.

      I don’t use/know Rust so I can’t comment on the quality, but there was a public security review that found issues with the new Rust code: https://x.com/SwivalAgent/status/2054468328119279923

      This is an interesting experiment, but I’m skeptical of any claims of success by Jarred/Anthropic given the incentive to hype agents. There’s probably a trillion dollars at stake with the IPO. And Anthropic seems to be developing this part of their business with Mythos and the super review features.

      But I’d like to see the same experiment done on a project where less rides on the story being a success.

  • That sounds like a starting point and an honest translation. If code that was originally unsafe suddenly became safe immediately after the rewrite, it would mean they had broken existing behavior.

Better to know where memory bugs may happen than to have them possible everywhere. Also, the Bun team is looking to reduce it by a large margin. Since it was a line-by-line port, there is plenty of room for improvement. By the first Rust release, a significant amount of it should be resolved.

  • Wouldn't it be better to port more idiomatically? Otherwise, you've done nothing but port all the existing bugs while creating new ones.

    • That's one problem with LLMs. I had Claude write a function in Python for me that did a bit of math, because, like most programmers, I don't know math.

      The function worked perfectly, mathematically speaking, but after a bit of research I realized a human being would never write code that bad.

      I don't remember exactly, but it looked like this:

          import math
          from functools import reduce

          def lcm_of(denominators):  # wrapper name added for illustration
              def lcm(a, b):
                  return abs(a * b) // math.gcd(a, b)

              return reduce(lcm, denominators)
      

      There are 2 problems with this code.

      First, that is the correct way to calculate the LCM, which you'll quickly learn if you google it (or ask Claude). The problem: math.lcm already exists! Any human being writing this would have paused to think "wait, Python has math.gcd, does it have math.lcm as well?" And then they would have just used that.

      Second, you don't even need reduce. You can just call math.lcm(*denominators). A human being would have realized this when IntelliSense showed that it takes any number of arguments instead of just 2.
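A minimal sketch of the two versions side by side, assuming Python 3.9 or later, where `math.lcm` was introduced and accepts any number of integers. The function names `lcm_pairwise` and `lcm_builtin` are made up for illustration. One behavioral difference worth knowing: `math.lcm()` with no arguments returns 1, while `reduce` without an initial value raises `TypeError` on an empty list.

```python
import math
from functools import reduce


def lcm_pairwise(denominators):
    """Hand-rolled version: fold a two-argument LCM over the list."""
    def lcm(a, b):
        return abs(a * b) // math.gcd(a, b)

    return reduce(lcm, denominators)


def lcm_builtin(denominators):
    """Standard-library version: math.lcm takes any number of integers."""
    return math.lcm(*denominators)


denominators = [4, 6, 10]
print(lcm_pairwise(denominators))  # 60
print(lcm_builtin(denominators))   # 60
```

Both give the same answer for non-empty input; the one-liner simply avoids reimplementing something the standard library already provides.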

      Pretty much every time I've used an LLM to generate code, it has produced a rough draft, barely held together, that needs to be completely rewritten later. With Qt, for example, it generated two push buttons for Ok/Cancel when there is QDialogButtonBox for this, which even orders the buttons to match the platform's usual order. And when generating a combo box that associated labels with objects, it tried to figure out which object was selected from the text of the item's label, when there is already a way to set an arbitrary object for each item and get it back later with .currentData().

      Every single time it makes me think: yes, this works. But no, not like this.

      I can't imagine what 1 million lines of this feels like.