Comment by gm678
19 hours ago
$ rg 'unsafe [{]' src/ | wc -l
10428
$ rg 'unsafe [{]' src/ -l | wc -l
736
Language Files Lines Code Comments Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Rust 1443 929213 732281 116293 80639
Zig 1298 711112 574563 59118 77431
TypeScript 2604 654684 510464 82254 61966
JavaScript 4370 364928 293211 36108 35609
C 111 305123 205875 79077 20171
C++ 586 262475 217111 19004 26360
C Header 779 100979 57715 29459 13805
Cool you can just search specifically for potentially unsafe code in Rust. How do you search for unsafe code in Zig? Or do you just have to assume it's everywhere?
If half of your code is unsafe then unless you exercise tremendous discipline (Claude basically doesn't) you will just end up with a big ball of unsafe, peppered with hallucinations in whatever random documentary comments Claude decided to make. I doubt they enforced the confinement of unsafe to a specific architectural layer or anything like that.
Aren't the Rust unsafes a reflection of the Zig it was ported from? However now that you're working with Rust, you're in a position to continue improving and eliminating the unsafes.
5 replies →
There is a qualitative difference between unsafe Rust and Zig as far as I know.
In principle static analysis is possible. (Note: WIP)
https://github.com/ityonemo/clr
if half of your files in a million line codebase are unsafe that doesn't tell you much any more. Presumably the point of a Rust rewrite is that you actually make use of Rust's safety features in a coherent way.
But given the whole "let AI rewrite this for me" stunt nature of this project that was not going to happen because that would require well, actual thinking and a re-design. So now you have Zig disguised as Rust and a line-by-line port because the semantics of idiomatic Rust don't map on the semantics of Zig.
>if half of your files in a million line codebase are unsafe that doesn't tell you much any more.
If half of your files in the first pass of a million line rewrite are unsafe then that's completely fine. Do you understand what the tag actually is? It doesn't even mean that the code is actually unsafe, just that the compiler can't guarantee its safety, which can happen for a number of reasons, some benign.
Who rewrites a 700K codebase trying to be idiomatic from the get go ? That's setting yourself up for failure, whether you're a human or a machine.
1 reply →
And? This is absolutely the correct and standardized way to do mechanical rewrites: you do a rewrite that maps directly to the original source so you can rely on the original correctness guarantees and bug-for-bug compatibility and log issues, and then you go into the next phase where you begin to use idiomatic constructs.
This is the same in COBOL-to-Java ports that have been done in banking and insurance for the past 20 years.
12 replies →
[dead]
It's worth pointing out that "unsafe" in rust is not a very sound concept - it's not like a monad or "function colour" whereby the compiler can say "this code ultimately calls unsafe". It's more like a comment on steroids; you call unsafe in a function, write a comment about it, and no caller of that function would have any idea that it's calling unsafe code.
Yes, the point of unsafe is that you promise it's safe, you promise to preserve the necessary invariants to make it safe to call no matter from where. It was never supposed to "taint" all code that calls it, that would defeat its purpose. It's sound enough, it's just not at all trying to do that.
1 reply →
The half of the files contain 'unsafe' keyword? It doesn't seem as a good rewrite. What is the point of rewrite into Rust, if ~half of your code is still unsafe?
Bun is fundamentally a boundary-heavy system and it also rolls its own version of a lot of things that people typically use via libraries, where unsafe is hidden. (no async, memory arenas, etc). It also uses FFI heavily which requires unsafe.
It also looks like the top 2 maintainers are currently actively working on getting the amount of unsafe down and it's going down quickly.
If the unsafe can be iteratively removed and the final code is of reasonable quality that seems like a sane strategy. Any large migration just needs to be doable incrementally so progress can be made.
1. Rewrite from zig to rust in as close to zig as you can.
2. Turn into idiomatic rust.
1. Get hired into a company where you have a solid bet on making multi-century lasting generational wealth (>$50,000,000).
2. Every waking moment do everything in your power to boost the company that might give you the ability to define the direction of technology for the rest of your life.
3. Use the only thing you have (bun) to help push you in this direction and do things to help boost LLM marketing (a technology that already deeply struggles to find customers and has to rely on welfare (lucrative government contracts) to make sales).
---
Honestly think this generation of tech workers in SF are more evil than those that worked at Google + Facebook in the early 10s.
12 replies →
> What is the point of rewrite
To win a news cycle.
For the forseeable future, the AI market competition is not about which product can provide the most valuable utility to users. It's about which product can be holding the protective aura of social media and investment zeitgeist while competitors buckle under the strain from unfulfilled hype and over-leveraging.
Utility, engineering, efficiency... these are all menial details for the winners to reluctantly iron out in 2035.
Bannon’s ‘flood the zone’ strategy applied to AI.
unsafe just means that you take responsibility for the safety of the code contained within. Calling into non-Rust libraries has to be wrapped in unsafe. Making syscalls has to be wrapped in unsafe.
Bun needs to interact with FFI code. This gets wrapped in unsafe blocks.
There are many places where a JavaScript interpreter and library would need to make unsafe calls and operations.
It doesn't literally mean the code is unsafe. It means the code contained within is not something that can be checked by the compiler, so the writer takes responsibility for it.
There are many low-level data munging and other benign operations that a human can demonstrate are safe, but need to be wrapped in safe because they do things outside of what the compiler can check.
There's actually a good example of this in the rewrite [1], in `PathString::slice`. They are doing an unsafe operation to return a slice that could be a use-after-free, if the caller had not already guaranteed that an invariant will remain true. Following proper rust idiomatic practices, claude has added a SAFETY comment to the unsafe block to explain why it's safe: "caller guarantees the borrowed memory outlives this".
Now, normally, you'd communicate this contract to your API users by marking the type's constructor (PathString::init) as "unsafe", and including the contract in its documentation. Unfortunately in this case, this invariant does not exist - it appears to have been fabricated out of thin air by the LLM [2]. So, not only does this particular codebase have UB problems caused by unsafe code, the SAFETY blocks for the unsafe code are also, well, lies.
[1] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...
[2] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...
9 replies →
> unsafe just means that you take responsibility for the safety of the code contained within.
In this case it means you delegated the responsibility to a notably flaky heuristic.
> a JavaScript interpreter
Bun is not a Javascript interpreter. But I do see the point.
Some correct me if I'm wrong, but it's unlikely they wrote this first initial version of Rust and will leave it unchanged as-is. What's there now is a step in a long process, not the final destination.
The point is to serve as marketing for Claude. Absolutely nothing else.
Rust has a ton of other features besides safe. Like exhaustive checking of enum variants and the ability to avoid using null with option and result.
Zig has these modern language features too fwiw.
I think the goal was to do a massive rewrite for Anthropic (they acquired bun) and show that rewriting projects from lang -> lang with Claude can reduce security vulnerabilities to help with the hype for an IPO.
I don’t use/know Rust so I can’t comment on the quality, but there was a public security review that found issues with the new Rust code: https://x.com/SwivalAgent/status/2054468328119279923
This is an interesting experiment but I’m skeptical of any claims of success by Jarred/Anthropic due to the incentive to hype agents. There’s probably a trillion dollars at stake with the IPO. And Anthropic seems to be developing this part of their business with Mythos and the super review features.
But I’d like to see the same experiment done on a project without so much relying on the story being success.
2 replies →
that sounds like a starting point and an honest translation. If it was originally unsafe and suddenly becomes safe immediately after the rewrite, it would mean they break existing behaviors
Better to know where memory bugs may happen than them being everywhere. Also, bun team are looking it to reduce it by a large margin. Since it was a line by line port, there is a good space for improvement. By first rust release, a significant number of it should be resolved.
Wouldn't it be better to port more idiomatically? Otherwise, you've done nothing but port all the existing bugs while creating new ones.
That's one problem with LLM's. I had claude write a function in python for me that did a bit of math, because, like most programmers, I don't know math.
The function worked perfectly mathematically speaking, but after a bit of research I realized a human being would never write a piece of code so bad.
I don't remember exactly, but it looked like this:
There are 2 problems with this code.
First, that is the correct way to calculate the LCM that you'll quickly learn if you google it (or if you ask claude). The problem: math.lcm already exists! Any human being writing this would have paused to think "wait, Python has math.gcd, does it have math.lcm as well?" And then they would have just used that.
Second, you don't even need reduce. You can just math.lcm(*denominators). A human being would have realized this when intellisense showed it takes any number of arguments instead of just 2.
Pretty much every time I used an LLM to generate code it generates a rough draft barely held together that needs to be completely rewritten later. With Qt for example it generated 2 push buttons for Ok/Cancel when there is QDialogButtonBox for this that even orders the buttons to match the typical system order, or when generating a combo box that associated labels with objects it tried to figure out which object from the text of the label of the items when there is already a way to just set an arbitrary object for each item and then get it later with .currentData().
Every single time it makes me think: yes, this works. But no, not like this.
I can't imagine with 1 million lines of this feels like.
Sure hope Mythos is as world beating as they claim, they’re gonna need it now.
We got memory safety at home !
At home:
> 10428