← Back to context

Comment by Aurornis

21 hours ago

unsafe just means that you take responsibility for the safety of the code contained within. Calling into non-Rust libraries has to be wrapped in unsafe. Making syscalls has to be wrapped in unsafe.

Bun needs to interact with FFI code. This gets wrapped in unsafe blocks.

There are many places where a JavaScript interpreter and library would need to make unsafe calls and operations.

It doesn't literally mean the code is unsafe. It means the code contained within is not something that can be checked by the compiler, so the writer takes responsibility for it.

There are many low-level data munging and other benign operations that a human can demonstrate are safe, but need to be wrapped in safe because they do things outside of what the compiler can check.

There's actually a good example of this in the rewrite [1], in `PathString::slice`. They are doing an unsafe operation to return a slice that could be a use-after-free, if the caller had not already guaranteed that an invariant will remain true. Following proper rust idiomatic practices, claude has added a SAFETY comment to the unsafe block to explain why it's safe: "caller guarantees the borrowed memory outlives this".

Now, normally, you'd communicate this contract to your API users by marking the type's constructor (PathString::init) as "unsafe", and including the contract in its documentation. Unfortunately in this case, this invariant does not exist - it appears to have been fabricated out of thin air by the LLM [2]. So, not only does this particular codebase have UB problems caused by unsafe code, the SAFETY blocks for the unsafe code are also, well, lies.

[1] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...

[2] https://github.com/oven-sh/bun/blob/63035b3e37/src/bun_core/...

  • `PathString` worked the exact same way in our Zig code, with less visibility from the compiler & type system. And yes, it will be refactored heavily (or deleted overall) in the next week or so.

  • One potential way to solve this in a principled manner is to turn at least some "unsafe" annotations into ghost capability tokens that are explicitly threaded through the code and consistently checked by the compiler. Manufacturing the capability could itself be left as an unsafe operation, or require a runtime check of some kind.

    You already see this in some cases, for example the NonZero<T> generic type can be viewed as a T endowed with a capability or token that just says "this particular value of type T is nonzero, so the zero value is available for niche purposes". But this could be expanded a lot, especially with some AI assistance.

    • This already happens all the time in rust, including in the standard library. The typical pattern is to define your CheckedType to be

      pub struct CheckedType(UncheckedType);

      e.g. where its inner field is private. Then, you only present safe constructors that check your invariant, and only provide methods that maintain the invariant.

      For a concrete example, String in rust is a Vec<u8> with the guarantee that the underlying bytes correspond to valid UTF8. Concretely, it is defined as

      #[derive(PartialEq, PartialOrd, Eq, Ord)] #[stable(feature = "rust1", since = "1.0.0")] #[lang = "String"] pub struct String { vec: Vec<u8>, }

      You can construct a string from a vec of bytes via

      fn from_utf8(vec: Vec<u8>) -> Result<String, _>;

      as well as the unsafe method

      unsafe fun from_utf8_unchecked(vec: Vec<u8>) -> String;

      Note here that there isn't a separate capability/token though. This is typically viewed as bad practice in rust, as you can always ignore checking a capability/token. See for example rust's mutexes Mutex<T>, which carry the data (T) that you want access to themself. So, to get access to the data, you must call .lock(). There is a similar philosophy behind Rust's `Result` type. to get data underlying it, you must handle the possibility of an error somehow (which can include panicing upon detecting the error of course).

> unsafe just means that you take responsibility for the safety of the code contained within.

In this case it means you delegated the responsibility to a notably flaky heuristic.