Comment by xyzzyz

1 day ago

You can easily treat `&str` as bytes: just call `.as_bytes()` and you get a `&[u8]`, no questions asked. The reason you don't want to treat `&str` as just bytes by default is that it's almost always the wrong thing to do. Worse, it's the worst kind of wrong, because it works correctly 99% of the time (byte positions and character positions coincide for ASCII text), so you might not realize you have a bug until much too late.
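The "works 99% of the time" trap is easy to see by comparing byte length with character count: for ASCII they agree, and they diverge as soon as a multi-byte character shows up. A minimal sketch; `byte_len` and `char_count` are hypothetical helper names that just wrap std methods.

```rust
// Hypothetical helpers contrasting the byte view and the char view of a &str.
fn byte_len(s: &str) -> usize {
    // str::len() counts UTF-8 bytes, not characters.
    s.len()
}

fn char_count(s: &str) -> usize {
    // chars() iterates over Unicode scalar values.
    s.chars().count()
}

fn main() {
    // ASCII, a 2-byte char (U+00E9), and a 3-byte char (U+1234).
    for s in ["hello", "h\u{e9}llo", "\u{1234}2345"] {
        println!("{:?}: {} bytes, {} chars", s, byte_len(s), char_count(s));
    }
}
```

Any code that silently assumes `len()` equals the number of characters passes every ASCII test and breaks on the other two inputs.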

If your API takes &str, and tries to do byte-based indexing, it should almost certainly be taking &[u8] instead.
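As a sketch of what that looks like, here is a hypothetical byte-oriented function (`magic_prefix` is an illustrative name, not a real API) that slices at a fixed byte offset and therefore takes `&[u8]`; a caller holding a `&str` opts in to the byte view explicitly with `.as_bytes()`.

```rust
// Hypothetical byte-oriented API: it slices at a fixed byte offset,
// so it takes &[u8] rather than &str.
fn magic_prefix(data: &[u8]) -> Option<&[u8]> {
    // <[u8]>::get returns None instead of panicking if data is shorter than 4 bytes.
    data.get(..4)
}

fn main() {
    let text = "\u{1234}2345";
    // The caller explicitly asks for the UTF-8 bytes.
    let prefix = magic_prefix(text.as_bytes());
    println!("{:02X?}", prefix);
}
```

Cutting a `&[u8]` in the middle of a character is fine, because nobody claims the result is valid text.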

inferiorhuman 1 day ago

  If your API takes &str, and tries to do byte-based indexing, it should
  almost certainly be taking &[u8] instead.

`str` is indexed by bytes. That's the issue.

xyzzyz 3 hours ago

As a matter of fact, you cannot do

  let s = "asd";
  println!("{}", s[0]);

You'll get a compile error saying that the type `str` cannot be indexed by an integer.

inferiorhuman 2 hours ago

Right, you have to give it a range of `usize`, and that indexes by bytes. This:

  fn main() {
      let s = "12345";
      println!("{}", &s[0..1]);
  }

compiles and prints out "1".

This:

  fn main() {
      let s = "\u{1234}2345";
      println!("{}", &s[0..1]);
  }

compiles and panics with the following error:

  byte index 1 is not a char boundary; it is inside 'ሴ' (bytes 0..3) of `ሴ2345`
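If you'd rather not panic, std has non-panicking counterparts: `str::get` takes the same byte range as indexing but returns `None` when a bound falls inside a character, and `str::is_char_boundary` lets you check an offset up front. A minimal sketch; `byte_prefix` is a hypothetical wrapper.

```rust
// Hypothetical helper: the first `end` bytes of `s`, or None if `end`
// falls inside a multi-byte character or past the end of the string.
fn byte_prefix(s: &str, end: usize) -> Option<&str> {
    // str::get uses the same byte range as &s[..end], minus the panic.
    s.get(..end)
}

fn main() {
    let s = "\u{1234}2345";
    // is_char_boundary reports whether a byte offset sits between chars.
    println!("{} {}", s.is_char_boundary(1), s.is_char_boundary(3)); // false true
    println!("{:?}", byte_prefix(s, 1)); // None
    println!("{:?}", byte_prefix(s, 3));
}
```

It is still byte indexing, just with the failure turned into a value you have to handle.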

To get the nth char (a Unicode scalar value; `nth` is zero-based):

  fn main() {
      let s = "\u{1234}2345";
      println!("{}", s.chars().nth(1).unwrap());
  }
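Because UTF-8 is variable-width, `chars().nth(n)` is a linear scan. If you also need to know where that char sits so you can slice from it, `char_indices()` yields each char together with its byte offset. A minimal sketch; `nth_char_offset` is a hypothetical helper name.

```rust
// Hypothetical helper: the byte offset at which the nth char (zero-based) starts,
// which is always a valid char boundary for slicing.
fn nth_char_offset(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(i, _)| i)
}

fn main() {
    let s = "\u{1234}2345";
    if let Some(i) = nth_char_offset(s, 1) {
        // The first char occupies bytes 0..3, so char 1 starts at byte 3.
        let c = s[i..].chars().next().unwrap();
        println!("char 1 starts at byte {}: {}", i, c);
    }
}
```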

To get a substring by char positions:

  fn main() {
      let s = "\u{1234}2345";
      println!("{}", s.chars().skip(0).take(1).collect::<String>());
  }
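The `collect::<String>()` version allocates a new `String`. One way to borrow the substring instead is to translate char positions into byte offsets with `char_indices()` and then slice, which is guaranteed to land on char boundaries. A minimal sketch; `char_substr` is a hypothetical helper, and it returns `None` for out-of-range or reversed positions rather than panicking.

```rust
// Hypothetical helper: borrow chars in positions start..end (counted in chars,
// not bytes) without allocating a new String.
fn char_substr(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Map a char position to its byte offset; position == char count maps to s.len().
    let to_byte = |pos: usize| {
        s.char_indices()
            .map(|(i, _)| i)
            .chain(std::iter::once(s.len()))
            .nth(pos)
    };
    let (b0, b1) = (to_byte(start)?, to_byte(end)?);
    // Both offsets come from char_indices (or s.len()), so this cannot panic.
    Some(&s[b0..b1])
}

fn main() {
    let s = "\u{1234}2345";
    println!("{:?}", char_substr(s, 0, 1));
    println!("{:?}", char_substr(s, 1, 3));
}
```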

To actually get the bytes, you have to call `as_bytes()`, whose result supports both single-index and range indexing, e.g.:

  fn main() {
      let s = "\u{1234}2345";
      println!("{:02X?}", &s.as_bytes()[0..1]);
      println!("{:02X}", s.as_bytes()[0]);
  }
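For reference, U+1234 encodes to the three UTF-8 bytes `E1 88 B4`, which is why byte index 1 lands inside it. The sketch below shows that encoding and the reverse direction: `std::str::from_utf8` validates bytes and returns an error (rather than panicking) for a slice that cuts a character in half. `utf8_bytes` is a hypothetical helper around `char::encode_utf8`.

```rust
// Hypothetical helper: the UTF-8 encoding of a single char.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // a char needs at most 4 bytes in UTF-8
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    let s = "\u{1234}2345";
    // as_bytes() exposes the raw UTF-8; U+1234 is the first three bytes.
    println!("{:02X?}", &s.as_bytes()[0..3]); // [E1, 88, B4]
    println!("{:02X?}", utf8_bytes('\u{1234}')); // [E1, 88, B4]
    // from_utf8 validates, so half a character is rejected as an Err.
    println!("{}", std::str::from_utf8(&s.as_bytes()[0..1]).is_err()); // true
}
```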

IMO it's less intuitive than it should be, but still less bad than, e.g., Go's two kinds of nil, because it fails visibly.