Comment by inferiorhuman

6 months ago

  If your API takes &str, and tries to do byte-based indexing, it should
  almost certainly be taking &[u8] instead.

&str is indexed by bytes. That's the issue.

4 comments

xyzzyz 6 months ago

As a matter of fact, you cannot do

  let s = "asd";
  println!("{}", s[0]);

You will get a compiler error telling you that you cannot index into &str.

inferiorhuman 6 months ago
Right, you have to give it a usize range. And that will index by bytes. This:
fn main() { let s = "12345"; println!("{}", &s[0..1]); }
compiles and prints out "1".
This:
fn main() { let s = "\u{1234}2345"; println!("{}", &s[0..1]); }
compiles and panics with the following error:
byte index 1 is not a char boundary; it is inside 'ሴ' (bytes 0..3) of `ሴ2345`
To get the nth char (a Unicode scalar value; nth is zero-based, so this prints the second one):
fn main() { let s = "\u{1234}2345"; println!("{}", s.chars().nth(1).unwrap()); }
To get a substring:
fn main() { let s = "\u{1234}2345"; println!("{}", s.chars().skip(0).take(1).collect::<String>()); }
To actually get the bytes you have to call as_bytes(), which returns a &[u8] that accepts both single and range indices, e.g.:
fn main() { let s = "\u{1234}2345"; println!("{:02X?}", &s.as_bytes()[0..1]); println!("{:02X}", &s.as_bytes()[0]); }
IMO it's less intuitive than it should be but still less bad than e.g. Go's two types of nil because it will fail in a visible manner.
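
For anyone who wants the failure as a value rather than a panic, here is a minimal sketch using str::get and str::is_char_boundary from the standard library. The helper char_prefix is a hypothetical name made up for illustration, not something from the thread, and it is only one of several ways to cut on a char boundary.

```rust
// Hypothetical helper (not from the thread): the first `n` chars of `s`
// as a borrowed slice, found by walking char boundaries.
fn char_prefix(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset where char number `n` starts,
        // so it is always a valid place to cut.
        Some((idx, _)) => &s[..idx],
        // The string has `n` chars or fewer: the whole string is the prefix.
        None => s,
    }
}

fn main() {
    let s = "\u{1234}2345";

    // `get` is the checked form of range indexing: None instead of a panic.
    assert_eq!(s.get(0..1), None);
    assert_eq!(s.get(0..3), Some("\u{1234}"));

    // The boundary can also be tested up front.
    assert!(!s.is_char_boundary(1));
    assert!(s.is_char_boundary(3));

    // Borrowed prefixes, no String allocation.
    println!("{}", char_prefix(s, 1));
    println!("{}", char_prefix(s, 2));
}
```

Unlike the chars().skip().take().collect::<String>() version, this borrows from the original string instead of allocating a new one.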
- xyzzyz 6 months ago
  
  It's actually somewhat hard to hit that panic in a realistic scenario, because you are unlikely to be using slice indices that are not on a character boundary. Where would you even get them from? All the standard library functions return byte indices that lie on a character boundary. For example, if you want to slice the string between the first occurrence of 'a' and the first occurrence of 'z', you'll do something like
  let start = s.find('a')?; let end = s.find('z')?; let sub = &s[start..end];
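  and it will never hit that char-boundary panic, because find never returns an index that is not on a char boundary. (A reversed range, with 'z' before 'a', would still panic, but for a different reason.)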
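  
  A rough, self-contained sketch of that snippet follows. The function name between_a_and_z is hypothetical, and swapping the indexing for str::get is my own assumption so that a reversed range also comes back as None.

  ```rust
  // Hypothetical helper: the text from the first 'a' up to, but not
  // including, the first 'z'. Returns None if either char is missing.
  fn between_a_and_z(s: &str) -> Option<&str> {
      // `find` returns byte offsets, and they always sit on char boundaries.
      let start = s.find('a')?;
      let end = s.find('z')?;
      // `get` instead of `&s[start..end]`: a reversed range ('z' before 'a')
      // yields None rather than a panic.
      s.get(start..end)
  }

  fn main() {
      // The multi-byte char in front shifts the byte offsets but does no harm.
      assert_eq!(between_a_and_z("\u{1234}abcz"), Some("abc"));
      // 'z' comes before 'a': reversed range, so None.
      assert_eq!(between_a_and_z("zebra"), None);
      // No 'z' at all: `?` returns None early.
      assert_eq!(between_a_and_z("abc"), None);
      println!("{:?}", between_a_and_z("\u{1234}abcz"));
  }
  ```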