Comment by maxdamantus
1 year ago
In my jq PR I used negative numbers to represent them (the original byte, negated), since jq already uses `int` to represent a decoded code point, and it's fairly normal in C to return distinguishable errors as negative numbers. I think it would also make sense to represent the UTF-16 errors ("unpaired surrogates") as negative numbers, though I didn't make that change internally (maybe because they're already used elsewhere). I did, however, make `explode` represent them as negatives, so `"\uD800" | explode` emits `[-0xD800]`.
In something other than C, I'd expect them to be distinguished as members of an enumeration or something similar, e.g.:
  enum DecodeResult {
      Ok(char),
      ErrUtf8(u8),   // 0x80..0xFF
      ErrUtf16(u16), // 0xD800..0xDFFF
  }
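To make the comparison concrete, here is a minimal Rust sketch of how such an enum might be produced and then mapped back to the C-style negative-number convention. It is not the jq implementation: the function names `decode_utf8`, `decode_utf16`, and `to_signed` are hypothetical, and reporting each invalid UTF-8 byte as its own error is an assumption based on the "original byte, negated" description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecodeResult {
    Ok(char),
    ErrUtf8(u8),   // 0x80..=0xFF
    ErrUtf16(u16), // 0xD800..=0xDFFF
}

// Hypothetical helper: decode bytes as UTF-8, emitting one ErrUtf8 per invalid byte.
fn decode_utf8(mut bytes: &[u8]) -> Vec<DecodeResult> {
    let mut out = Vec::new();
    while !bytes.is_empty() {
        match std::str::from_utf8(bytes) {
            Ok(s) => {
                out.extend(s.chars().map(DecodeResult::Ok));
                break;
            }
            Err(e) => {
                // Everything before valid_up_to() is guaranteed to be valid UTF-8.
                let (valid, rest) = bytes.split_at(e.valid_up_to());
                let s = std::str::from_utf8(valid).expect("prefix is valid UTF-8");
                out.extend(s.chars().map(DecodeResult::Ok));
                // error_len() is None when the input ends mid-sequence;
                // in that case every remaining byte is reported as an error.
                let bad = e.error_len().unwrap_or(rest.len());
                out.extend(rest[..bad].iter().map(|&b| DecodeResult::ErrUtf8(b)));
                bytes = &rest[bad..];
            }
        }
    }
    out
}

// Hypothetical helper: decode UTF-16 code units, keeping unpaired surrogates as errors.
fn decode_utf16(units: &[u16]) -> Vec<DecodeResult> {
    char::decode_utf16(units.iter().copied())
        .map(|r| match r {
            Ok(c) => DecodeResult::Ok(c),
            Err(e) => DecodeResult::ErrUtf16(e.unpaired_surrogate()),
        })
        .collect()
}

// Hypothetical helper: the C-style encoding, where errors become negative integers.
// The two error ranges cannot collide: -0xFF..=-0x80 versus -0xDFFF..=-0xD800.
fn to_signed(r: DecodeResult) -> i32 {
    match r {
        DecodeResult::Ok(c) => u32::from(c) as i32,
        DecodeResult::ErrUtf8(b) => -i32::from(b),
        DecodeResult::ErrUtf16(u) => -i32::from(u),
    }
}
```

With these definitions, `decode_utf8(b"a\xFFb")` yields `Ok('a')`, `ErrUtf8(0xFF)`, `Ok('b')`, and mapping `decode_utf16(&[0xD800])` through `to_signed` gives `[-55296]`, i.e. `-0xD800`, matching the `explode` output described above.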