Comment by Ygg2

5 days ago

Theoretically yes. Practically there is character escaping.

That kills any non-allocation dreams. Moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate. If source is read-only you have to allocate. If source is mutable you have to waste CPU to rewrite the string.

I'm confused why this would be a problem. UTF-8 and UTF-16 (the only two common Unicode encodings) use at most 4 bytes per code point (and, in English text, most commonly 1 byte in UTF-8 or 2 in UTF-16). The ASCII escape you gave is 6 bytes wide. I don't know of many ASCII escape representations that are narrower than the native Unicode encoding of the character they stand for.

The same goes for other escapes such as \n, \0, \t, \r, etc. Each is half the size in its native byte representation.
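
To make the size argument concrete, here is a small sketch that compares the byte length of a few escape sequences with the UTF-8 length of the character each one denotes. The sample list and the helper name `widths` are just illustrative choices, not anything from a particular parser.

```python
# Compare the size of an escape sequence (as ASCII source text) with the
# size of the character it denotes once encoded as UTF-8.
SAMPLES = {
    r"\n": "\n",
    r"\t": "\t",
    r"\u00e9": "\u00e9",            # e with acute accent, 2 bytes in UTF-8
    r"\u20ac": "\u20ac",            # euro sign, 3 bytes in UTF-8
    r"\ud83d\ude00": "\U0001F600",  # surrogate pair, 4 bytes in UTF-8
}


def widths(escaped, char):
    """Return (bytes used by the escape, bytes used by the UTF-8 character)."""
    return len(escaped.encode("ascii")), len(char.encode("utf-8"))


if __name__ == "__main__":
    for escaped, char in SAMPLES.items():
        esc, native = widths(escaped, char)
        print(f"{escaped:>14}  escaped={esc:2}  utf-8={native}")
```

In every row the escaped form is strictly longer than the decoded form, which is the property the in-place argument relies on.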

> Moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate.

Depends on what you are doing with it. If you aren't displaying it (and typically you are not in a server application), you don't need to unescape it.

  • And this is indeed something that the C++ Glaze library supports, to allow for parsing into a string_view pointing into the original input buffer.
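
For illustration only, the "don't unescape unless you need to" idea can be sketched in Python with a `memoryview` standing in for a `string_view`. This is not Glaze's API; `raw_string_span` is a hypothetical helper, and it assumes the input is well formed and that `start` points at the opening quote.

```python
def raw_string_span(buf, start):
    """Return a view of the still-escaped string body whose opening quote is at `start`.

    Nothing is copied or unescaped: the result borrows from `buf`.
    Assumes well-formed input; an unterminated string raises IndexError.
    """
    assert buf[start] == ord('"')
    i = start + 1
    while buf[i] != ord('"'):
        # A backslash always consumes the next byte too, so an escaped
        # quote (\") is never mistaken for the closing quote.
        i += 2 if buf[i] == ord("\\") else 1
    return memoryview(buf)[start + 1:i]


if __name__ == "__main__":
    doc = b'{"k": "a\\"b\\u00e9"}'
    view = raw_string_span(doc, 6)
    print(bytes(view))  # the raw body, escapes left untouched
```

The caller can compare, hash, or forward the raw bytes as they are, and only pay for unescaping if the decoded text is actually needed.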

It’s just two pointers: the current place to write and the current place to read. Escapes are always more characters than they represent, so there’s no danger of overwriting the read pointer. If you support compression this can become somewhat of an issue, but you simply support a max block size, which is usually defined by the compression algorithm anyway.

  • If you have a place to write, then it's not zero allocation. You did an allocation.

    And usually if you want maximum performance, buffered read is the way to go, which means you need a write slab allocation.

    • > If you have a place to write, then it's not zero allocation. You did an allocation.

      Where did that allocation happen? You can write into the buffer you're reading from, because the replacement data is shorter than the original data.


> Practically there is character escaping

The voice of experience appears. Upvoted.

It is conceivable to deal with escaping in-place, and thus remain zero-alloc. It's hideous to think about, but I'll bet someone has done it. Dreams are powerful things.
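
A minimal sketch of the two-pointer, in-place approach discussed above, assuming JSON-style escapes, well-formed input, and a mutable buffer holding just the string body (no surrounding quotes). The names `unescape_in_place` and `_put_utf8` are made up for this example. Python is used only to show the index arithmetic; the interpreter itself still allocates (for instance when slicing out the hex digits), which a C, C++, or Rust version would avoid.

```python
SIMPLE = {ord("n"): 0x0A, ord("t"): 0x09, ord("r"): 0x0D, ord("b"): 0x08,
          ord("f"): 0x0C, ord('"'): 0x22, ord("\\"): 0x5C, ord("/"): 0x2F}


def _put_utf8(buf, w, cp):
    """Write code point `cp` as UTF-8 at buf[w:]; return the new write index."""
    if cp < 0x80:
        buf[w] = cp
        return w + 1
    if cp < 0x800:
        buf[w] = 0xC0 | (cp >> 6)
        buf[w + 1] = 0x80 | (cp & 0x3F)
        return w + 2
    if cp < 0x10000:
        buf[w] = 0xE0 | (cp >> 12)
        buf[w + 1] = 0x80 | ((cp >> 6) & 0x3F)
        buf[w + 2] = 0x80 | (cp & 0x3F)
        return w + 3
    buf[w] = 0xF0 | (cp >> 18)
    buf[w + 1] = 0x80 | ((cp >> 12) & 0x3F)
    buf[w + 2] = 0x80 | ((cp >> 6) & 0x3F)
    buf[w + 3] = 0x80 | (cp & 0x3F)
    return w + 4


def unescape_in_place(buf):
    """Decode escapes inside `buf` (a bytearray) in place; return the decoded length."""
    r = w = 0  # read index and write index; w never gets ahead of r
    n = len(buf)
    while r < n:
        c = buf[r]
        if c != 0x5C:                  # ordinary byte: copy it down
            buf[w] = c
            r, w = r + 1, w + 1
            continue
        kind = buf[r + 1]
        if kind != ord("u"):           # two-byte escape -> one byte
            buf[w] = SIMPLE[kind]
            r, w = r + 2, w + 1
            continue
        cp = int(buf[r + 2:r + 6], 16)  # \uXXXX (the slice copies in Python)
        r += 6
        if 0xD800 <= cp < 0xDC00:      # high surrogate: a low one must follow
            low = int(buf[r + 2:r + 6], 16)
            r += 6
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
        # 6 escaped bytes become at most 3, and 12 become 4, so this
        # write always lands behind the read index.
        w = _put_utf8(buf, w, cp)
    return w


if __name__ == "__main__":
    data = bytearray(b"Hi \\u00e9\\n\\ud83d\\ude00!")
    end = unescape_in_place(data)
    print(bytes(data[:end]).decode("utf-8"))
```

The buffer is never resized: the decoded text occupies the first `end` bytes and everything after that is stale. This only works if the source is mutable, which is exactly the caveat raised at the top of the thread.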