Comment by cryptonector

3 months ago

Variable-width encodings like UTF-8 and UTF-16 cannot be indexed by code point in O(1), only in O(N). But this is not really a problem! Instead of indexing strings we need to slice them, and generally we read them forwards, so if slices (and slices of slices) are cheap, then you can parse textual data without a problem. Basically just keep the indices small and there's no problem.
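To make the slicing point concrete, here is a minimal sketch in Python, assuming the text is held as UTF-8 bytes and that `memoryview` stands in for a cheap, non-copying slice type. The function name `split_fields` and the comma delimiter are made up for illustration. It relies on the fact that an ASCII byte never occurs inside a multi-byte UTF-8 sequence, so a forward byte scan cannot split a character.

```python
from typing import Iterator


def split_fields(buf: memoryview, delimiter: int = ord(",")) -> Iterator[memoryview]:
    """Yield zero-copy slices of a UTF-8 buffer, split on an ASCII delimiter.

    Hypothetical helper: it reads the buffer forwards once and never
    indexes by code point. Each yielded slice is a view into the same
    underlying bytes, so slices (and slices of slices) stay cheap.
    """
    start = 0
    for pos in range(len(buf)):
        # Bytes below 0x80 only ever encode ASCII characters in UTF-8,
        # so matching the delimiter byte is safe in multi-byte text.
        if buf[pos] == delimiter:
            yield buf[start:pos]
            start = pos + 1
    yield buf[start:]


data = "naïve,日本語,😀".encode("utf-8")
fields = [bytes(view).decode("utf-8") for view in split_fields(memoryview(data))]
```

Only the final `bytes(view)` copies anything; up to that point the parser works purely with offsets into the original buffer.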

3 comments

account42 3 months ago

Unicode itself is variable width due to combining characters, variation selectors, etc.

cryptonector 3 months ago

Yes, quite.

bjoli 3 months ago

Or just use immutable strings and look-up tables. Say, one entry every 32 characters, combined with cursors. This is going to make indexing fast enough for randomly jumping into a string and then using cursors.
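A rough sketch of that idea in Python, assuming UTF-8 storage and counting "characters" as code points (not grapheme clusters, so combining sequences still span several entries). The class name `IndexedUtf8` and its methods are invented for illustration. It records a byte offset every 32 code points, then walks forward like a cursor for at most 31 steps.

```python
class IndexedUtf8:
    """Immutable UTF-8 buffer with a byte-offset checkpoint every `stride` code points."""

    def __init__(self, text: str, stride: int = 32) -> None:
        self.data = text.encode("utf-8")
        self.stride = stride
        self.checkpoints = []  # byte offsets of code points 0, stride, 2*stride, ...
        count = 0
        for offset, byte in enumerate(self.data):
            if byte & 0xC0 != 0x80:  # not a continuation byte: a code point starts here
                if count % stride == 0:
                    self.checkpoints.append(offset)
                count += 1
        self.length = count  # number of code points

    def _next(self, pos: int) -> int:
        """Advance a byte cursor to the start of the next code point."""
        pos += 1
        while pos < len(self.data) and self.data[pos] & 0xC0 == 0x80:
            pos += 1
        return pos

    def byte_offset(self, index: int) -> int:
        """Jump to the nearest checkpoint, then scan at most stride - 1 code points."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        pos = self.checkpoints[index // self.stride]
        for _ in range(index % self.stride):
            pos = self._next(pos)
        return pos

    def code_point_at(self, index: int) -> str:
        start = self.byte_offset(index)
        return self.data[start:self._next(start)].decode("utf-8")


sample = "aé€😀" * 20  # 80 code points, 1 to 4 bytes each
indexed = IndexedUtf8(sample)
```

The table costs one integer per 32 code points, and because the string is immutable it never has to be rebuilt. A smaller stride trades memory for shorter scans.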