Comment by networked

9 years ago

What if the user has a string ending with a meaningful zero width space already stored? For example, the string could be checksummed somewhere. It would corrupt their data.

If you want a kludge for this, it's better to generate a longish random string (e.g., a UUID) to indicate an empty value.
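
A minimal sketch of that sentinel idea, assuming the only problem is that the store rejects empty strings. The names `EMPTY_SENTINEL`, `encode`, and `decode` are hypothetical, and the UUID is just an example value that would be generated once and then hard-coded:

```python
# Hypothetical sentinel: generate it once (e.g. with uuid.uuid4()) and hard-code it.
# It must never change afterwards, or old empty values become unreadable.
EMPTY_SENTINEL = "7c9e6679-7425-40de-944b-e07fc1f90ae7"


def encode(value: str) -> str:
    """Swap the empty string for the sentinel before storing."""
    return EMPTY_SENTINEL if value == "" else value


def decode(stored: str) -> str:
    """Swap the sentinel back to the empty string after reading."""
    return "" if stored == EMPTY_SENTINEL else stored
```

Non-empty strings pass through untouched, so existing data and checksums are unaffected. The one remaining collision is a user storing the sentinel itself, which is unlikely but not impossible.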

When you get a string from the client you prepend a single zero width space. When you send a string back you strip the single leading space you added. The client will always have the exact same data back that they sent originally.
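
Roughly, in Python (a sketch only; `escape` and `unescape` are made-up names, and it assumes every stored value went through `escape`):

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def escape(user_string: str) -> str:
    """Prepend exactly one zero width space, so the stored value is never empty."""
    return ZWSP + user_string


def unescape(stored: str) -> str:
    """Strip exactly the one leading zero width space that escape() added."""
    if not stored.startswith(ZWSP):
        raise ValueError("stored value is missing the escape prefix")
    return stored[len(ZWSP):]


# Strings that already contain zero width spaces survive the round trip,
# because only the single prefix added by escape() is removed.
for original in ["", "abc", ZWSP + "abc", "abc" + ZWSP]:
    assert unescape(escape(original)) == original
```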

  • You're right, of course. Sorry, I wasn't clear. I meant that a user might have stored a string with a zero width space at the end by the time you introduce this escaping mechanism. (I've already edited the comment to indicate this.) The same goes double if you append a common printable character. You'd have to rely on some additional indicator, such as the date and time the record with the string was stored, to know whether to unescape a string, and you'd also have to be sure nothing changed that date and time without escaping the data.

    • Oh, yeah, in that case I'd either if-case it by time stamp or I'd prepend a zero width space to all historical data as well. I would prefer doing the latter and would only do the former if there was some reason I couldn't do the latter, for example if I had too much historical data to be able to process it (though I have a hard time imagining that happening for something as trivial as prepending a zero width space, unlike, say, converting thousands of hours of video, which might actually be too time-consuming or computationally expensive).

      One issue I can imagine with altering historical data: if you ever had to restore from a backup that was made before you added the zero width space, you might forget to add the zero width space again when you restore a few months down the road. But with proper documentation and procedures that shouldn't happen.
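
The two options discussed in this subthread could be sketched like this. Everything here is an assumption for illustration: the cutoff date, the `read_value` and `migrate` names, and the idea that each record carries a trustworthy `stored_at` timestamp.

```python
from datetime import datetime, timezone

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Hypothetical cutoff: the moment the escaping mechanism was deployed.
ESCAPING_DEPLOYED_AT = datetime(2020, 1, 1, tzinfo=timezone.utc)


def read_value(stored: str, stored_at: datetime) -> str:
    """Option 1: if-case on the record's timestamp."""
    if stored_at < ESCAPING_DEPLOYED_AT:
        return stored  # legacy record, never escaped: return it untouched
    return stored[len(ZWSP):]  # escaped record: drop the one prefix we added


def migrate(records: dict[str, str]) -> dict[str, str]:
    """Option 2: one-time backfill that escapes every legacy value."""
    return {key: ZWSP + value for key, value in records.items()}
```

Option 1 only works while nothing rewrites `stored_at` without also escaping the value. Option 2 has to run exactly once per dataset, which is where the restore-from-backup scenario bites: a backup taken before the migration needs `migrate` applied again, and one taken after must not get it a second time.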

I believe the parent is saying

    string_to_store = user_string + extra space
    dynamo.store(key, string_to_store)

    ...

    stored_string = dynamo.retrieve(key)
    user_string = stored_string - extra space

That way the user puts a string in and gets the same string out. No problem.