← Back to context

Comment by nostrademons

6 years ago

Escaping modes. All strings are not equivalent: "Bobby tables" is very different from "'; drop table users; --".

The idea is to encode the contexts where a string is safe to use directly into the type of the variable, and ensure that functions that manipulate them or send them to outside systems can only receive variables of the proper type. When you receive data from the outside world, it's always unsafe: you don't even know if you've gotten a valid utf8 sequence. So all external functions return an UnsafeString, which you can .decode() into a SafeString (or even call it a String for brevity, since most manipulations will be on a safe string). Then when you send to a different system, all strings need to be explicitly encoded: you'd pass a SqlString to the DB that's been escaped to prevent SQL injection, you'd pass a JSONString to any raw JSON fragments that's had quotes etc. escaped, you'd pass an HtmlString to a web template that properly escapes HTML entities, and so on. It's legal to do a "SELECT $fields FROM tablename where $whereClause" if $fields and $whereClause are SqlStrings, but illegal if they are any other type of strings. And if you do <a href="$url"> where $url is an UnsafeString, the templating engine will barf at you.

There are various ways to cut down the syntactic overhead of this system by using sensible defaults for functions. One common one is to receive all I/O as byte[], assume all strings are safe UTF-8 encoded text, and then perform escaping at the library boundaries, using functionality like prepared statements in SQL or autoescaping in HTML templating languages. Most libraries provide an escape-hatch for special cases like directly building an SQL query out of text, using the typed-string mechanism above.