
Comment by mason55

6 years ago

Another good example of this is having separate classes for something like unsafe strings vs. safe strings in a web app. The functions which interact with the outside world accept unsafe strings and emit safe strings to the rest of the application. Then the rest of the application only works with safe strings.

Anything that accepts a safe string can make an assumption that it doesn't need to do any validation (or "parsing" in the context of the OP), which lets you centralize validation logic. And since you can't turn an unsafe string into a safe string without sending it through the validator, it prevents unsafe strings from leaking into the rest of the app by accident.
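
A minimal Python sketch of that idea, under stated assumptions: the names `UnsafeString`, `SafeString`, `validate` and `greet` are made up for illustration, and the validation rule (non-empty, printable, bounded length) is only a placeholder for whatever the app actually needs. Python can't truly hide a constructor, so a module-private token stands in for "you can only get a safe string through the validator".

```python
_VALIDATED = object()  # module-private token; only validate() passes it


class SafeString:
    """Text that has already been through the validator."""

    def __init__(self, value: str, _token: object = None) -> None:
        if _token is not _VALIDATED:
            raise TypeError("build a SafeString via UnsafeString.validate()")
        self.value = value


class UnsafeString:
    """Raw text from the outside world; nothing has been checked yet."""

    def __init__(self, raw: str) -> None:
        self.raw = raw

    def validate(self, max_len: int = 64) -> SafeString:
        # Placeholder rule: non-empty, printable, bounded length.
        text = self.raw.strip()
        if not text or len(text) > max_len or not text.isprintable():
            raise ValueError("input failed validation")
        return SafeString(text, _token=_VALIDATED)


def greet(name: SafeString) -> str:
    # Interior code refuses anything that skipped the validator.
    if not isinstance(name, SafeString):
        raise TypeError("greet() only accepts SafeString")
    return f"Hello, {name.value}!"
```

With a static type checker the `isinstance` check becomes redundant, since passing a plain `str` to `greet` would already be flagged before the code runs.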

This concept can be used for pretty much anything where you are doing data validation or transformation.

Also a good way to prevent hashed passwords from being accidentally logged.

    from django.db import models

    class PasswordType(models.Field):
        hashed_pw = models.CharField()

        def __str__(self):
            # you can even raise an Exception here
            return '<confidential data>'

Not that you should be trying to log this stuff anyway, but unless you're a solo dev you can't prevent other people from creating bugs. What you can do is mitigate the common scenarios.
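
The same trick works without Django. This is a rough sketch, not anyone's production code: `Secret` and `reveal` are hypothetical names, and the wrapped value is a dummy string rather than a real hash.

```python
class Secret:
    """Wraps a sensitive value so str() and repr() never reveal it."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "<confidential data>"

    # str(), f-strings and %s formatting all fall back to this.
    __str__ = __repr__

    def reveal(self) -> str:
        # The one deliberate way to get the value out; easy to grep for in review.
        return self._value


hashed = Secret("not-a-real-hash")
log_line = f"user updated, password={hashed}"
```

Here `log_line` ends up containing the placeholder text instead of the wrapped value, so an accidental `logger.info(...)` on the object leaks nothing.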

What are safe and unsafe strings supposed to mean? All strings seem like normal strings to me; a "DELETE * FROM db" is no different from any other string until it's given to a SQL query.

  • Escaping modes. All strings are not equivalent: "Bobby tables" is very different from "'; drop table users; --".

    The idea is to encode the contexts where a string is safe to use directly into the type of the variable, and ensure that functions that manipulate them or send them to outside systems can only receive variables of the proper type. When you receive data from the outside world, it's always unsafe: you don't even know if you've gotten a valid UTF-8 sequence. So all external functions return an UnsafeString, which you can .decode() into a SafeString (or even call it a String for brevity, since most manipulations will be on a safe string). Then when you send to a different system, all strings need to be explicitly encoded: you'd pass the DB a SqlString that's been escaped to prevent SQL injection, you'd use a JSONString that's had quotes etc. escaped for any raw JSON fragment, you'd pass a web template an HtmlString with HTML entities properly escaped, and so on. It's legal to do a "SELECT $fields FROM tablename where $whereClause" if $fields and $whereClause are SqlStrings, but illegal if they are any other type of string. And if you do <a href="$url"> where $url is an UnsafeString, the templating engine will barf at you.

    There are various ways to cut down the syntactic overhead of this system by using sensible defaults for functions. One common one is to receive all I/O as byte[], assume all strings are safe UTF-8 encoded text, and then perform escaping at the library boundaries, using functionality like prepared statements in SQL or autoescaping in HTML templating languages. Most libraries provide an escape-hatch for special cases like directly building an SQL query out of text, using the typed-string mechanism above.
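
For the HTML case, the typed-string idea might look roughly like this in Python. `HtmlString`, `escape_html` and `render_link` are invented names for the sketch; only `html.escape` is a real standard-library call. It covers entity escaping only and does nothing about dangerous URL schemes such as `javascript:`.

```python
import html


class HtmlString:
    """Text already escaped for HTML element content or quoted attributes."""

    def __init__(self, escaped: str) -> None:
        self.escaped = escaped


def escape_html(untrusted: str) -> HtmlString:
    # quote=True also escapes " and ', so the result can sit inside an attribute.
    return HtmlString(html.escape(untrusted, quote=True))


def render_link(url: HtmlString, label: HtmlString) -> str:
    # The "template" barfs on anything that is not an HtmlString.
    for arg in (url, label):
        if not isinstance(arg, HtmlString):
            raise TypeError("render_link only accepts HtmlString")
    return f'<a href="{url.escaped}">{label.escaped}</a>'


link = render_link(
    escape_html("https://example.com/?a=1&b=2"),
    escape_html("<click>"),
)
```

A `SqlString` or `JSONString` would follow the same shape, each with its own escaping function as the only way to construct one.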

  • pretty sure mason55 is referencing (perhaps unknowingly) an example from joel on software. you can read more about it here: https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...

    • That’s the one, thanks for the link! I tried to find it while writing my post but couldn’t for the life of me remember a single thing to even try searching for. I probably last read that article when it was written in 2005.

  • A safe string is something you got from the programmer (or other trusted source), and an unsafe string is something you got from the network/environment/etc.

  • Then your database API should use "safe strings" only, simple as that.

    • “DELETE * from table” is a safe string though for something like file contents or perhaps a comment box on a hacker news site.

      The term “safe string” is effectively meaningless because it entirely depends on how the internals are going to use it.


  • Are you genuinely curious, or are you being a troll?

    Look at the content of your string, make a decision as to whether you would give it to a SQL engine. If you have not looked, it's presumed unsafe. If you have validated it - parsed it, in the context of this article and this discussion - and decided that you consider it safe, then it is a safe string from that point on.

    This isn't a philosophical debate about what "safe" means to humans, it's a programming discussion that says if you only want to pass "select * from reports" to your database, check that's what the string contains before you pass it anywhere.

    • Not sure if you’ve worked with databases before, but sql injection sanitization belongs at the SQL layer, not the user input validation layer.

      If you’re doing it at user input validation, you’re doing it wrong.


    • I am really not trying to be a troll. Genuinely don't understand this concept of safe strings.

      How could software even look at text content and determine safeness? There are cases where string input might be limited to just letters or numbers, but often it's not. As soon as punctuation or Unicode (non-English users) is on the table, text is basically anything and there is no general defense against that.

      Parsing and static types could have restrictions on string length, min or max value for numbers, how many items in an array, but it cannot make text safe generally-speaking by any meaning of safe. It has no awareness of how the content will be used.


    • A string that is supposed to represent a ”name” in a web app context — safe or not? I am referring to potential SQL injections.

      Surely, no names contain semicolons, but is it up to the business-logic part of your app to determine that names which only contain A-Za-z (or whatever) are safe?

      It is contextually dependent, meaning in practice as close to the actual SQL query as possible. Or call to file system, where dots are unsafe, and so on.

      Static typing helps here as described in the article.
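
To make the "as close to the actual SQL query as possible" point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` table and `add_user` helper are made up for the example. The name is never inspected or restricted; the `?` placeholder hands it to the database as data rather than as SQL text.

```python
import sqlite3


def add_user(conn: sqlite3.Connection, name: str) -> None:
    # The ? placeholder binds the name as a value, never as part of the SQL.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

hostile = "'; drop table users; --"
add_user(conn, hostile)
add_user(conn, "Bobby Tables")

names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY rowid")]
```

Both names are stored verbatim and the table survives, which is why the business logic doesn't need to decide whether semicolons are allowed in a name.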