Comment by jerf
10 years ago
The first "smiley face" on ALT-1 is actually the ASCII character SOH "start of heading"; many things that might otherwise accept Unicode will properly filter that out because ASCII control codes are illegal in a wide variety of otherwise-accepting contexts.
But it is a great QA check on any text field, which should either cleanly reject it in some manner [1] XOR accept it and process it "correctly", whatever that means locally, but not do something in between.
[1]: A lot of Unicode processing nowadays puts in the Unicode replacement character for unknown characters, but for the ASCII control codes I'd say you've often got a solid security case to say "Someone's just trying to screw with the system, we'll just filter it out entirely" for them. Excepting the ones we still use, basically \r \n \t, there's not much reason to keep them. (Think twice about \v "vertical tab" and think three times about letting \b "backspace" characters through. Inconsistent behaviors by various layers of code are scary.)
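A minimal sketch of the "filter it out entirely" option, in Python. The names `strip_control_chars` and `ALLOWED_CONTROLS` are made up for illustration, and the sketch assumes "ASCII control codes" means U+0000 through U+001F plus DEL (U+007F); it deliberately leaves the C1 range and everything else in Unicode alone.

```python
# Assumption: keep only the three control codes still in common use.
ALLOWED_CONTROLS = {"\r", "\n", "\t"}


def is_ascii_control(ch: str) -> bool:
    """True for C0 control codes (U+0000-U+001F) and DEL (U+007F)."""
    code = ord(ch)
    return code < 0x20 or code == 0x7F


def strip_control_chars(text: str) -> str:
    """Drop ASCII control codes, except those in ALLOWED_CONTROLS."""
    return "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROLS or not is_ascii_control(ch)
    )


if __name__ == "__main__":
    # SOH (\x01), vertical tab (\x0b) and backspace (\x08) are removed;
    # tab and newline survive, and so does the real Unicode smiley U+263A.
    sample = "a\x01b\x0bc\x08d\te\n\u263a"
    print(repr(strip_control_chars(sample)))
```

Whether to strip silently like this or reject the whole input is a local policy call; the point is to pick one and apply it at a single layer, so later layers never see the raw control codes.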
It makes sense that control characters are removed (or replaced). I didn't know ALT-1 was SOH, so that's good to know. (I guess I'm showing my age a bit there.)
Thanks for the info!