Comment by 7bit
1 day ago
The very fact that UTF-8 itself discourages using the BOM is just so alien to me. I understand they want it to be the last encoding and therefore not in need of an explicit indicator, but as it currently IS NOT the only encoding in use, it makes it so difficult to tell whether I'm reading one of the weird ASCII derivatives or actual Unicode.
It's maddening and it's frustrating. The US doesn't have any of these issues, but in Europe, that's a complete mess!
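Here is a rough Python sketch of the guessing game described above. It assumes the only candidates are UTF-8 (with or without a BOM) and a single legacy 8-bit code page. `guess_decode` is a hypothetical helper, and cp1252 is just an assumed fallback that nobody in this thread specified:

```python
import codecs


def guess_decode(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Return (text, encoding_name) using a simple heuristic.

    Assumption: anything that is not valid UTF-8 is treated as the
    `fallback` legacy encoding (cp1252 by default, chosen arbitrarily).
    """
    # A UTF-8 BOM at the start is an explicit signal; strip it and decode.
    # Invalid UTF-8 after a BOM still raises, since the file is then corrupt.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8,
        # which legacy 8-bit text with accented letters usually produces.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Replace bytes the fallback code page leaves undefined.
        return data.decode(fallback, errors="replace"), fallback


print(guess_decode("café".encode("utf-8")))    # ('café', 'utf-8')
print(guess_decode("café".encode("cp1252")))   # ('café', 'cp1252')
print(guess_decode(b"\xef\xbb\xbfhi"))         # ('hi', 'utf-8-sig')
```

The result is only a guess. Pure ASCII comes back as "utf-8" because it is valid in both, and some short legacy byte sequences (e.g. "Ã©" encoded as cp1252) happen to be valid UTF-8 too and decode as "é". That ambiguity is exactly why some people reach for a BOM.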
> The very fact that UTF-8 itself discouraged from using the BOM is just so alien to me.
Adding a BOM breaks UTF-8's compatibility with ASCII, which is one of the main benefits of using UTF-8.
> The very fact that UTF-8 itself discouraged from using the BOM is just so alien to me.
One of the key advantages of UTF-8 is that all ASCII content is effectively UTF-8. Having the BOM present reduces that convenience a bit, and a file starting with the three bytes 0xEF, 0xBB, 0xBF may be mistaken by some tools for a binary file rather than readable text.
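This small Python check makes the byte-level point concrete (the sample string is arbitrary). ASCII text encodes to identical bytes in ASCII and UTF-8. The standard-library `utf-8-sig` codec prepends EF BB BF to the same bytes, and an ASCII-only decoder rejects those three bytes:

```python
import codecs

text = "plain ASCII text"

# Pure ASCII text encodes to identical bytes in ASCII and in UTF-8.
assert text.encode("ascii") == text.encode("utf-8")

# The "utf-8-sig" codec writes a BOM (EF BB BF) in front of the same bytes.
with_bom = text.encode("utf-8-sig")
print(with_bom[:3].hex(" "))            # ef bb bf
print(with_bom[:3] == codecs.BOM_UTF8)  # True

# Those three leading bytes are outside ASCII, so an ASCII-only reader fails.
try:
    with_bom.decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```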
> The US doesn't have any of these issues
I think you mean “the US chooses to completely ignore these issues and gets away with it because they defined the basic standard that is used, ASCII, way-back-when, and didn't foresee it becoming an international thing so didn't think about anyone else” :)
> because they defined the basic standard that is used, ASCII
I thought it was EBCDIC /s
From Wikipedia...
That last one is a weaker point, but it is true that with CSV a BOM is more likely to do harm than good.
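The following sketch uses Python's standard `csv` module and made-up data to show one common way a BOM causes trouble with CSV. If the file is decoded as plain UTF-8, the BOM survives as U+FEFF and quietly becomes part of the first column name. A lookup like `row["name"]` then fails with a KeyError:

```python
import csv
import io

# Hypothetical CSV export whose first column is "name", saved with a BOM.
raw = "name,city\nZoë,Köln\n".encode("utf-8-sig")

# Decoding as plain "utf-8" keeps the BOM as U+FEFF in the first header.
plain = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
print(list(plain[0].keys()))  # ['\ufeffname', 'city']

# "utf-8-sig" strips a leading BOM if present (and is harmless if absent).
clean = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
print(clean[0]["name"])  # Zoë
```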
Indeed, I've been using the BOM in all my text files for maybe decades now; those who wrote the recommendation are clearly from an English country.
> are clearly from an English country
One particular English-speaking country… The UK has issues with ASCII too, as our currency symbol (£) is not included. It's not nearly as much trouble as for non-English languages, which need accented letters and the like that ASCII lacks, but we are still affected.
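Here is a quick Python check of the £ point. The pound sign is U+00A3, just past 7-bit ASCII's 0x00 to 0x7F range, so ASCII cannot encode it. UTF-8 needs two bytes for it, while Latin-1 (ISO 8859-1) needs one:

```python
pound = "£"  # U+00A3 POUND SIGN

print(hex(ord(pound)))          # 0xa3, outside ASCII's 0x00-0x7F range
print(pound.encode("utf-8"))    # b'\xc2\xa3' (two bytes in UTF-8)
print(pound.encode("latin-1"))  # b'\xa3' (one byte in ISO 8859-1)

try:
    pound.encode("ascii")
except UnicodeEncodeError:
    print("£ cannot be encoded in ASCII")
```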