← Back to context

Comment by josephg

1 day ago

> to get Excel to correctly load a UTF8 encoded CSV or similar you must include the BOM

Ah so that’s the trick! I’ve run into this problem a bunch of times in the wild, where some script emits csv which works on the developers machine but fails strangely with real world data.

Good to know there’s a simple solution. I hope I remember your comment next time I see this!

Excel CSV is broken anyway, since in some (EU, ...) countries it needs ; as separator.

  • That's not an excel issue. That's a locale issue.

    Due to (parts of?) the EU using then comma as the decimal separator, you have to use another symbol to separate your values.

    • Comma for decimal separator, and point (or sometimes 'postraphy) for thousands separator if there is one, is very common. IIRC more European countries use that than don't, officially, and a bunch of countries outside Europe do too.

      It wouldn't normally necessitate not using comma as the field separator in CSV files though, wrapping those values is quotes is how that would usually be handled in my experience.

      Though many people end up switching to “our way”, despite their normal locale preferences, because of compatibility issues they encounter otherwise with US/UK software written naively.

    • Locales should have died long ago. You use plain data, stop parsing it depdending on wen your live. Plan9/9front uses where right long ago. Just use Unicode everywhere, use context-free units for money.

      1 reply →

  • A lot of the time when people say CSV they mean “character separated values” rather than specifically “comma separated values”.

    In the text files we get from clients we sometimes see tab used instead of comma, or pipe. I don't think we've seen semicolon yet, though our standard file interpreter would quietly cope¹ as long as there is nothing really odd in the header row.

    --------

    [1] it uses the heuristic “the most common non-alpha-numeric non-space non-quote character found in the header row” to detect the separator used if it isn't explicitly told what to expect