Comment by jcattle
10 hours ago
I would kind of disagree.
We are talking here in the context of scientific datasets. Of course ETL plays a part here. However, here it is really more about the interplay of Excel with CSV, which is often what scientific instruments or scientific assistants output.
You get your raw sensor data as a CSV, just want to take a look in Excel, and it understandably mangles the data in an attempt to infer column types, because of course it does, it's CSV! Then you mistakenly hit save and boom, all your data on disk is now an unrecoverable mangled mess.
Of course this is also the fault of not having good clean data practices, but with CSV and Excel it is just so, so easy to hold it wrong, simply because there is no right.
> so you need a heavy unreadable format
I prefer human unreadable if it means I get machine readable without any guesswork.
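To make the "open, glance, save" failure concrete, here is a rough Python sketch. It is not what Excel actually does internally; `guess` is a made-up, deliberately naive type guesser and the sample row is invented for illustration. It only shows how inferring types on read and writing the result back can change the bytes on disk, while reading every field as text does not.

```python
import csv
import io

# Invented sample data: an ID with leading zeros and a code that looks like a number.
RAW = "sample_id,reading\n00123,1E5\n"

def guess(value):
    """Hypothetical naive inference: try int, then float, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def round_trip(text, convert):
    """Read CSV text, convert each data field, and write it back out as CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(rows[0])  # header row is passed through unchanged
    for row in rows[1:]:
        writer.writerow([convert(field) for field in row])
    return out.getvalue()

mangled = round_trip(RAW, guess)   # leading zeros and the "1E5" spelling are lost
preserved = round_trip(RAW, str)   # every field kept as text, file is unchanged
```

The second call is the analogue of importing without type conversion: nothing is guessed, so nothing is lost, but you also have to opt in to it.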
That's Excel's type inference causing problems. Not an issue with CSV or any other type of DSV.
It is possible to import a CSV into Excel without type conversion. I just tested it two different ways.
While possible, it's not Excel's default way of doing things. Not always obvious or easy. Not enough people who use Excel really know how to use it.
Regardless, Excel mangling files via type inference is an Excel problem. It's not the fault of the file formats Excel reads in.
The file format being ambiguous and underspecified enough to mangle is, though.
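A small sketch of what "ambiguous" means in practice, using an invented date field. Nothing in a CSV file says which convention a field follows, so two readers can both parse the same text successfully and still disagree on the value.

```python
from datetime import datetime

# Invented example field, as it might appear in a CSV cell.
field = "01/02/2023"

# Both parses succeed; the file itself gives no way to tell which one is right.
as_day_first = datetime.strptime(field, "%d/%m/%Y").date()    # 1 February 2023
as_month_first = datetime.strptime(field, "%m/%d/%Y").date()  # 2 January 2023
```

A format that stores the type and the value together (for example an ISO 8601 date in a typed column) leaves no such choice to the reader.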