Comment by imtringued

1 month ago

Unrelated, but I'm personally not satisfied with the performance of pandas' XLSX export. As you can see here [0], the code does some really strange things. It passes cell.style to json.dumps() to generate a dictionary key so that it can cache the XlsxStyler.convert(cell.style) result. Except the vast majority of cells have no styling whatsoever, so json.dumps produces the string "null", which is then used to look up None. The low-hanging fruit is jaw-dropping. You can easily speed up the code by 10%+ by adding a simple check "if cell.style is not None or fmt is not None:" and switching from json.dumps(cell.style) to str(cell.style). If I wanted an easy weekend project that positively impacts many people, this is what I'd work on.

[0] https://github.com/pandas-dev/pandas/blob/main/pandas/io/exc...
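
To make the idea concrete, here is a minimal sketch of the two caching patterns described above. It is not the actual pandas code: convert_style, write_cells_serialized and write_cells_guarded are hypothetical stand-ins, cells are modelled as plain (value, style) tuples, and the fmt part of the suggested check is left out for brevity.

```python
import json


def convert_style(style):
    """Hypothetical stand-in for an expensive style conversion."""
    return {"converted": style}


def write_cells_serialized(cells):
    """Pattern described above: every cell is serialized to build a cache key."""
    cache = {}
    out = []
    for value, style in cells:
        key = json.dumps(style)  # unstyled cells serialize to the string "null"
        if key not in cache:
            cache[key] = convert_style(style) if style is not None else None
        out.append((value, cache[key]))
    return out


def write_cells_guarded(cells):
    """Suggested variant: skip unstyled cells, use str() as a cheaper key."""
    cache = {}
    out = []
    for value, style in cells:
        if style is None:
            # Fast path: no serialization and no cache lookup for unstyled cells.
            out.append((value, None))
            continue
        key = str(style)
        if key not in cache:
            cache[key] = convert_style(style)
        out.append((value, cache[key]))
    return out


cells = [(1, None), (2, {"bold": True}), (3, None), (4, {"bold": True})]
same = write_cells_serialized(cells) == write_cells_guarded(cells)
```

Both functions return the same result for this input; the guarded one just avoids building a key for cells that have no style. Whether that translates into the 10%+ mentioned above would have to be measured against the real code path.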

rthz 1 month ago

Have you tried opening an issue about it? Maybe someone would be happy to work on it. I concur that Excel parsing is rather slow.