Comment by ritchie46

11 hours ago

I don't trust their benchmarks. I ran their benchmark source locally on my machine at TPC-H scale factor 10. Polars was orders of magnitude faster and didn't SIGABRT at query 10 (I wasn't OOM).

    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[SIGINT] $ SCALE_FACTOR=10.0 make run-polars
    .venv/bin/python -m queries.polars
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run polars query 1' took: 1.47103 s
    Code block 'Run polars query 2' took: 0.09870 s
    Code block 'Run polars query 3' took: 0.53556 s
    Code block 'Run polars query 4' took: 0.38394 s
    Code block 'Run polars query 5' took: 0.69058 s
    Code block 'Run polars query 6' took: 0.25951 s
    Code block 'Run polars query 7' took: 0.79158 s
    Code block 'Run polars query 8' took: 0.82241 s
    Code block 'Run polars query 9' took: 1.67873 s
    Code block 'Run polars query 10' took: 0.74836 s
    Code block 'Run polars query 11' took: 0.18197 s
    Code block 'Run polars query 12' took: 0.63084 s
    Code block 'Run polars query 13' took: 1.26718 s
    Code block 'Run polars query 14' took: 0.94258 s
    Code block 'Run polars query 15' took: 0.97508 s
    Code block 'Run polars query 16' took: 0.25226 s
    Code block 'Run polars query 17' took: 2.21445 s
    Code block 'Run polars query 18' took: 3.67558 s
    Code block 'Run polars query 19' took: 1.77616 s
    Code block 'Run polars query 20' took: 1.96116 s
    Code block 'Run polars query 21' took: 6.76098 s
    Code block 'Run polars query 22' took: 0.32596 s
    Code block 'Overall execution of ALL polars queries' took: 34.74840 s
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch$ SCALE_FACTOR=10.0 make run-fireducks
    .venv/bin/python -m queries.fireducks
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run fireducks query 1' took: 5.35801 s
    Code block 'Run fireducks query 2' took: 8.51291 s
    Code block 'Run fireducks query 3' took: 7.04319 s
    Code block 'Run fireducks query 4' took: 19.60374 s
    Code block 'Run fireducks query 5' took: 28.53868 s
    Code block 'Run fireducks query 6' took: 4.86551 s
    Code block 'Run fireducks query 7' took: 28.03717 s
    Code block 'Run fireducks query 8' took: 52.17197 s
    Code block 'Run fireducks query 9' took: 58.59863 s
    terminate called after throwing an instance of 'std::length_error'
      what():  vector::_M_default_append
    Code block 'Overall execution of ALL fireducks queries' took: 249.06256 s
    Traceback (most recent call last):
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 39, in <module>
        execute_all("fireducks")
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 22, in execute_all
        run(
      File "/home/ritchie46/miniconda3/lib/python3.10/subprocess.py", line 526, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/home/ritchie46/Downloads/deleteme/polars-tpch/.venv/bin/python', '-m', 'fireducks.imhook', 'queries/fireducks/q10.py']' died with <Signals.SIGABRT: 6>.
    make: *** [Makefile:52: run-fireducks] Error 1
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[2] $
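
To put a number on the gap, here is a minimal sketch that parses the `Code block '...' took:` lines from the logs above and computes per-query slowdown ratios. Only queries 1 to 9 are compared, since the FireDucks run aborted at query 10. The helper names (`parse_timings`, `speedups`) are made up for illustration and are not part of the benchmark repo.

```python
import re

# Timing lines copied verbatim from the logs above (queries 1-9 only,
# because the FireDucks run died at query 10).
LOG = """
Code block 'Run polars query 1' took: 1.47103 s
Code block 'Run polars query 2' took: 0.09870 s
Code block 'Run polars query 3' took: 0.53556 s
Code block 'Run polars query 4' took: 0.38394 s
Code block 'Run polars query 5' took: 0.69058 s
Code block 'Run polars query 6' took: 0.25951 s
Code block 'Run polars query 7' took: 0.79158 s
Code block 'Run polars query 8' took: 0.82241 s
Code block 'Run polars query 9' took: 1.67873 s
Code block 'Run fireducks query 1' took: 5.35801 s
Code block 'Run fireducks query 2' took: 8.51291 s
Code block 'Run fireducks query 3' took: 7.04319 s
Code block 'Run fireducks query 4' took: 19.60374 s
Code block 'Run fireducks query 5' took: 28.53868 s
Code block 'Run fireducks query 6' took: 4.86551 s
Code block 'Run fireducks query 7' took: 28.03717 s
Code block 'Run fireducks query 8' took: 52.17197 s
Code block 'Run fireducks query 9' took: 58.59863 s
"""

# Matches e.g. "Code block 'Run polars query 3' took: 0.53556 s"
PATTERN = re.compile(r"Code block 'Run (\w+) query (\d+)' took: ([\d.]+) s")


def parse_timings(log):
    """Return {engine: {query_number: seconds}} from the timing lines."""
    timings = {}
    for engine, query, seconds in PATTERN.findall(log):
        timings.setdefault(engine, {})[int(query)] = float(seconds)
    return timings


def speedups(timings, fast="polars", slow="fireducks"):
    """Return {query_number: slow_seconds / fast_seconds} for shared queries."""
    common = sorted(set(timings[fast]) & set(timings[slow]))
    return {q: timings[slow][q] / timings[fast][q] for q in common}


if __name__ == "__main__":
    timings = parse_timings(LOG)
    for query, ratio in speedups(timings).items():
        print(f"query {query}: fireducks took {ratio:.1f}x as long as polars")
    total_fast = sum(timings["polars"].values())
    total_slow = sum(timings["fireducks"].values())
    print(f"queries 1-9 total: {total_slow / total_fast:.1f}x")
```

This only compares wall-clock times from a single run on one machine, so treat the ratios as indicative rather than as a rigorous benchmark.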