Comment by OutOfHere

4 days ago

Don't use it:

> By providing the beta version of FireDucks free of charge and enabling data scientists to actually use it, NEC will work to improve its functionality while verifying its effectiveness, with the aim of commercializing it within FY2024.

In other words, it's free only to trap you.

Thanks for the warning.

I nearly made the mistake of merging Akka into a codebase recently; fortunately I double-checked the license and noticed it was the bullshit BUSL and it would have potentially cost my employer tens of thousands of dollars a year [1]. I ended up switching everything to Vert.x, but I really hate how normalized it has become for ostensibly open source projects to sneak scary, expensive licenses into things.

[1] Yes I'm aware of Pekko now, and my stuff probably would have worked with Pekko, but I didn't really want to deal with something that by design is 3 years out of date.

  • IMO, you made a good decision ditching Akka. We have an Akka app from before the BUSL and it is a PITA to maintain.

    Vert.x and other frameworks are far better and easier for most devs to grok.

    • > We have an Akka app from before the BUSL and it is a PITA to maintain

      I would imagine the non-Scala use case to be less than ideal.

      In Scala land, Pekko, the open source fork of Akka, is the way to go if you need compatibility. Personally, I'd avoid new versions of Akka like the plague, and just use more modern alternatives to Pekko/Akka anyway.

      I'm not sure what Lightbend's target market is? Maybe they think they have enough critical mass to merit the price tag for companies like Sony/Netflix/Lyft, etc. But they've burnt their bridge right into the water with everyone else, so I see them fading into irrelevance over the next few years.

    • I've found actors (Akka specifically) to be a great model when you have concurrent access to fine-grained shared state. It provides such a simple mental model of how to serialize that access. I'm not a fan of it as a general programming model or even as a general-purpose concurrent programming model.

    • Yeah, Vert.x actually ended up being pretty great. I feel like it gives me most of the cool features of Akka that I actually care about, but it allows you to gradually move into it; it can be a full-on framework, but it can also just be a decent library to handle concurrency.

      Plus the license isn't stupid.

  • >it was the bullshit BUSL

    I didn't know the licence and had a look, but I don't see what is bullshit with it. It's not a classical open source licence, but pretty close and much better than closed source

    > and it would have potentially cost my employer tens of thousands of dollars a year

    If your employer is not providing its software open source, there is nothing shocking to have to pay for the software used

    • > I didn't know the licence and had a look, but I don't see what is bullshit with it.

      I just think it's a proprietary license that is trying to LARP as an OSS license. It sneaks in language that makes it so it's unclear how much it will actually cost you to use it. It makes me terrified to import anything touching it because I don't want to risk accidentally costing my employer millions of dollars.

      I don't really see how it's "pretty close" to an OSS license. Part of an OSS license is that I can use the code for whatever I want, which is decidedly not the case with BUSL. I do appreciate that stuff eventually becomes Apache, so I guess that's better than nothing, but I'd rather just avoid the stuff entirely, or only use the Apache licensed stuff.

      I also don't really like the idea that I could contribute to Akka and have my contributions monetized by Lightbend, while I'm not even allowed to use my own contributions without paying them a fee. I know that CLAs aren't exactly new in the OSS world, but at least if I were to make a contribution to Ubuntu, I'm still allowed to run Ubuntu server for free, with my contributions included.

      I guess the license just kind of feels "Bait and Switch" to me. It tries to get you to think that it's OSS and then smacks you with a "JK IT'S PROPRIETARY".

      > If your employer is not providing its software open source, there is nothing shocking to have to pay for the software used

      Sure, except in the case of Akka there's enough competition in the Java library world that I don't think that it's worth it. Vert.x is comparable, and the license is less likely to accidentally cost me lots of money.

      I also think that Akka's licensing is way too expensive, especially when you consider that there's a good chunk of concurrency libraries in Java-land that have more business-friendly licenses.

Important to upvote this. If there's room for improvement for Polars (which I'm sure there is), go and support the project. But don't fall for a commercial trap when there are competent open source tools available.

  • No shade to the juggernaut of the open source software movement and everything it has enabled and will enable, but why the hate for a project that required people’s time and knowledge to create something useful to a segment of users and then expects to charge for using it in the future? Commercial trap seems to imply this is some sort of evil machination, but it seems like they are being quite upfront with that language.

    • It's not hate for the project, it's hate for the deceptive rollout.

      Basically it's a debate about how many dark patterns you can squeeze next to that "upfront language" before "marketing" slides into "bait-n-switch."

    • Not sure if it's evil or not, but it is unprofessional to use a tool when you don't know how much it will cost your company in the future.

  • While I agree, it's worth noting that this project is a drop-in replacement (they claim that, at least), but Polars has a very different API. I much prefer Polars's API, but it's still a non-trivial cost to switch to it, which is why many people would explore Pandas alternatives instead.

I thought I saw in the documentation that it was released under the modified BSD license. I guess they could take future versions closed source, but the current version should be available for folks to use and further develop.

I don't trust their benchmarks. I ran their benchmark source locally on my machine at TPC-H scale factor 10. Polars was orders of magnitude faster and didn't SIGABRT at query 10 (I wasn't OOM).

    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[SIGINT] $ SCALE_FACTOR=10.0 make run-polars
    .venv/bin/python -m queries.polars
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run polars query 1' took: 1.47103 s
    Code block 'Run polars query 2' took: 0.09870 s
    Code block 'Run polars query 3' took: 0.53556 s
    Code block 'Run polars query 4' took: 0.38394 s
    Code block 'Run polars query 5' took: 0.69058 s
    Code block 'Run polars query 6' took: 0.25951 s
    Code block 'Run polars query 7' took: 0.79158 s
    Code block 'Run polars query 8' took: 0.82241 s
    Code block 'Run polars query 9' took: 1.67873 s
    Code block 'Run polars query 10' took: 0.74836 s
    Code block 'Run polars query 11' took: 0.18197 s
    Code block 'Run polars query 12' took: 0.63084 s
    Code block 'Run polars query 13' took: 1.26718 s
    Code block 'Run polars query 14' took: 0.94258 s
    Code block 'Run polars query 15' took: 0.97508 s
    Code block 'Run polars query 16' took: 0.25226 s
    Code block 'Run polars query 17' took: 2.21445 s
    Code block 'Run polars query 18' took: 3.67558 s
    Code block 'Run polars query 19' took: 1.77616 s
    Code block 'Run polars query 20' took: 1.96116 s
    Code block 'Run polars query 21' took: 6.76098 s
    Code block 'Run polars query 22' took: 0.32596 s
    Code block 'Overall execution of ALL polars queries' took: 34.74840 s
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch$ SCALE_FACTOR=10.0 make run-fireducks
    .venv/bin/python -m queries.fireducks
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run fireducks query 1' took: 5.35801 s
    Code block 'Run fireducks query 2' took: 8.51291 s
    Code block 'Run fireducks query 3' took: 7.04319 s
    Code block 'Run fireducks query 4' took: 19.60374 s
    Code block 'Run fireducks query 5' took: 28.53868 s
    Code block 'Run fireducks query 6' took: 4.86551 s
    Code block 'Run fireducks query 7' took: 28.03717 s
    Code block 'Run fireducks query 8' took: 52.17197 s
    Code block 'Run fireducks query 9' took: 58.59863 s
    terminate called after throwing an instance of 'std::length_error'
      what():  vector::_M_default_append
    Code block 'Overall execution of ALL fireducks queries' took: 249.06256 s
    Traceback (most recent call last):
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 39, in <module>
        execute_all("fireducks")
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 22, in execute_all
        run(
      File "/home/ritchie46/miniconda3/lib/python3.10/subprocess.py", line 526, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/home/ritchie46/Downloads/deleteme/polars-tpch/.venv/bin/python', '-m', 'fireducks.imhook', 'queries/fireducks/q10.py']' died with <Signals.SIGABRT: 6>.
    make: *** [Makefile:52: run-fireducks] Error 1
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[2] $
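
For anyone who wants a like-for-like number out of the log above, here is a minimal sketch that parses the `Code block '...' took: ... s` lines and compares the two engines only on the queries both of them finished (1 through 9, since the FireDucks run aborted on query 10). The timings are copied from the log; the helper names (`parse_timings`, `compare`) are made up for illustration and are not part of the polars-tpch repo. Note that the run config has `include_io` set to true, so these figures include Parquet I/O.

```python
import re

# Per-query timings copied verbatim from the log above.
LOG = """
Code block 'Run polars query 1' took: 1.47103 s
Code block 'Run polars query 2' took: 0.09870 s
Code block 'Run polars query 3' took: 0.53556 s
Code block 'Run polars query 4' took: 0.38394 s
Code block 'Run polars query 5' took: 0.69058 s
Code block 'Run polars query 6' took: 0.25951 s
Code block 'Run polars query 7' took: 0.79158 s
Code block 'Run polars query 8' took: 0.82241 s
Code block 'Run polars query 9' took: 1.67873 s
Code block 'Run polars query 10' took: 0.74836 s
Code block 'Run polars query 11' took: 0.18197 s
Code block 'Run polars query 12' took: 0.63084 s
Code block 'Run polars query 13' took: 1.26718 s
Code block 'Run polars query 14' took: 0.94258 s
Code block 'Run polars query 15' took: 0.97508 s
Code block 'Run polars query 16' took: 0.25226 s
Code block 'Run polars query 17' took: 2.21445 s
Code block 'Run polars query 18' took: 3.67558 s
Code block 'Run polars query 19' took: 1.77616 s
Code block 'Run polars query 20' took: 1.96116 s
Code block 'Run polars query 21' took: 6.76098 s
Code block 'Run polars query 22' took: 0.32596 s
Code block 'Run fireducks query 1' took: 5.35801 s
Code block 'Run fireducks query 2' took: 8.51291 s
Code block 'Run fireducks query 3' took: 7.04319 s
Code block 'Run fireducks query 4' took: 19.60374 s
Code block 'Run fireducks query 5' took: 28.53868 s
Code block 'Run fireducks query 6' took: 4.86551 s
Code block 'Run fireducks query 7' took: 28.03717 s
Code block 'Run fireducks query 8' took: 52.17197 s
Code block 'Run fireducks query 9' took: 58.59863 s
"""

# Matches only per-query lines; the "Overall execution" summary lines are skipped.
LINE = re.compile(r"Code block 'Run (\w+) query (\d+)' took: ([\d.]+) s")


def parse_timings(log):
    """Return {engine: {query_number: seconds}} for every per-query line."""
    timings = {}
    for engine, query, seconds in LINE.findall(log):
        timings.setdefault(engine, {})[int(query)] = float(seconds)
    return timings


def compare(timings, a, b):
    """Sum timings over the queries both engines completed."""
    shared = sorted(timings[a].keys() & timings[b].keys())
    total_a = sum(timings[a][q] for q in shared)
    total_b = sum(timings[b][q] for q in shared)
    return shared, total_a, total_b, total_b / total_a


timings = parse_timings(LOG)
shared, polars_s, fireducks_s, ratio = compare(timings, "polars", "fireducks")
print(f"queries compared: {shared}")
print(f"polars: {polars_s:.2f} s, fireducks: {fireducks_s:.2f} s, ratio: {ratio:.1f}x")
```

Restricting the sum to shared queries avoids comparing a full 22-query Polars total against a partial FireDucks run. This is a single run on one machine, so treat the ratio as indicative rather than a proper benchmark.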

If it's good, then why not just fork it when (if) the license changes? It is 3-clause BSD.

In fact, what's stopping the pandas library from incorporating fireducks code into the mainline branch? pandas itself is BSD.