Comment by LightMachine

8 months ago

The only claim I made is that it scales linearly with cores. Nothing else!

I'm personally putting a LOT of effort into making our claims as accurate and truthful as possible, in every single place: documentation, website, demos. I spent hours in meetings to make sure everything is correct. Yet sometimes it feels that no matter how much effort I put in, people will just find ways to misinterpret it.

We published the real benchmarks, checked and double checked. And then you complained some benchmarks are not so good. Which we acknowledged, and provided causes, and how we plan to address them. And then you said the benchmarks need more evaluation? How does that make sense in the context of them being underwhelming?

We're not going to compare to Mojo or other languages, specifically because it generates hate.

Our only claim is:

HVM2 is the first version of our Interaction Combinator evaluator that runs with linear speedup on GPUs. Running closures on GPUs required a colossal amount of correctness work, and we're reporting this milestone. Moreover, we finally managed to compile a Python-like language to it. That is all that is being claimed, and nothing else. The codegen is still abysmal and single-core performance is bad - that's our next focus. If anything else was claimed, it wasn't us!
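Since "linear speedup" is the entire claim, it may help to pin down what it means: runtime shrinking in proportion to core count. A minimal sketch in plain Python (the timings are invented for illustration, not Bend/HVM2 measurements):

```python
# "Linear speedup" means T(1) / T(n) is roughly n.
# All numbers below are hypothetical, not Bend/HVM2 benchmarks.

def speedup(t_one_core: float, t_n_cores: float) -> float:
    """Ratio of single-core runtime to n-core runtime."""
    return t_one_core / t_n_cores

def efficiency(t_one_core: float, t_n_cores: float, cores: int) -> float:
    """Speedup per core; 1.0 is perfectly linear scaling."""
    return speedup(t_one_core, t_n_cores) / cores

# Hypothetical run: 120 s on 1 core, 7.5 s on 16 cores.
print(speedup(120.0, 7.5))         # 16.0 -> linear
print(efficiency(120.0, 7.5, 16))  # 1.0
```

Note that this says nothing about absolute speed: a system can scale perfectly linearly and still be slower than a fast single-threaded implementation, which is exactly the distinction being drawn here.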

> I spent hours in meetings to make sure everything is correct. Yet sometimes it feels that no matter how much effort I put in, people will just find ways to misinterpret it.

from reply below:

> I apologize if I got defensive, it is just that I put so much effort into being truthful, double-checking, putting disclaimers everywhere about every possible misinterpretation.

I just want to say: don't stop. There will always be some people who don't notice or acknowledge the effort to be precise and truthful. But others will. For me, this attitude elevates the project to something I will be watching.

That's true: you never mentioned Python or alternatives in your README. I guess I got Mandela'ed by the comments on Hacker News, so my bad on that.

People are naturally going to compare the timings and functions you cite to what's available to the community right now, though; that's the only way we can picture its performance in real-life tasks.

> Mojo or other languages, specifically because it generates hate

Mojo launched comparing itself to Python and didn't generate much hate, it seems, but I digress.

In any case, I hope Bend and HVM can continue to improve even further; it's always nice to see projects like these, especially from another Brazilian.

  • Thanks, and I apologize if I got defensive, it is just that I put so much effort into being truthful, double-checking, putting disclaimers everywhere about every possible misinterpretation. Hell, this is right below the install instructions:

    > our code gen is still on its infancy, and is nowhere as mature as SOTA compilers like GCC and GHC

    Yet people still misinterpret. It is frustrating because I don't know what I could've done better.

    • Don't worry about it. Keep at it, this is a very cool project.

      FWIW, on HN people are inherently going to try to actually use your project, so if it's meant to be (long term) a faster way to run X, people evaluate it against that implicit benchmark.

    • Don't optimize for minimum hate, optimize for actionable feedback and ignore the haters. Easier said than done, though.

      Remember you don't need comment trolls on your team, and you'll go insane taking them seriously. Focus on piquing the interest of motivated language nerds. I personally would have really appreciated a "look, we're still 10x (or whatever) slower than Python, so now I need all the help I can get working on the codegen, etc." This would have given me quick perspective on why this milestone is meaningful.

    • I think the (hidden) reasoning is that it is really easy to get speedups over slow interpreters. However, getting speedups over high-performance programs is quite hard, mainly because of micro-optimisations.

      That's where the comparison to Python comes from: getting a speedup over a slow interpreter is not very _relevant_. Now if your interpreter had the same optimisations as Python (or V8 or the JVM), even a small fraction of what you show would be impressive.

      Having said this, the work your team did is a really challenging engineering feat (and with a lot more potential). But I do not believe the current speedups will hold once the interpreter/compiler has the level of optimisation that exists in other languages. And while you do not claim it, people expect that.

    • Perhaps consider moving the warning in the NOTE at the bottom of the README.md to a DISCLAIMER section near the top.

      I read the whole thing first, then commented, but people often read half of such a document, assume they've got all the important bits, and dive straight in.

      (we used to have that problem at $work with new team members and our onboarding doc; I added a section at the bottom that was pure silliness, and then started asking people who claimed to have read it a question that would only make sense if they'd seen the joke ... generally followed by telling them to go back and finish reading and not to try that with me again ;)

> I'm personally putting a LOT of effort into making our claims as accurate and truthful as possible, in every single place

Thank you. I understand in such an early iteration of a language there are going to be lots of bugs.

This seems like a very, very cool project and I really hope it or something like it is successful at making utilizing the GPU less cumbersome.

Perhaps you can add: "The codegen is still abysmal and single-core performance is bad - that's our next focus." as a disclaimer on the main page/videos/etc. This provides more context about what you claim and, just as important, what you don't (yet) claim.

  • The README has:

    > It is very important to reinforce that, while Bend does what it was built to (i.e., scale in performance with cores, up to 10000+ concurrent threads), its single-core performance is still extremely sub-par. This is the first version of the system, and we haven't put much effort into a proper compiler yet. You can expect the raw performance to substantially improve on every release, as we work towards a proper codegen (including a constellation of missing optimizations).

    which seems to be pretty much exactly that?

    It's at the bottom, though, so I can imagine people just skimming for "how do I get started" missing it, and making it more obvious would almost certainly be a Good Thing.

    I still feel like reading the whole (not particularly long) README before commenting being angry about it (not you) is something one could reasonably think the HN commentariat would be capable of (if you want to comment -without- reading the fine article, there's slashdot for that ;), but I'm also the sort of person who reads a whole man page when encountering a new command so perhaps I'm typical minding there.

> I'm personally putting a LOT of effort into making our claims as accurate and truthful as possible, in every single place.

I'm not informed enough to comment on the performance but I really like this attitude of not overselling your product but still claiming that you reached a milestone. That's a fine balance to strike and some people will misunderstand because we just do not assume that much nuance – and especially not truth – from marketing statements.

Identifying what's parallelizable is valuable in the world of language theory, but in pure functional languages it's as trivial as it gets, so that research isn't exactly ground-breaking.

And you're just not fast enough for anyone doing HPC, where the problem is not identifying what can be parallelized, but figuring out how to make the most of the hardware, i.e. the codegen.

  • This approach is valuable because it abstracts away certain complexities for the user, allowing them to focus on the code itself. I found it especially beneficial for users who are unwilling to learn functional languages or to parallelize code in imperative languages. HPC specialists might not be the current target audience, and code generation can always be improved over time; based on the dev comments, I trust that it will be.

Naive question: do you expect the linear scaling to hold with those optimisations to single-core performance, or would performance diverge from linear there pending further research advancements?
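One standard way to reason about that question is Amdahl's law, brought in here purely for illustration (the Bend docs don't invoke it): any fixed serial fraction of the work bounds the achievable speedup, no matter how many cores you add.

```python
# Amdahl's law: with serial fraction s, the best possible speedup
# on n cores is 1 / (s + (1 - s) / n). Purely illustrative numbers.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup for a workload with a fixed serial part."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / cores)

print(amdahl_speedup(0.00, 16))     # 16.0: fully parallel work scales linearly
print(amdahl_speedup(0.05, 16))     # ~9.14: 5% serial work already bends the curve
print(amdahl_speedup(0.05, 10000))  # ~19.96: past a point, extra cores barely help
```

On this model, whether scaling stays linear after single-core optimisations depends on whether they speed up the serial and parallel parts evenly: if they mostly shrink the parallel part, the serial fraction grows and the curve flattens.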

I think you were being absolutely precise, but I want to give a tiny bit of constructive feedback anyway:

In my experience, to avoid being misunderstood it is more important to understand the state of mind/frame of reference of your audience than to be utterly precise.

The problem is, if you have been working on something for a while, it is extremely hard to understand how the world looks to someone who has never seen it (1).

The second problem is that when you hit a site like Hacker News your audience is impossibly diverse, and there isn't any one state of mind.

When I present research, it always takes many iterations of reflecting on both points to get to a good place.

(1) https://xkcd.com/2501/

> The only claim I made is that it scales linearly with cores. Nothing else!

The other link on the front page says:

"Welcome to the Parallel Future of Computation"

  • Scaling with cores is synonymous with parallel.

    • Do you think calling your project parallel is what people have an issue with, or do you think it's that you're calling your project the future of parallel computation when it doesn't perform anywhere close to what already exists?

I think the issue is that there is an implicit claim that this is faster than some alternative. Otherwise, what's the point?

If you add some disclaimer like "Note: Bend is currently focused on correctness and scaling. On an absolute scale it may still be slower than single threaded Python. We plan to improve the absolute performance soon." then you won't see these comments.

Also this defensive tone does not come off well:

> We published the real benchmarks, checked and double checked. And then you complained some benchmarks are not so good. Which we acknowledged, and provided causes, and how we plan to address them. And then you said the benchmarks need more evaluation? How does that make sense in the context of them being underwhelming?

  • Right below install instructions, on Bend's README.md:

    > But keep in mind our code gen is still on its infancy, and is nowhere as mature as SOTA compilers like GCC and GHC.

    Second paragraph of Bend's GUIDE.md:

    > While cool, Bend is far from perfect. In absolute terms it is still not so fast. Compared to SOTA compilers like GCC or GHC, our code gen is still embarrassingly bad, and there is a lot to improve. That said, it does what it promises: scaling horizontally with cores.

    Limitations section of HVM2's paper:

    > While HVM2 achieves near-linear speedup, its compiler is still extremely immature, and not nearly as fast as state-of-art alternatives like GCC of GHC. In single-thread CPU evaluation, HVM2, is still about 5x slower than GHC, and this number can grow to 100x on programs that involve loops and mutable arrays, since HVM2 doesn’t feature these yet.

    • > Right below install instructions

      Yeah exactly. I read most of the readme and watched the demo, but I'm not interested in installing it so I missed this. I would recommend moving this to the first section in its own paragraph.

      I understand you might not want to focus on this but it's important information and not a bad thing at all.
