Comment by LightMachine

8 months ago

Bend has no tail-call optimization yet. It is allocating a 1-billion long stack, while C is just looping. If you compare against a C program that does actual allocations, Bend will most likely be faster with a few threads.

Bend's codegen is still abysmal, but these are all low-hanging fruits. Most of the work went into making the parallel evaluator correct (which is extremely hard!). I know that sounds "trust me", but the single-thread performance will get much better once we start compiling procedures, generating loops, etc. It just hasn't been done.

(I wonder if I should have waited a little bit more before actually posting it)

> (I wonder if I should have waited a little bit more before actually posting it)

No. You built something that’s pretty cool. It’s not done yet, but you’ve accomplished a lot! I’m glad you posted it. Thank you. Ignore the noise and keep cooking!

>> Bend has no tail-call optimization yet.

I've never understood the fascination with tail calls and recursion among computer science folks. Just write a loop, it's what it optimises to anyway.

  • Computer science traditionally focuses on algorithms. Divide-and-conquer algorithms are frequently much more clearly expressed as recursion. Quicksort is the classic example, also various tree operations.

  • I've never understood the fascination with programming languages among computer science folks. Just write machine code directly, it's what it compiles to anyway.

If they’re low-hanging fruit, why not do that before posting about it publicly? All that happens is that you push yourself into a nasty situation: people get a poor first impression of the system and are less likely to trust you the second time around, and in the (possibly unlikely) event that the problems turn out to be harder than you expect, you wind up in the really nasty situation of having to deal with failed expectations and pressure to fix them quickly.

  • I agree with you. But then there's the entire "release fast, don't wait before it is perfect". And, then, there's the case that people using it will guide us to iteratively building what is needed. I'm still trying to find that balance, it isn't so easy. This release comes right after we finally managed to compile it to GPUs, which is a huge milestone people could care about - but there are almost no micro-optimizations.