Comment by cnt-dracula

1 year ago

Hi, thanks for your work!

I have a question. As someone who can just about read assembly but still doesn't intuitively understand how to write it, or how to break ideas down in a way that makes use of it, do you have any suggestions for learning or improving this?

As in, at what point would someone realise that something can be sped up by using assembly? If you found a function that would be really performant in assembly, how would you go about writing it? Would you start from the compiler's assembly output or write it from scratch? Does it even matter?

You're looking for the tiniest blocks of code that are run an exceptional number of times.

For instance, I used to work on graphics renderers. You'd find the bit that was called the most (writing lines of pixels to the screen) and try to jiggle the order of the instructions to decrease the number of cycles used to move X bits from system RAM to graphics RAM.

When I was doing it, branching (usually checking an exit condition on a loop) was the biggest performance killer. The CPU couldn't queue up instructions past the check because it didn't know whether it was going to go true or false until it got there.
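To make the exit-condition point concrete, here is a rough sketch in C rather than assembly of the usual workaround: unroll the copy loop so the check runs once per four pixels instead of once per pixel. The function name and the 32-bit pixel format are assumptions for illustration, not anything from a real renderer, and a modern compiler or `memcpy` will often do this (or better) on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: copy n 32-bit pixels from src to dst.
   The main loop tests its exit condition once per 4 pixels. */
void copy_pixels_unrolled(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* Unrolled body: one loop branch for every four stores. */
    for (; n - i >= 4; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Tail: copy the 0 to 3 pixels left over. */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
```

The trade-off is more code for fewer branches, which is the same kind of shuffling you would do by hand when writing the loop in assembly.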

  • Don’t modern, or even just not-ancient, CPUs use branch prediction to keep working past a check, given that the vast majority of the time the check yields the same result?

    • All the little tricks that the CPU has to speed things up, like branch prediction, out of order execution, parallel branch execution, etc, are mostly more expensive than just not having to rely on them in the first place. Branch prediction in particular is not something that should be relied on too heavily either, since it is actually quite a fragile optimization that can cause relatively large performance swings with seemingly meaningless changes to the code.

    • Branch prediction is great for predictable branches, which is often what you have, or a good approximation to it. I forget the exact criteria, but even quite old chips could learn, e.g., all repeating patterns of length up to 4, most repeating patterns of length up to 8 and fixed-length loop patterns (n YESes followed by 1 NO) of any length.

      Quite often, though, you don't have predictable branches, and then you'll pay half the misprediction cost each time on average. If you're really unlucky, you could hit inputs where the branch predictor gets it wrong more than 50% of the time.
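To get a feel for predictable versus unpredictable branches, here is a toy model in C: a single 2-bit saturating counter, the simplest textbook prediction scheme. This is an assumption-laden sketch, not a description of any real chip; the predictors discussed above also track branch history and are far stronger. The function name and the starting state are made up for illustration.

```c
#include <stddef.h>

/* Toy model: count how often one 2-bit saturating counter mispredicts
   a sequence of branch outcomes (nonzero = taken, 0 = not taken).
   States 0 and 1 predict "not taken"; states 2 and 3 predict "taken". */
size_t count_mispredictions(const unsigned char *outcomes, size_t n)
{
    unsigned state = 2;   /* assumed starting state: weakly "taken" */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned taken = (outcomes[i] != 0);
        unsigned predicted = (state >= 2);

        if (predicted != taken) {
            misses++;
        }

        /* Move one step toward the actual outcome, saturating at 0 and 3. */
        if (taken) {
            if (state < 3) state++;
        } else {
            if (state > 0) state--;
        }
    }
    return misses;
}
```

Feed it a loop-style pattern (7 taken, then 1 not taken, repeated) and it misses only once per pass, on the exit. Feed it a strictly alternating pattern that starts with "not taken" and, from this starting state, it gets every single prediction wrong, which is the "worse than 50%" case in miniature.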

The best answer to your question is some variant of "write more assembly".

When someone tells me they want to learn programming, for example, I ask them how many programs they've written. The answer is usually zero, and I've never heard a number greater than 10. Nobody answers with a larger number, because anyone who has written that many programs doesn't need to ask the question. If you write 1000 programs that solve real problems, you'll be at least okay. 10k and you'll be pretty damn good. 100k and you might be better than the guy who wrote the assembly manual.

For a fun answer, this is a $20 nand2tetris-esque game that holds your hand through creating multiple CPU architectures from scratch with verification (similar to Prolog/VHDL), plus your own assembly language. I admittedly always end up writing an assembler outside of the game that copies to my clipboard, but I'm pretty fussy about UX and prefer my normal tools.

https://store.steampowered.com/app/1444480/Turing_Complete/

This is one heck of a question.

I don't know assembly, but my advice would be to take the rote route by rewriting stuff in assembly.

Just like anything else, there's no quick path to the finish line (unless you're exceptionally gifted), so putting in time is always the best action to take.