Comment by dumael

5 years ago

> This is a great overview. I don't remember having to put in padding instructions to prevent the pipeline issues mentioned here; maybe we just never ran into that. (I wrote pretty much all the R3000 code for Crash 1 and just do not recall problems like that coming up.)

If you were using the GNU assembler, it automatically fills branch delay slots with nop instructions unless you mark the assembly code with `.set noreorder`. GAS would also handle load delay slots.

Inserting NOPs wastes code space and execution cycles, though. If resources are not too tight, this is fine.

We used gcc on a MIPS M4K in a communication chip. We had a lot of existing C code and were short on ROM and on CPU cycles. So a few co-workers wrote a tool that parsed the gcc asm output and filled each branch delay slot with an instruction that had no side effects on the branch. It also fixed a gcc issue where 16-bit memory accesses in C were emitted as 32-bit load instructions in asm (which can take two cycles if the first 16 bits of a 32-bit word are needed). I had a HW/SW cosim setup to test code and hardware (Verilog). These were cool projects. Good memories (although quite vague now).
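To give a rough idea of what such a post-processing pass might look like, here is a minimal Python sketch. It is not the tool we had; the names (`fill_delay_slots`, `parse`, `BRANCHES`) and the safety rule are my own assumptions. It looks for an instruction, then a branch, then a `nop`, and moves the instruction into the delay slot only if it shares no registers with the branch. That rule is deliberately over-conservative, and the sketch assumes the input is already written for `.set noreorder` with explicit `nop`s in the slots.

```python
import re

# Assumed (incomplete) set of MIPS branch/jump mnemonics, for illustration only.
BRANCHES = {"b", "beq", "bne", "beqz", "bnez", "bgtz", "bltz", "bgez", "blez",
            "j", "jal", "jr", "jalr"}
REG = re.compile(r"\$\w+")


def parse(line):
    """Return (mnemonic, set_of_registers); (None, set()) for labels,
    directives, comments and blank lines."""
    text = line.split("#", 1)[0].strip()
    if not text or text.endswith(":") or text.startswith("."):
        return None, set()
    parts = text.split(None, 1)
    regs = set(REG.findall(parts[1])) if len(parts) > 1 else set()
    return parts[0], regs


def fill_delay_slots(lines):
    """Move the instruction just before a branch into the branch's delay slot
    when the slot holds a nop and the move looks safe."""
    out = list(lines)
    i = 2
    while i < len(out):
        cand_op, cand_regs = parse(out[i - 2])
        br_op, br_regs = parse(out[i - 1])
        slot_op, _ = parse(out[i])
        if br_op in ("jal", "jalr"):
            br_regs = br_regs | {"$ra", "$31"}  # link register is written implicitly
        # The candidate must not already sit in an earlier branch's delay slot.
        in_slot = i >= 3 and parse(out[i - 3])[0] in BRANCHES
        if (slot_op == "nop" and br_op in BRANCHES
                and cand_op is not None and cand_op != "nop"
                and cand_op not in BRANCHES and not in_slot
                and not (cand_regs & br_regs)):
            # candidate, branch, nop  ->  branch, candidate
            out[i - 2:i + 1] = [out[i - 1], out[i - 2]]
        i += 1
    return out


if __name__ == "__main__":
    demo = [
        "loop:",
        "    addiu $t0, $t0, 1",
        "    lw    $t1, 0($a0)",
        "    bne   $a1, $zero, loop",
        "    nop",
    ]
    print("\n".join(fill_delay_slots(demo)))
```

A real pass would need proper dependency analysis (destination vs. source registers, memory side effects, load delay rules on older cores) instead of the "no shared registers" shortcut, but the overall shape is the same: each `nop` removed saves one instruction of ROM and one cycle per executed branch.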

PS: If we had had a license for the Green Hills compiler, we could have saved some of the effort. IIRC it did branch delay slot optimization.