
Comment by ralferoo

3 hours ago

It's fairly easy to get triangle rasterisation performant if you think about the problem hard enough.

Here's an implementation I wrote for the PS3 SPU many moons ago: https://github.com/ralferoo/spugl/blob/master/pixelshaders/t...

That does perspective-correct texture mapping and, from a quick count of the instructions in the main loop, takes approximately 44 cycles per 8 pixels.

The process of solving the half-line equation used also doesn't suffer from any overlapping pixels or gaps between adjacent triangles, as long as the two triangles sharing an edge use exactly the same two end points for it and you use fixed-point arithmetic.

The key trick is to rework each line equation such that it's effectively x.dx+y.dy+C=0. You can then evaluate A=x.dx+y.dy+C at the top left of the square that encloses the triangle. For every pixel to the right, you can just add dx, and for every pixel down, you can just add dy. The sign bit of A indicates whether the pixel is or isn't inside that side of the triangle, and you can AND or OR the three sides' sign bits together to determine whether a pixel is inside or outside the triangle. (Whether to use AND or OR depends on how you've decided to interpret the sign bit.)
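To make the incremental evaluation concrete, here is a minimal scalar sketch in C. It is not the spugl code: the names Edge, edge_make and rasterise are made up for illustration, the sign convention (non-negative means inside) is an assumption, and it samples at integer pixel corners with small integer coordinates. It also treats A == 0 as inside on every side, so a tie-break rule would still be needed before two triangles sharing an edge stop drawing the shared pixels twice.

```c
#include <assert.h>
#include <stdint.h>

/* One triangle side as A(x,y) = x*dx + y*dy + c, in integer units.
   Coordinates are assumed small enough that int32_t products do not overflow. */
typedef struct { int32_t dx, dy, c; } Edge;

/* Edge through (x0,y0) -> (x1,y1). Assumed convention: A >= 0 on the
   interior side when the triangle's vertices give a positive signed area. */
static Edge edge_make(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    Edge e;
    e.dx = y0 - y1;            /* change in A for one pixel to the right */
    e.dy = x1 - x0;            /* change in A for one pixel down */
    e.c  = x0 * y1 - x1 * y0;  /* value of A at the origin (0,0) */
    return e;
}

/* Walk a w*h box whose top-left corner is (0,0), returning the number of
   covered pixels. If mask is non-NULL, mask[y*w + x] is set to 1 or 0. */
static int rasterise(const int32_t vx[3], const int32_t vy[3],
                     int w, int h, uint8_t *mask)
{
    Edge e[3];
    int32_t row[3];
    int covered = 0;

    assert(w > 0 && h > 0);

    for (int i = 0; i < 3; i++) {
        int j = (i + 1) % 3;
        e[i] = edge_make(vx[i], vy[i], vx[j], vy[j]);
        row[i] = e[i].c;       /* A at the top left of the box */
    }

    for (int y = 0; y < h; y++) {
        int32_t a0 = row[0], a1 = row[1], a2 = row[2];
        for (int x = 0; x < w; x++) {
            /* OR-ing the values ORs their sign bits: the result is
               non-negative only if none of the three is negative. */
            int inside = ((a0 | a1 | a2) >= 0);
            if (mask)
                mask[y * w + x] = (uint8_t)inside;
            covered += inside;
            a0 += e[0].dx; a1 += e[1].dx; a2 += e[2].dx;  /* step right */
        }
        row[0] += e[0].dy; row[1] += e[1].dy; row[2] += e[2].dy;  /* step down */
    }
    return covered;
}
```

The inner loop has no multiplies and no branches that depend on the edge values, which is the property that lets a SIMD version process several pixels per iteration.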

The calculation of all the values consumed by the rasteriser (C,dx,dy) for all 3 sides of a triangle, given the 3 coordinates, is here: https://github.com/ralferoo/spugl/blob/db6e22e18fdf3b4338390...

Some of the explanations I wrote down while trying to understand barycentric coordinates (from which this stuff kind of just falls out) ended up here: https://github.com/ralferoo/spugl/blob/master/doc/ideas.txt

(Apologies if my memory/terminology is a bit hazy on this - it was a very long time ago now!)

IIRC in terms of performance, this software implementation filling a 720p screen with perspective-correct texture-mapped triangles could hit 60Hz using only 1 of the 7 SPUs, although the triangles weren't overlapping so there was no overdraw. The biggest problem was actually saturating the memory bandwidth, because I wasn't caching the texture data: an unconditional DMA fetch from main memory always completed before the values were needed later in the loop.

It's definitely not "fairly easy" once you get into perspective-correct texture mapping on the triangles, and making sure the pixels along the diagonal of a quad aren't all janky so the texture has an obvious line across it. Then you add on whatever methods you're using to light/shade it. It gets horrible really quick. To me, at least!

Forgot to add that when you're calculating these fixed values for each triangle, you can also get hidden surface removal (back-face culling) for free. If you have a consistent CW or CCW orientation, the sign of the base value for C tells you whether the triangle is facing towards you or away.
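As a rough illustration of the facing test, here is a small C sketch. It assumes each side's constant is built as c = x0*y1 - x1*y0; with that formulation the three constants sum to twice the triangle's signed area, whose sign flips with the winding order. Whether this matches the exact "base value" used in spugl is an assumption on my part, and signed_area2, is_back_facing and COORD_LIMIT are hypothetical names. Which sign counts as "front" depends on your winding and axis conventions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coordinate limit so the 64-bit products below cannot overflow. */
#define COORD_LIMIT (1 << 20)

/* Twice the signed area of the triangle, computed as the sum of the three
   per-side constants c_i = x_i*y_j - x_j*y_i. The sign encodes the winding. */
static int64_t signed_area2(const int32_t vx[3], const int32_t vy[3])
{
    for (int i = 0; i < 3; i++) {
        assert(vx[i] > -COORD_LIMIT && vx[i] < COORD_LIMIT);
        assert(vy[i] > -COORD_LIMIT && vy[i] < COORD_LIMIT);
    }
    int64_t c0 = (int64_t)vx[0] * vy[1] - (int64_t)vx[1] * vy[0];
    int64_t c1 = (int64_t)vx[1] * vy[2] - (int64_t)vx[2] * vy[1];
    int64_t c2 = (int64_t)vx[2] * vy[0] - (int64_t)vx[0] * vy[2];
    return c0 + c1 + c2;
}

/* Assumed convention: positive area is front-facing. Zero-area (degenerate)
   triangles are rejected along with back-facing ones. */
static int is_back_facing(const int32_t vx[3], const int32_t vy[3])
{
    return signed_area2(vx, vy) <= 0;
}
```

Since the per-side constants are needed for rasterisation anyway, the culling test costs only a couple of extra adds and a sign check per triangle.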