Sure, the lower bound is nicer here. But when the tradeoff includes an unlimited upper bound, it's not a very attractive option.
I guess the most robust code for handling both performance and unexpected input would be one iteration of this (leveraging the assumption that angles are either always within bounds, or went out of bounds by a small amount for one frame), followed by an fmod if that assumption is totally off.
fmod takes on the order of 30+ cycles (probably more on 2003-era CPUs), versus roughly 1 cycle each for cmp, sub, and jmp.