Comment by pclmulqdq

6 hours ago

If you have a lot of "data plane" code or other looping over data, you can see a big gain from -O3 because of more aggressive unrolling and vectorization (HPC people use -O3 quite a lot). CRUD-like applications and other things that are branchy and heavy on control flow will often see a mild performance regression from use of -O3 compared to -O2 because of more frequent frequency hits due to AVX instructions and larger binary size.