> Secondly, if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
That's wrong. gcc generates summaries of function properties and propagates them up and down the call tree; for LTO this is then done in a distributed way. It does much more than mere inlining, including advanced analyses like points-to analysis.
https://gcc.gnu.org/onlinedocs/gccint/IPA.html
https://gcc.gnu.org/onlinedocs/gccint/IPA-passes.html
It scales to millions of lines of code because the analysis is partitioned (GCC's WHOPR mode).
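A minimal sketch of what those cross-TU summaries buy you (file names are illustrative, and the exact folding depends on GCC version and flags):

    /* lib.c */
    int scale(int x, int factor) {
        /* Compiled alone, this must stay a general function: callers
           in other TUs are invisible.  Under -flto, IPA constant
           propagation sees that every call site passes factor == 3
           and can specialize or fold it. */
        return x * factor;
    }

    /* main.c */
    extern int scale(int x, int factor);

    int main(void) {
        /* Both calls cross a TU boundary; with -flto they are
           typically inlined and folded down to the constant 54. */
        return scale(7, 3) + scale(11, 3);
    }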
> if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
You can build the Linux kernel with LTO: simply diff the LTO vs non-LTO outputs and it will be obvious you're wrong.
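You can see the same effect without a kernel-sized build by diffing the disassembly of the two-file sketch above (standard gcc/binutils invocations; the exact output varies by target and version):

    gcc -O2       lib.c main.c -o plain
    gcc -O2 -flto lib.c main.c -o lto
    objdump -d plain > plain.txt
    objdump -d lto   > lto.txt
    diff plain.txt lto.txt

In the -flto build the call to scale() is typically gone entirely; in the plain build it survives as a real call, because neither TU was ever optimized with knowledge of the other.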
SQLite3 may be a counter-example: it is deliberately shipped as a single amalgamated translation unit of well over 200 kloc, and its developers report that compilers produce measurably faster binaries from it that way:
https://sqlite.org/amalgamation.html