Comment by fjfaase
7 years ago
For your initial design you could also have chosen to use an additional byte to represent the type of the value. As representing the type would only require 2 or 3 bits, some bits will be unused (probably some more, due to alignment requirements), but maybe later on in the development of the compiler, those bits could be used to store some additional information. That would have made your code a lot simpler.
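To make the suggestion concrete, here is a minimal sketch of a value that carries its type in a separate byte. The names `ValueType`, `Value`, `make_int`, `is_int` and `int_of` are hypothetical and do not come from the original article; the set of types is also an assumption.

```c
#include <stdint.h>

/* Hypothetical set of primitive types; two bits would be enough to encode them. */
typedef enum { VAL_INT, VAL_BOOL, VAL_PTR } ValueType;

typedef struct {
    uint8_t type;      /* holds a ValueType; the remaining bits are spare */
    union {
        int32_t i;     /* payload when type == VAL_INT  */
        int     b;     /* payload when type == VAL_BOOL */
        void   *p;     /* payload when type == VAL_PTR  */
    } as;
} Value;

/* Build an integer value; the payload keeps its full 32-bit range. */
static Value make_int(int32_t i) {
    Value v;
    v.type = VAL_INT;
    v.as.i = i;
    return v;
}

/* Type check is a plain comparison, no bit masking needed. */
static int is_int(Value v) { return v.type == VAL_INT; }

/* Read the integer payload; the caller is expected to check is_int first. */
static int32_t int_of(Value v) { return v.as.i; }
```

The cost is size: because of alignment padding, the struct is typically larger than a single machine word.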
As you probably want to combine these values into structures representing the various language constructs, an additional byte to represent the type of the structure, and thus the type of its elements, would also be needed. Then you could do away with the extra bits representing the type.
I just think this is premature optimization that makes things unnecessarily complex, especially for readers who might want to learn something from it.
The types of user-defined "structures" are usually identified by a tag inside the structure, not encoded in the pointer as for the few primitive types.
You'd probably be surprised how many GC'ed languages actually avoid the whole tagged/boxed variable thing in pursuit of the performance benefits. OCaml, for example, is limited to 31-bit ints on 32-bit platforms (63-bit on 64-bit platforms) for this reason, and the Haskell standard only guarantees 30 bits from its 'Int' implementation for this reason.
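For readers wondering where the lost bits go, here is a rough sketch of low-bit tagging on a 32-bit word. It illustrates the general technique only and is not taken from any particular runtime; the names `word`, `tag_int`, `is_immediate` and `untag_int` are hypothetical. It assumes heap pointers are at least 2-byte aligned (so their low bit is 0) and that the compiler converts out-of-range unsigned values to signed by wrapping and uses an arithmetic right shift for negative values, as mainstream compilers do.

```c
#include <stdint.h>

typedef uint32_t word;   /* one 32-bit machine word */

/* Low bit 1 marks an immediate integer; low bit 0 marks an aligned pointer. */
static word tag_int(int32_t n) {
    return ((uint32_t)n << 1) | 1u;   /* top bit of n is shifted out */
}

/* True if the word holds an immediate integer rather than a pointer. */
static int is_immediate(word w) {
    return (w & 1u) != 0;
}

/* Drop the tag bit; assumes an arithmetic (sign-preserving) right shift. */
static int32_t untag_int(word w) {
    return (int32_t)w >> 1;
}
```

One bit of the word is spent on the tag, so only 31 bits remain for the integer: values from -2^30 to 2^30 - 1 round-trip, and anything outside that range wraps around.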
As I said further down this thread, though, it's a speed thing, not a memory-storage thing. The common belief is that boxing is slow because it was slow in Java, but in reality boxing is a) a mostly acceptable trade-off that only loses out in extreme cache-limited situations, and b) something that could be optimised away anyway.
My comment concerned heap-allocated user-defined types, not primitive types like int. Also, these techniques of tagging primitive types predate Java, so whatever convinced people that they are needed, Java wasn't it. (Though things change, so yes, it's a possibility that they are no longer needed. Do you have benchmarks?)
I agree that a lot of boxing can be optimized away, but often it can't be.