Comment by WalterBright
4 years ago
> I use lots of characters that look like ASCII but are in fact not ASCII but nonetheless accepted as valid identifier characters.
Clever, I was wondering how the : was done, but it's an abomination :-/
With some simple improvements to the language, about 99% of the C preprocessor use can be abandoned and deprecated.
Walter, D has conditional compilation, versioning and CTFE without a preprocessor, so I guess that covers the 99% "sane" functionality. Where do you draw the line between that and the 1% abomination part, i.e. what are your thoughts on, say, compile-time type introspection and things like generating ('printing') types/declarations?
The abomination is using the preprocessor to redefine the syntax and/or invent new syntax. Supporting identifier characters that look like `:` is just madness.
Of course, I've also opined that Unicode supporting multiple encodings for the same glyph is also madness. The Unicode people veered off the tracks and sank into a swamp when they decided that semantic information should be encoded into Unicode characters.
What other kind of difference should be encoded into Unicode characters? For example, the glyphs for the Latin a and the Cyrillic а, or the Latin i and the Cyrillic (Ukrainian, Belarusian, and pre-1918 Russian) і look identical in practically every situation, and the Latin (Turkish) ı and the Greek ι aren’t far off. At least not far off compared to the Cyrillic (most languages) д and the Cyrillic (Southern) g-like version (from the standard Cyrillic cursive), or the Cyrillic т and the several Cyrillic (Southern) versions that are like either an m or a turned m (from the cursive, again). Yet most people who are acquainted with the relevant languages would say the former are different “letters” (whatever that means) and the latter are the same.
[Purely-Latin borderline cases: umlaut (is not two dots in Fraktur) vs diaeresis (languages that use it are not written in Fraktur), acute (non-Polish, points past the letter) vs kreska (Polish, points at the letter). On the other hand, the mathematical “element of” sign was still occasionally typeset as an epsilon well into the 1960s.]
Unicode decides most of these based on the requirement to roundtrip legacy encodings (“have these been ever encoded differently in the same encoding?”), which seems reasonable, yet results in homograph problems and at the same time the Turkish case conversion botch. In any case, once (sane) legacy encodings run out but you still want to be consistent, what do you base the encoding decisions on but semantics? (On the other hand, once you start encoding semantic differences, where do you stop?..) You could do some sort of glyph-equivalence-class thing, but that would still give you no way to avoid unifying a and а—everyone who writes both writes them the same.
None of this touches on Unicode “canonical equivalence”, but your claim (“Unicode supporting multiple encodings for the same glyph is [...] madness”) covers more than just that if I understood it correctly. And while I am attacking it in a sense, it’s only because I genuinely don’t see how this part could have been done differently in a major way.
That ship sailed long before Unicode. Even ASCII has characters with multiple valid glyphs (lower case a can lose the ascender, and lower case g is similarly variable in the number of loops), not to mention multiple characters that are often represented with the same glyph (lower case l, upper case I, digit 1).
> The Unicode people veered off the tracks and sank into a swamp when they decided that semantic information should be encoded into Unicode characters.
As if that weren't enough, they also decided to cram half-assed formatting into it. You got bold letters, italics, various fancy-style letters, superscripts and subscripts for this and that... all for the sake of legacy compatibility. Unicode was legacy right from the beginning.
To clarify, what is needed are:
1. static if conditionals
2. version conditionals
3. assert
4. manifest constants
5. modules
I occasionally find macro usages that would require templates, but these are rare.
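For what it's worth, two items on that list already have rough non-macro counterparts in standard C. This is only a sketch, assuming "assert" includes compile-time assertions and "manifest constants" means named compile-time integers; `BLOCK_SIZE`, `BUFFER_SIZE` and `buffer_blocks` are made-up names for illustration.

```c
#include <assert.h>   /* C11: provides static_assert */
#include <limits.h>
#include <stddef.h>

/* Manifest constants as enumerators instead of #define:
   they are scoped, typed as int, and visible to a debugger. */
enum { BLOCK_SIZE = 512, BUFFER_SIZE = 8 * BLOCK_SIZE };

/* Compile-time checks without any #if / #error combination. */
static_assert(BUFFER_SIZE % BLOCK_SIZE == 0, "buffer must hold whole blocks");
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Enumerators are integer constant expressions, so they can size arrays. */
static unsigned char buffer[BUFFER_SIZE];

/* Hypothetical helper: number of whole blocks the buffer holds. */
size_t buffer_blocks(void)
{
    return sizeof buffer / BLOCK_SIZE;
}
```

Enumerators only cover `int`-sized values, so this is not a full replacement for `#define` constants, and the other three items (static if, version, modules) have no equivalent here.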
One other thing people sometimes use the preprocessor for, and that would be great to have in the language, is getting the names of variables/enums as runtime strings. Like, if you have an enum and a function to get the string representation for debug purposes (i.e. the name of the enum as represented inside the source code):
you can use various preprocessor tricks to implement getEnumName such that you don't have to change it when adding more cases to the enum. This would be much better implemented with some compiler intrinsic/operator like `nameof(val)` that returned a string. C# does something similar with its `nameof`.
> you can use various preprocessor tricks to implement getEnumName such that you don't have to change it when adding more cases to the enum.
For those who don’t know: the X Macro (https://en.wikipedia.org/wiki/X_Macro, https://digitalmars.com/articles/b51.html)
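A minimal sketch of the X Macro trick in C, for anyone who hasn't seen it. `getEnumName` is the name used upthread; the `Color` enum, `COLOR_LIST`, and `colorFromName` are hypothetical names invented for this example, not anything from the linked articles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single source of truth: each enumerator is written exactly once. */
#define COLOR_LIST(X) \
    X(COLOR_RED)      \
    X(COLOR_GREEN)    \
    X(COLOR_BLUE)

/* First expansion: the enumerators, plus a trailing count. */
#define AS_ENUMERATOR(name) name,
typedef enum { COLOR_LIST(AS_ENUMERATOR) COLOR_COUNT } Color;
#undef AS_ENUMERATOR

/* Second expansion: a parallel table of names, stringized with #. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };
#undef AS_STRING

/* Returns the enumerator's name exactly as spelled in the source. */
const char *getEnumName(Color val)
{
    assert((size_t)val < (size_t)COLOR_COUNT);
    return color_names[val];
}

/* Reverse lookup; returns COLOR_COUNT when the name is unknown. */
Color colorFromName(const char *name)
{
    for (size_t i = 0; i < (size_t)COLOR_COUNT; ++i)
        if (strcmp(color_names[i], name) == 0)
            return (Color)i;
    return COLOR_COUNT;
}
```

Adding a fourth color means adding one line to `COLOR_LIST`; the enum and the name table both pick it up. The cost is that the enum definition is no longer readable at a glance, which is the kind of thing a `nameof`-style operator would avoid.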
I like that ONE == 0.
> With some simple improvements to the language, about 99% of the C preprocessor use can be abandoned and deprecated.
Arguably the C feature most used in other languages is the C preprocessor's conditional compilation for e.g. different OSes. Used by languages from Fortran (yes, there exists FPP now - for a suitable definition of 'now') to Haskell (yes, `{-# LANGUAGE CPP #-}`).
In C++, anyway. C’s expressiveness, on the other hand, is pretty weak, and a preprocessor is very useful there.
A better preprocessor (a C code generator, effectively) would be a simple program that would interpret the <% and %> brackets or similar (by “inverting” them). It is a very powerful paradigm.
You're talking about metaprogramming. I've seen C code that does metaprogramming with the preprocessor.
If you want to use metaprogramming, you've outgrown C and should consider a more powerful language. There are plenty to pick from. DasBetterC, for example.
But the <%-preprocessor would be the most powerful metaprogramming tool, would it not? Simply because the programmer would have at their disposal the power of the entire real programming language as opposed to being limited to whatever template language happens to be built in. For instance, if I want to generate a piece of code to define an enum and, at the same time, have a method to serialize it (say, into XML), then with <% it is a trivial task, whereas in C# I need to define and use some weird "attribute" class, while C++ offers me no way whatsoever to accomplish this, with all its metaprogramming power. Is D different in this regard?
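To make the enum-plus-serializer case concrete, here is a rough sketch of the generator side written as ordinary C that prints C. It is plain code generation, not the <% bracket-inverting tool itself, and it emits a name lookup rather than XML to stay short. `emit`, `generate_enum`, and the `<type>_name` naming scheme are all assumptions made up for this example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Appends formatted text at buf + *used. Once the buffer is full,
   later calls do nothing, so *used >= cap signals truncation. */
static void emit(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    if (*used >= cap)
        return;
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf + *used, cap - *used, fmt, args);
    va_end(args);
    if (n > 0)
        *used += (size_t)n;
}

/* Writes C source for an enum and a function mapping each value to
   its name. Assumes count >= 1 (an empty enum is not valid C).
   The output is complete only if the return value is less than cap. */
size_t generate_enum(char *buf, size_t cap, const char *type_name,
                     const char *const *members, size_t count)
{
    size_t used = 0;
    if (cap > 0)
        buf[0] = '\0';

    emit(buf, cap, &used, "typedef enum {\n");
    for (size_t i = 0; i < count; ++i)
        emit(buf, cap, &used, "    %s,\n", members[i]);
    emit(buf, cap, &used, "} %s;\n\n", type_name);

    emit(buf, cap, &used, "const char *%s_name(%s value)\n{\n",
         type_name, type_name);
    emit(buf, cap, &used, "    switch (value) {\n");
    for (size_t i = 0; i < count; ++i)
        emit(buf, cap, &used, "    case %s: return \"%s\";\n",
             members[i], members[i]);
    emit(buf, cap, &used, "    }\n    return \"?\";\n}\n");
    return used;
}
```

The member list lives in one place and both the type and its serializer are derived from it, which is the property being asked for. The price is an extra build step and generated source that has to be kept out of hand edits.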