← Back to context

Comment by themafia

1 month ago

> it should have been a postfix operator, like in Euler and Pascal.

I never liked Pascal style Pointer^. As the postfix starts to get visually cumbersome with more than one layer of Indirection^^. Especially when combined with other postfix Operators^^.AndMethods. Or even just Operator^ := Assignment.

I also think it's the natural inverse of the "address-of" prefix operator. So we have "take the address of this value" and "look through the address to retreive the value."

The "natural inverse" relationship between "address-of" and indirect addressing is only partial.

You can apply the "*" operator as many times you want, but applying "address-of" twice is meaningless.

Moreover, in complex expressions it is common to mix the indirection operator with array indexing and with structure member selection, and all these 3 postfix operators can appear an unlimited number of times in an expression.

Writing such addressing expressions in C is extremely cumbersome, because they require a great number of parentheses levels and it is still difficult to see which is the order in which they are applied.

With a postfix indirection operator no parentheses are needed and all addressing operators are executed in the order in which they are written.

So it is beyond reasonable doubt that a prefix "*" is a mistake.

The only reason why they have chosen "*" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "*++p" and "*p++" to have the desired order of evaluation.

There is no other use case where a prefix "*" simplifies anything and for the postfix and prefix increment and decrement it would have been possible to find other ways to avoid parentheses and even if they were used with parentheses that would still have been simpler than when you have to mix "*" with array indexing and with structure member selection. Moreover, the use of "++" and "--" with pointers was only a workaround for a dumb compiler, which could not determine by itself whether it should access an array using indices or pointers. Normally there should be no need to expose such an implementation detail in a high-level language, the compiler should choose the addressing modes that are optimal for the target CPU, not the programmer. On some CPUs, including the Intel/AMD CPUs, accessing arrays by incrementing pointers, like in the old C programs, is usually worse than accessing the arrays through indices (because on such CPUs the loop counter can be reused as an index register, regardless of the order in which the array is accessed, including for accessing multiple arrays, avoiding the use of extra registers and reducing the number of executed instructions).

With a postfix "*", the operator "->" would have been superfluous. It has been added to C only to avoid some of the most frequent cases when a prefix "*" leads to ugly syntax.

  • > You can apply the "*" operator as many times you want, but applying "address-of" twice is meaningless.

    This is due to the nature of lvalue and rvalue expressions. You can only get an object where * is meaningful twice if you've applied & meaningfully twice before.

        int a = 42;
        int *b = &a;
        int **c = &b;
    

    I've applied & twice. I merely had to negotiate with the language instead of the parser to do so.

    > and all these 3 postfix operators can appear an unlimited number of times in an expression.

    In those cases the operator is immediately followed by a non-operator token. I cannot meaningfully write a[][1], or b..field.

    > The only reason why they have chosen "*" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "++p" and "p++" to have the desired order of evaluation.

    It not only seems easier it is easier. What you sacrifice is complications is defining function pointers. One is far more common than the other. I think they got it right.

    > With a postfix "*", the operator "->" would have been superfluous.

    Precisely the reason I dislike the Pascal**.Style. Go offers a better mechanism anyways. Just use "." and let the language work out what that means based on types.

    I'm offering a subjective point of view. I don't like the way that looks or reads or mentally parses. I'm much happier to occasionally struggle with function pointers.

  • > The only reason why they have chosen "" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "++p" and "*p++" to have the desired order of evaluation.

    There has been no shortage of speculation, much of it needlessly elaborate. The reality, however, appears far simpler – the prefix pointer notation had already been present in B and its predecessor, BCPL[0]. It was not invented anew, merely borrowed – or, more accurately, inherited.

    The common lore often attributes this syntactic feature to the influence of the PDP-11 ISA. That claim, whilst not entirely baseless, is at best a partial truth. The PDP-11 did support pre-increment and post-increment indirect address manipulation – but notably lacked their symmetrical complements: pre-increment and post-decrement addressing modes[1]. In other words, it exhibited asymmetry – a gap that undermines the argument for direct PDP-11 ISA inheritance, i.e.

      MOV (Rn)+, Rm
    
      MOV @(Rn)+, Rm
    
      MOV -(Rn), Rm
    
      MOV @-(Rn), Rm
    

    existed but not

      MOV +(Rn), Rm
    
      MOV @+(Rn), Rm
    
      MOV (Rn)-, Rm
    
      MOV @(Rn)-, Rm
    

    [0] https://www.thinkage.ca/gcos/expl/b/manu/manu.html#Section6_...

    [1] PDP-11 ISA allocates 3 bits for the addressing mode (register / Rn, indirect register (Rn), auto post-increment indirect / (Rn)+ , auto post-increment deferred / @(Rn)+, auto pre-decrement indirect / -(Rn), auto pre-increment deferred / @-(Rn), index / idx(Rn) and index deferred / @idx(Rn) ), and whether it was actually «let's choose these eight modes» or «we also wanted pre-increment and post-decrement but ran out of bits» is a matter of historical debate.

    • The prefix "*" and the increment/decrement operators have been indeed introduced in the B language (in 1969, before the launch of PDP-11 in 1970, but earlier computers had some autoincrement/autodecrement facilities, though not as complete as in the B language), where "*" has been made prefix for the reason that I have already explained.

      The prefix "*" WAS NOT inherited from BCPL, it was purely a B invention due to Ken Thompson.

      In BCPL, "*" was actually a postfix operator that was used for array indexing. It was not the operator for indirection.

      In CPL, the predecessor of BCPL, there was no indirection operator, because indirection through a pointer was implicit, based on the type of the variable. Instead of an indirection operator, there were different kinds of assignment operators, to enable the assignment of a value to the pointer, instead of assigning to the variable pointed by the pointer, which was the default meaning.

      BCPL has made many changes in the syntax of CPL, whose main reason was the necessity of adapting the language to the impoverished character set available on American computers, which lacked many of the characters that had been available in Europe before IBM and a few other US vendors have succeeded to replace the local vendors, also imposing thus the EBCDIC and later the ASCII character sets.

      Several of the changes done between BCPL and B had the same kind of reason, i.e. they were needed to transition the language from an older character set to the then new ASCII character set. For instance the use of braces as block delimiters was prompted by their addition into ASCII, as they were not available in the previous character set.

      The link that you have provided to a manual of the B language is not useful for historical discussions, as the manual is for a modernized version of B, which contains some features back-ported from C.

      There is a manual of the B language dated 1972-01-07, which predates the C language, and which can be found on the Web. Even that version might have already included some changes from the original B language of 1969.

      4 replies →

  • A postfix "*" would be completely redundant since you can just use p[0] . Instead of *p++ you'd have (p++)[0] - still quite workable.

    • You're kidding, right? (p++)[0] returns the contents of (p) before the ++. Its hard to imagine a more confusing juxtaposition.

A dash instead of a dot would be so much more congruent with the way Latin script generally render compounded terms. And a reference/pointer (or even pin for short) is really nothing that much different compared to any other function/operator/method.

some·object-pin-pin-pin-transform is not harder to parse nor to interpret as human than (***some_object)->transform().