Comment by mkl

9 months ago

Agreed. Even stranger to me is @ as the fourth most common operator, supposedly more common than +. The whole thing seems dubious.

5 comments

yorwba 9 months ago

Its number of occurrences is 103,090. In the master's thesis identified as the original source https://cs.uwaterloo.ca/~smwatt/home/students/theses/CSo2005... the Unicode value of the operator occurring 103,090 times is given as 2061, and the thesis helpfully explains that

Unicode 2061, 2062 and 2063 are invisible operators. TeX does not have any of these invisible operators. These invisible operators result from the TeX to MathML conversion.

– 2061 – Function application

– 2062 – Invisible times

– 2063 – Invisible separator
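For anyone who wants to check those three code points, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration and has nothing to do with the thesis tooling.

```python
import unicodedata

# The three invisible operators listed in the thesis excerpt above.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063"]

def describe(ch):
    """Return (code point, Unicode name, general category) for one character.

    Hypothetical helper, for illustration only.
    """
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in INVISIBLE_OPERATORS:
    print(describe(ch))
```

All three fall in the `Cf` (format) general category, which is why they take up no space when rendered.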

And Wikipedia says that function application may be represented as

U+2061 FUNCTION APPLICATION (&ApplyFunction;, &af;): a contiguity operator indicating application of a function; that is an invisible zero width character intended to distinguish concatenation meaning function application from concatenation meaning multiplication. https://en.wikipedia.org/wiki/Function_application#Represent...

I'm not sure though how an automated conversion process would be able to distinguish between these.

dleeftink 9 months ago

The table byline says: "The @ symbol is used to encode mathematical formulas for the computer. It is not visible to the user."
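A rough sketch of why an invisible operator could plausibly outrank + in such a count: if every function call in the converted MathML carries an explicit U+2061, a simple tally of `<mo>` elements picks it up each time. The MathML fragment and the `count_operators` helper below are my own assumptions for illustration, not the thesis pipeline.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hand-written MathML for f(x) + g(y) (an assumed example, not thesis data).
# Each function call carries an explicit U+2061 FUNCTION APPLICATION operator.
MATHML = (
    "<mrow>"
    "<mi>f</mi><mo>&#x2061;</mo><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow>"
    "<mo>+</mo>"
    "<mi>g</mi><mo>&#x2061;</mo><mrow><mo>(</mo><mi>y</mi><mo>)</mo></mrow>"
    "</mrow>"
)

def count_operators(mathml):
    """Tally the text of every <mo> element, invisible operators included."""
    root = ET.fromstring(mathml)
    return Counter(mo.text for mo in root.iter("mo"))

counts = count_operators(MATHML)
for op, n in counts.most_common():
    # Print code points rather than glyphs, since U+2061 renders as nothing.
    print(f"U+{ord(op):04X}", n)
```

In this toy input U+2061 appears twice and + only once, so a table that substitutes a visible placeholder for the invisible character would rank the placeholder above +.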

layer8 9 months ago

I would suspect that the @ comes from author email addresses. It's not entirely wrong to call that an operator. ;)

mkl 9 months ago

No, the data (as described in So's thesis) was mathematical expressions extracted from TeX source code, so the surrounding text, email addresses, etc. were ignored. Skimming through by eye I can't see @ in any of So's tables, and searching for its hex Unicode value, which the tables list for every other character, yields no hits: @ is not in the tables.

∋ is there anomalously frequently, and @ is missing, so something seems to have gone wrong, probably at multiple stages in the pipeline.

mmooss 9 months ago

Do papers tend to have more email addresses or more plus signs? I'd expect the latter, by a lot.