Math Symbol Frequencies

6 months ago (leancrew.com)

It seems weird that ∋ would be the sixth-most frequent symbol, while ∈ doesn't figure at all.

  • Agreed. Even stranger to me is @ as the fourth most common operator, supposedly more common than +. The whole thing seems dubious.

    • Its number of occurrences is 103,090. In the master's thesis identified as the original source https://cs.uwaterloo.ca/~smwatt/home/students/theses/CSo2005... the Unicode value of the operator occurring 103,090 times is given as 2061, and the thesis helpfully explains that

      Unicode 2061, 2062 and 2063 are invisible operators. TeX does not have any of these invisible operators. These invisible operators result from the TeX to MathML conversion.

      – 2061 – Function application

      – 2062 – Invisible times

      – 2063 – Invisible separator

      And Wikipedia says that function application may be represented as

      U+2061 FUNCTION APPLICATION (⁡, ⁡) — a contiguity operator indicating application of a function; that is an invisible zero width character intended to distinguish concatenation meaning function application from concatenation meaning multiplication. https://en.wikipedia.org/wiki/Function_application#Represent...

      I'm not sure though how an automated conversion process would be able to distinguish between these.
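      One way a converter could guess, sketched here purely as an illustration and not as what the thesis's toolchain actually does: treat a token as a function when it appears in a table of known function names and is followed by a parenthesis, and fall back to invisible times otherwise. The `KNOWN_FUNCTIONS` table and the `invisible_operator` helper are hypothetical names made up for this sketch.

```python
import unicodedata

# The three invisible operators listed in the thesis.
FUNCTION_APPLICATION = "\u2061"
INVISIBLE_TIMES = "\u2062"
INVISIBLE_SEPARATOR = "\u2063"

# Hypothetical table of names treated as functions. A real converter would
# need a much larger table, or context from the surrounding document.
KNOWN_FUNCTIONS = {"sin", "cos", "tan", "log", "ln", "exp", "f", "g"}


def invisible_operator(left: str, right: str) -> str:
    """Guess which invisible operator sits between two adjacent tokens."""
    if left in KNOWN_FUNCTIONS and right.startswith("("):
        return FUNCTION_APPLICATION
    return INVISIBLE_TIMES


def describe(ch: str) -> str:
    """Format a character as its code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    samples = [("sin", "(x)"), ("f", "(x)"), ("a", "(b + c)"), ("2", "x")]
    for left, right in samples:
        print(left, right, "->", describe(invisible_operator(left, right)))
```

      A heuristic like this is easy to fool: `f(x)` is tagged as function application even in a text where `f` is a plain variable being multiplied. Errors of that kind would feed straight into the frequency counts.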

    • The table byline says: "The @ symbol is used to encode mathematical formulas for the computer. It is not visible to the user."

  • There definitely is some sort of methodological problem. It thinks \otimes is more than four times as frequent as the plain, good old-fashioned integral sign. There’s absolutely no way that is the case.

Here is a related report from way back that counts expressions instead of symbols [0]. The counting procedure behind the table the OP references might benefit from first extracting expressions and then counting individual symbol frequencies.

[0]: Watt, S. M. A Preliminary Report on the Set of Symbols Occurring in Engineering Mathematics Texts. In Proceedings of MICA 2008: Milestones in Computer Algebra 2008.
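A minimal sketch of that two-step idea, assuming the expressions are already available as MathML strings (the sample expressions in the test data and the helper names `operators` and `count_symbols` are made up for illustration, and this is not the procedure used in either report). It counts total occurrences of each operator and also the number of expressions each operator appears in:

```python
from collections import Counter
import xml.etree.ElementTree as ET

# MathML elements may or may not carry the namespace, so accept both forms.
MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"
MO_TAGS = {"mo", MATHML_NS + "mo"}


def operators(expr_xml: str) -> list[str]:
    """Step 1: pull the operator symbols out of one MathML expression."""
    root = ET.fromstring(expr_xml)
    return [el.text for el in root.iter() if el.tag in MO_TAGS and el.text]


def count_symbols(expressions: list[str]) -> tuple[Counter, Counter]:
    """Step 2: count symbols overall and per containing expression."""
    total = Counter()           # every occurrence of a symbol
    per_expression = Counter()  # number of expressions containing the symbol
    for expr in expressions:
        ops = operators(expr)
        total.update(ops)
        per_expression.update(set(ops))
    return total, per_expression
```

Comparing the two counters shows whether a symbol is frequent because it is widespread or because a handful of expressions repeat it many times.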

I had a bit of a chuckle that apparently 5 out of 50,000 opening "(" parentheses weren't closed, but then I saw that 2 out of 12,000 "]" brackets weren't opened! What criminal is using these standalone?
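For what it's worth, counts like these fall out naturally if each bracket type is tallied on its own. The sketch below is an assumption about how such a tally could work, not a description of how the table was produced; `unmatched` is a hypothetical helper. One possible source of strays, and this is a guess, is half-open interval notation such as (0, 1].

```python
from collections import Counter

# Each bracket type is checked independently of the others.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def unmatched(expr: str) -> tuple[Counter, Counter]:
    """Return (unclosed openers, unopened closers) for one expression."""
    unclosed = Counter()
    unopened = Counter()
    for opener, closer in PAIRS.items():
        depth = 0  # currently open brackets of this type
        for ch in expr:
            if ch == opener:
                depth += 1
            elif ch == closer:
                if depth:
                    depth -= 1
                else:
                    unopened[closer] += 1  # closer with nothing to close
        if depth:
            unclosed[opener] = depth  # openers left hanging at the end
    return unclosed, unopened
```

With this per-type tally, the interval (0, 1] contributes one unclosed "(" and one unopened "]", which is exactly the pattern in the table.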

I wonder if these tables are telling us that it's more conventional to write "a < b" as opposed to "b > a". Is there a style guide for writing equations?

  • Could be. We tend to think of a number line going in that order, that is, the lower numbers are to the left. What is interesting is that being > 0 is often a condition, such as epsilon > 0. Though that is often paired with something like 0 < |x-a| < epsilon. I have often wondered about an alternate mathematics in which the inequality sign was always pointed in the same direction and whether that would ease the difficulty students have with inequalities.