Comment by masklinn

3 months ago

OP is really tying themselves into knots.

Symbols are special-cased pseudo-strings for languages which have either extremely inefficient strings (Erlang) or mutable ones (Ruby). That’s about the extent of it.

Python or Java don’t really have a use for them because their strings are immutable, and likely interned when metaprogramming is relevant (e.g. method or attribute names).

Whenever Perl encounters a string literal in the code (especially one used as a hash key or a bareword), it often "interns" it. This means it stores a single, canonical, read-only copy of that string in a memory pool.

That's the core idea, and then Ruby has surface stuff like the symbol syntax and class. I'm pretty sure it's fine to use strings as hash keys though if you like.
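To make the hash-key point concrete, here is a minimal Ruby sketch (the names are illustrative): both kinds of keys work, and CRuby dup-and-freezes an unfrozen String key on insertion, so mutating the original string afterwards doesn't corrupt the hash.

```ruby
# Both strings and symbols work as hash keys.
h = { "name" => "Ada", :name => "Lovelace" }
p h["name"]   # => "Ada"
p h[:name]    # => "Lovelace"

# CRuby duplicates and freezes unfrozen String keys on insertion,
# so mutating the original string does not corrupt the hash.
s = +"color"            # unary plus guarantees an unfrozen string
h2 = { s => "red" }
s << "s"                # mutate the original...
p h2["color"]           # => "red" (the stored key was a frozen copy)
p h2["colors"]          # => nil
```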

  • > I'm pretty sure it's fine to use strings as hash keys though if you like.

    Sure. They are just less efficient as hash keys.

    Although now the distinction blurs with frozen strings (and the string literals being frozen by default switch).

    • Ruby has always had frozen strings. What it didn't have was interning of string literals, which is what the somewhat poorly named "# frozen_string_literal: true" option actually does (available since Ruby 2.3; Ruby 3.4 started warning about literal mutation on the way to making it the default). That makes string literals basically equivalent to symbols, though not actually symbols. Contrast String#intern, another example of suboptimal naming: it has existed longer and sounds like it interns strings, but is actually just an alias of String#to_sym.
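A small sketch of those distinctions, assuming CRuby (where String#-@ has deduplicated frozen strings since 2.5): symbols are always one canonical object, String#intern really just returns a Symbol, and an interned frozen string remains a String.

```ruby
# Symbols are always interned: every :foo is the same object.
p :foo.equal?(:foo)          # => true

# String#intern is just an alias of String#to_sym.
p "foo".intern               # => :foo
p "foo".intern.equal?(:foo)  # => true

# Unary minus interns a frozen String without making it a Symbol
# (deduplicated in CRuby since 2.5).
s1 = -("fo" + "o")
s2 = -("f" + "oo")
p s1.equal?(s2)              # => true, one shared frozen "foo"
p s1.frozen?                 # => true
p s1.is_a?(Symbol)           # => false, still a String
```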

Symbol is the only feature I miss after switching to Python. It makes code so much more readable to distinguish keys and normal strings.

  • Much like yccs27 above, I do that using single and double quoted strings (a habit I got from Erlang).

Could not read OP, clownflare is down.

> Symbols are pseudo-strings

Can guess from Lisp: symbols reside in the symbol table, which has global scope, while strings reside in variables, subject to local scope.

They are two different storage mechanisms.

> inefficient strings

Ruby does not care for efficiency, so there is no point to argue for symbols vs string performance

  • > Ruby does not care for efficiency, so there is no point to argue for symbols vs string performance

    Symbols existed entirely for performance reasons and were once never GC'd: this is absolutely Ruby "car[ing] for efficiency."

    Today, Ruby's Symbol is GC'd (and is much closer to String in practice) but still has an enormous impact on performance.

  • > It is two different storage mechanisms

    An irrelevant implementation detail. Interned strings are also stored globally, and depending on implementations interned strings may or may not be subject to memory reclaiming.

    > Ruby does not care for efficiency, so there is no point to argue for symbols vs string performance

    Which is why Ruby's having symbols is associated with mutable strings, not inefficient strings.

    And there's a gulf between not caring too much about efficiency and strings being a linked list of integers, which is what they are in Erlang.
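The performance point is easy to see in plain CRuby: a symbol compares by identity, while strings generally compare by contents. A hedged sketch (identity of string literals depends on frozen-literal settings, so only computed strings are compared here):

```ruby
# A symbol is one canonical object, so comparison degenerates to
# a pointer check with a precomputed hash.
p :status.equal?(:status)      # => true, same object everywhere

# String equality must walk the contents unless both strings
# happen to be interned.
a = "sta" + "tus"
b = "st" + "atus"
p a == b                       # => true, but compared byte-by-byte
p a.equal?(b)                  # => false, two separate objects

# Since Ruby 2.2, dynamically created symbols are GC-eligible,
# so calling to_sym on untrusted input no longer leaks forever.
dynamic = "user_#{42}".to_sym
p dynamic                      # => :user_42
```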

Lisps have uninterned symbols also.

Interning is important for symbols that are involved in I/O: being printed and read, so that two or more occurrences of the same symbol in print will all read to the same object, and so there is print-read consistency: we can print an interned symbol, and then read the printed symbol to obtain the same object.

Symbols are useful without this also. Symbolic processing that doesn't round trip symbols to a printed notation and back doesn't require interned symbols.
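Ruby's interned symbols give the same round-trip guarantee in miniature: writing a symbol out as text and reading it back yields the very same object, which is not true of strings. A small sketch:

```ruby
sym = :answer
text = sym.to_s          # "print" the symbol as text
again = text.to_sym      # "read" it back
p again.equal?(sym)      # => true, interning restores the same object

s = "answer"
p s.dup.equal?(s)        # => false, strings round-trip by value only
```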

Symbols have a name, but are not that name.

They also have various properties, which depends on the Lisp dialect.

Classical Lisp dialects, like McCarthy's original, endow each symbol with a property list. Another classical property is the "value cell".

In Common Lisp, symbols have a home package, retrieved by the function symbol-package.

A variable being globally proclaimed special can be implemented as a property of the symbol.

Symbols are interned in packages, not globally, so two symbols can be interned, yet have the same name: mypackage:let and cl:let both have the name "LET", but different home packages.

Uninterned symbols with the same name can be readily made: just call (make-symbol "FOO") twice and you get two symbols named "FOO", which print as #:FOO.

The #: notation means symbol with no home package, used as a proxy for "uninterned", though a perverse situation can be contrived whereby a symbol has no home package (and so prints with the #: notation), yet is interned into a package.

Introduce a FOO symbol in the keyword package:

  [1]> :foo
  :FOO

Now import it into the CL-USER package:

  [2]> (import :foo :cl-user)
  T

Verify that cl-user::foo is actually the keyword symbol :foo:

  [3]> 'cl-user::foo
  :FOO

Now, unintern :foo from the keyword package, leaving it homeless:

  [4]> (unintern :foo :keyword)
  T

Let's print it, accessing it via the cl-user package where it is still interned by import:

  [5]> 'cl-user::foo
  #:FOO

There is quite a bit more to this symbol stuff than just "interned strings".

Symbols are simply not string objects; they have a string as a name.

That’s really not true for Lisp.

Ruby, like its predecessor Perl, is one of the finer examples of Greenspunning and shows a lot of Lisp influence.

Unfortunately I can’t read the actual submission right now due to the cloudflare outage.

> extremely inefficient strings (erlang)

Doesn't most modern Erlang code use binaries instead of charlists? Elixir and Gleam certainly do.

This is silly. The semantics are entirely different!

  • How so? Quite literally, symbols are used as immutable strings with a shorter syntax. So much so that I've been finding their literal constraints limiting lately.

    • Almost the entire value of symbols separate from strings is at the level of programmer communication rather than PL semantics.

      It tells a reader of the code that this term is arbitrary but significant, probably represents an enmeshment with another part of the code, and will not be displayed to any user. When seeing a new term in code that is a lot of the things you're going to need to figure out about it anyway. It's a very valuable & practical signal.

      If you need to mutate or concat or interpolate or capitalize or any other string operation it, it probably shouldn't be a symbol anymore, or shouldn't have been to start with.


  • > This is silly.

    Oh my bad, great counterpoint.

    > The semantics are entirely different!

    They're not. A symbol is an arbitrary identifier, which can be used to point to system elements (e.g. classes, methods, etc...). These are all things you can do just fine with immutable interned strings. Which is exactly what languages which have immutable interned strings do.

    You'd just have a broken VM if you used mutable strings for metaprogramming in Ruby, so it needs symbols. Both are things it inherited from Perl, Smalltalk, and Lisp alike.
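A minimal sketch of that metaprogramming role (the class and method names are illustrative): Ruby's reflection APIs name program elements with symbols, and accept symbols (or strings, which they convert) to look them up.

```ruby
class Greeter
  # A symbol names the method being defined.
  define_method(:hello) { |name| "hi #{name}" }
end

g = Greeter.new
p g.send(:hello, "world")          # => "hi world"
p g.respond_to?(:hello)            # => true
p Greeter.instance_methods(false)  # => [:hello], reflection hands back symbols
```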