Comment by antonvs

3 months ago

Your intuition is on the right track. The distinction may become clearer if you consider a classic language implementation design:

1. A lexer, which breaks source text up into a stream of tokens.

2. A parser, which converts a stream of tokens into an abstract syntax tree (AST).

3. An interpreter or compiler, which traverses the AST in order to either execute it (interpreter) or transform it into some other form (compiler).

Points 1 & 2 are syntax - the mapping between the textual form of a program and its meaning.

Point 3 is semantics - how a program actually behaves, or as you say, what its terms evaluate to.
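
To make that pipeline concrete, here's a minimal sketch for a toy addition-only language. The names (`Token`, `Expr`, `lexer`, `parser`, `eval`) and the language itself are made up purely for illustration:

```haskell
-- Stage 1: lexer - source text to a stream of tokens.
data Token = TNum Int | TPlus deriving Show

lexer :: String -> [Token]
lexer [] = []
lexer (c:rest)
  | c == ' '            = lexer rest
  | c == '+'            = TPlus : lexer rest
  | c `elem` ['0'..'9'] = let (digits, rest') = span (`elem` ['0'..'9']) (c:rest)
                          in TNum (read digits) : lexer rest'
  | otherwise           = error ("unexpected character: " ++ [c])

-- Stage 2: parser - tokens to an abstract syntax tree.
data Expr = Num Int | Add Expr Expr deriving Show

parser :: [Token] -> Expr
parser (TNum n : rest) = go (Num n) rest
  where
    go acc (TPlus : TNum m : more) = go (Add acc (Num m)) more
    go acc []                      = acc
    go _   ts                      = error ("parse error at: " ++ show ts)
parser ts = error ("parse error at: " ++ show ts)

-- Stage 3: interpreter - traverses the AST and gives it behavior.
eval :: Expr -> Int
eval (Num n)   = n
eval (Add a b) = eval a + eval b

main :: IO ()
main = print (eval (parser (lexer "1 + 2 + 3")))  -- prints 6
```

Stages 1 and 2 only turn text into structure; nothing "happens" until `eval` walks the tree.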

Looking at it like this can give a sharp line between syntax and semantics. But when you get deeper into it, it turns out that with some languages at least, you can get from source syntax to something that actually executes - has behavior, i.e. semantics - with nothing but a series of syntactic transformations. From that perspective, you can say that semantics can be defined as a sequence of syntactic transformations.
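
As an illustration of "execution as nothing but syntactic transformation", here's a sketch of small-step reduction for a tiny untyped lambda calculus. The representation is made up for the example, and the substitution is deliberately naive - it ignores variable capture, which is fine for the closed term below:

```haskell
-- A tiny untyped lambda calculus: terms are pure syntax trees.
data Term = Var String | Lam String Term | App Term Term deriving Show

-- Substitution: replace free occurrences of x with s.
-- (Naive: no capture avoidance, acceptable for this closed example.)
subst :: String -> Term -> Term -> Term
subst x s (Var y)   = if x == y then s else Var y
subst x s (Lam y b) = if x == y then Lam y b else Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- One step of normal-order reduction: a purely syntactic rewrite that
-- finds the leftmost-outermost redex and rewrites that subtree.
step :: Term -> Maybe Term
step (App (Lam x b) a) = Just (subst x a b)
step (App f a) = case step f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> step a
step (Lam x b) = Lam x <$> step b
step (Var _)   = Nothing

-- "Running" a term is just iterating the rewrite until no redex remains.
normalize :: Term -> Term
normalize t = maybe t normalize (step t)

main :: IO ()
main = print (normalize (App (Lam "x" (Var "x")) (Lam "y" (Var "y"))))
-- prints: Lam "y" (Var "y")
```

Each call to `step` takes one syntax tree and produces another; the program's behavior is exactly the sequence of those rewrites.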

This doesn't erase the distinction between syntax and semantics, though. The source language's syntax is just the first link in that chain of transformations, with each stage involving a different (albeit closely related) language with its own syntactic structure.

> Explanations that involve 'giving a precise mathematical meaning' just seem almost circular to me.

Formal semantics covers this, but the syntax/semantics distinction isn't necessarily just formal - it's a useful distinction even in an informal sense.

As for circularity, it's absolutely the case that formal semantics is nothing more than defining one language in terms of another. But the idea is that the target language is supposed to be one with well-defined semantics of its own, which is why "mathematical" comes up so often in this context. Mathematical abstractions such as set theory, lattice theory, lambda calculus and so on can provide a rigorous foundation that's more tractable when it comes to proofs.
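
One common shape this takes is denotational: a function from syntax into some domain whose meaning is taken as given. Here's a rough sketch where plain Haskell functions and integers stand in for the mathematical target language (the toy language and names are invented for the example):

```haskell
-- Syntax of a tiny expression language with variables.
data Expr = Num Int | Var String | Add Expr Expr | Let String Expr Expr
  deriving Show

-- Semantic domain: a meaning is a function from environments to integers.
type Env = [(String, Int)]

-- A denotation function: the source language is defined in terms of
-- another language (here, ordinary Haskell functions and integers)
-- whose meaning we take as already understood.
denote :: Expr -> (Env -> Int)
denote (Num n)     = \_   -> n
denote (Var x)     = \env -> maybe (error ("unbound: " ++ x)) id (lookup x env)
denote (Add a b)   = \env -> denote a env + denote b env
denote (Let x e b) = \env -> denote b ((x, denote e env) : env)

main :: IO ()
main = print (denote (Let "x" (Num 2) (Add (Var "x") (Num 3))) [])  -- prints 5
```

The `+` and the integers on the right-hand side belong to the target language; their meaning is simply assumed, which is where the apparent circularity bottoms out - in a language whose semantics we already trust.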

That kind of circularity pervades our knowledge in general. Words in the dictionary are given meaning in terms of other words. You can't explain something without explaining it in terms of something else.

Great explanation. Also, if I understand things correctly, type-checking is where things get really interesting. Runtime errors occur in stage 3; type checkers catch those errors (to varying degrees) and surface them at compile time. If we think of parsing and type-checking together as a single unit, then type-checking sort of pushes the line of syntax further into semantic territory. The stronger the type checker, the more you can make semantic errors look like syntax errors.
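
A hedged sketch of that idea (toy language and names invented for the example): a term like `Add (Num 1) (BoolLit True)` would only blow up during evaluation in an untyped interpreter, but a checker run right after parsing rejects it before anything executes.

```haskell
-- A toy language in which some terms would fail at run time (stage 3)...
data Expr = Num Int | BoolLit Bool | Add Expr Expr | If Expr Expr Expr
  deriving Show

data Ty = TInt | TBool deriving (Show, Eq)

-- ...and a checker that reports those failures before anything runs.
typeOf :: Expr -> Either String Ty
typeOf (Num _)     = Right TInt
typeOf (BoolLit _) = Right TBool
typeOf (Add a b)   = do
  ta <- typeOf a
  tb <- typeOf b
  if ta == TInt && tb == TInt
    then Right TInt
    else Left "Add expects two Ints"
typeOf (If c t e)  = do
  tc <- typeOf c
  tt <- typeOf t
  te <- typeOf e
  if tc /= TBool
    then Left "If condition must be Bool"
    else if tt /= te
      then Left "If branches must have the same type"
      else Right tt

main :: IO ()
main = do
  print (typeOf (If (BoolLit True) (Num 1) (Num 2)))  -- Right TInt
  print (typeOf (Add (Num 1) (BoolLit True)))         -- Left "Add expects two Ints"
```

The stronger `typeOf` gets (polymorphism, refinement types, and so on), the more of those would-be runtime failures it can report statically.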

This is similar to what Chomsky did in "Aspects of the Theory of Syntax" when he tried to show how we can build more thorough systems to evaluate the semantics of (human) language, like what kinds of nouns can go with what kinds of verbs. He pushes the line of syntax further into semantic territory and tries to create more comprehensive sets of rules that better guarantee the generation of syntactically and semantically correct sentences. I think this is perfectly analogous to the type-checking enterprise.

  • > If we think of parsing and type-checking as a unit together

    This certainly leads to a blurring of the distinction, but that blurring is a consequence of choosing that premise in the first place.

    Parsing will give you an AST that tells you a term has a type, say `Int -> Bool`, which might be represented as, e.g., a 3-node tree with the arrow at the root and the input and output types as leaves. But falling back to the conceptual definition of syntax, that tells you nothing about what the tree means.

    To add type checking to the picture, we need another stage between 2 and 3; that's where the types in the AST are assigned meaning and actually checked (there's a small sketch of this at the end of this comment).

    You'll often see people say that types are syntactic, but that's a slightly different use of the word: it refers to the way types categorize syntactic terms. So types apply to syntax, but what happens during actual checking is still semantics - it involves logic, inference, etc. that go well beyond syntactic analysis in the parsing sense.

    Really, it boils down to what you choose as your definitions. If you define syntax to be, essentially, the kind of thing that's modeled by an AST, then there's not really any ambiguity, which makes it a good definition, I think. Semantics is then assigning meaning to the nodes and relationships in such a tree.

    Re Chomsky, I think the discovery that PL semantics can be implemented entirely as a series of syntactic transformations is quite relevant to what Chomsky was getting at. But that doesn't negate semantics; it just gives us a more precise model for it. In fact, I have a sneaking suspicion that this formally-supported view might help clarify Chomsky's work, but a bit more work would be needed to demonstrate that. :)
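
    To make that concrete, here's a small sketch (the representation and names are invented for the example): the type `Int -> Bool` as a 3-node tree, plus a checking stage that sits between parsing and evaluation and does inference over a context rather than just building more tree.

    ```haskell
    -- The type Int -> Bool as a 3-node tree: TArrow at the root,
    -- with the input and output types as leaves.
    data Ty = TInt | TBool | TArrow Ty Ty deriving (Show, Eq)

    intToBool :: Ty
    intToBool = TArrow TInt TBool

    -- A simply typed lambda calculus AST, as produced by stage 2 (parsing).
    data Term = Var String | Lam String Ty Term | App Term Term deriving Show

    -- The extra stage between parsing and evaluation: checking.
    -- This is where the type trees acquire consequences - lookups in a
    -- typing context, equality checks, inference about applications.
    type Ctx = [(String, Ty)]

    typeOf :: Ctx -> Term -> Either String Ty
    typeOf ctx (Var x) =
      maybe (Left ("unbound variable: " ++ x)) Right (lookup x ctx)
    typeOf ctx (Lam x t body) = do
      tBody <- typeOf ((x, t) : ctx) body
      Right (TArrow t tBody)
    typeOf ctx (App f a) = do
      tf <- typeOf ctx f
      ta <- typeOf ctx a
      case tf of
        TArrow tIn tOut | tIn == ta -> Right tOut
        _ -> Left ("cannot apply " ++ show tf ++ " to " ++ show ta)

    main :: IO ()
    main = print (typeOf [("isZero", intToBool)]
                         (Lam "x" TInt (App (Var "isZero") (Var "x"))))
    -- prints: Right (TArrow TInt TBool)
    ```

    The `Ty` tree itself is pure syntax; it's `typeOf`, with its context lookups and equality checks, that gives those nodes consequences.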

Correction:

> Points 1 & 2 are syntax - the mapping between the textual form of a program and its structure.