Comment by danpalmer

16 hours ago

> The empirical literature shows that models are particularly vulnerable to naming-related errors like choosing misleading names, reusing names incorrectly, and losing track of which name refers to which value.

I think Vera might be missing something here. In my experience, LLMs code better the less of a mental model you need, i.e. the more that is explicit in text on the page.

Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell they're pretty bad, and Erlang, not wonderful. You need much more of a mental model for those languages.

For Vera, not having names removes key information that the model would otherwise have, and replaces it with mental modelling of the stack of arguments.
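
To make the trade-off concrete, here's a minimal sketch in Haskell (the thread doesn't show real Vera syntax, so point-free style stands in for "nameless" code; both function names are mine):

    -- Named style: each intermediate value is labelled, so the reader
    -- (human or model) can see what flows where.
    meanAbsNamed :: [Double] -> Double
    meanAbsNamed xs = total / count
      where
        absolutes = map abs xs
        total     = sum absolutes
        count     = fromIntegral (length xs)

    -- The same logic with the intermediate names erased. The reader now
    -- has to track the data flow mentally. Vera's stack discipline
    -- differs in detail, but the burden is analogous.
    meanAbsNameless :: [Double] -> Double
    meanAbsNameless = (/) <$> sum . map abs <*> fromIntegral . length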

My Spidey sense was tingling when I saw that, too. An additional issue is how humans are supposed to read the code at all so that they can provide help to the LLM if it’s off track. If the code is only usable by models, the models need to be good enough to deal with binary feedback (“Code doesn’t work.”). The human won’t be able to read the code and steer the model. Given the levels of steering required today, that makes me quite nervous.

  • I guess the point is that there is no need for humans to read the code.

    How often do you read assembly to check what your compiler is doing?

    There is a niche of people doing it when they have special constraints, but that's a tiny niche.

    • > How often do you read assembly to check what your compiler is doing?

      The difference is my compiler is more-or-less deterministic, and tends to do exactly what the specification provided to it (the source code) says. LLMs do not currently fulfil either of those criteria.

The FAQ says shuffled names (renaming a variable 'count' to 'result') make LLMs perform poorly. But I've never seen any codebase contain this kind of lie (except in comments). And LLMs writing code almost never do that.

Seems like a weird decision, taken from a weird paper, that makes everything harder for humans AND LLMs. Variable names give useful context when correctly chosen.
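
For reference, the failure mode being described looks something like this (my own hypothetical Haskell rendering, not the paper's actual stimuli):

    -- Honestly named: the name matches the value it holds.
    wordCount :: String -> Int
    wordCount text = count
      where count = length (words text)

    -- The "shuffled names" condition: identical behaviour, but the name
    -- now asserts something false about the value. The claim above is
    -- that real codebases, and LLM-written code, almost never do this.
    wordCountShuffled :: String -> Int
    wordCountShuffled text = firstWord  -- actually a count, not a word
      where firstWord = length (words text)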

This will serve as an interesting empirical test, then: will LLMs do better with Vera than with Go or other languages? The testing so far seems inconclusive (https://github.com/aallan/vera-bench), but the authors make this interesting observation:

"No LLM has ever been trained on Vera. There are no Vera examples on GitHub, no Stack Overflow answers, no tutorials — the language was created after these models' training cutoffs. Every token of Vera code in these results was written by a model that learned the language entirely from a single document (SKILL.md [https://veralang.dev/SKILL.md]) provided in the prompt at evaluation time."

If LLMs do much better with Vera (or something like it) than with traditional languages, we may be entering a time when most machine-written code will be difficult for humans to review - but maybe that ship has already sailed.

"Names cause errors" doesn't automatically imply "removing names makes the program easier to generate or reason about"

> Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell, it's pretty bad, Erlang, not wonderful. You need much more of a mental model for those languages.

I don't think that follows. It could just be that there is way more Go and Java code to train on than Haskell and Erlang. Haskell's terseness and symbol-named operators probably don't help either.
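
To make "terseness and symbol-named operators" concrete, here's a sketch of my own (readMaybe is from base's Text.Read; both function names are made up):

    import Text.Read (readMaybe)

    -- Operator-heavy, idiomatic Haskell: compact, but every symbol is
    -- one more thing the model has to have modelled correctly.
    parsePair :: String -> Maybe (Int, Int)
    parsePair s = (,) <$> readMaybe a <*> readMaybe b
      where (a, b) = fmap (drop 1) (break (== ',') s)

    -- The same logic spelled out with names and explicit case analysis:
    -- more tokens on the page, but each step is visible as plain text.
    parsePairExplicit :: String -> Maybe (Int, Int)
    parsePairExplicit s =
      case break (== ',') s of
        (lhs, ',' : rhs) ->
          case (readMaybe lhs, readMaybe rhs) of
            (Just x, Just y) -> Just (x, y)
            _                -> Nothing
        _ -> Nothing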

I too have found the models do well with Go. I will say that despite the backwards-compatibility guarantee, library API changes, shifts in what counts as "good" patterns, and new language additions do add some friction to the experience. It almost always works, but it can be a bit inconsistent in how the code comes out.

Hmm, interesting. Are you speaking from experience with Haskell? I've been a Haskell developer since 2017, and have been using LLMs to write code (including Haskell) since 2024. In my experience, LLMs perform much better generating Haskell/Rust code than Python/JavaScript.

I'm curious what issues you had with Haskell? I've had the opposite experience and find them dreadful at Java et al.

Surely, denser languages should be better for LLMs?

  • The context window also limits how deeply the model can "think", and it does this in natural language. So a language suited to LLMs would have balanced density: if it's too dense, the model spends many tokens working through the logic; if it's too sparse, it spends many tokens reading and writing the code. (A toy sketch after these replies illustrates the spectrum.)

    I think that, for already-trained LLMs, the languages most suited to LLMs are also the ones most suited to humans. Besides having the most code to train on, humans face similar limitations: if the language is too dense they have to be very careful in considering how to do something; if it's too sparse, the code becomes a pain to maintain.

    • I generally agree that humans and LLMs benefit similarly from programming language features. I would tweak that a bit and suggest that their ability floor is higher than the human lowest common denominator, so I would skew towards the more advanced programming languages. There are many typing/analyzer features that are frustrating for humans to use because they make type checking slower. This is much less of a problem for LLMs: they're very patient, and they're much better at internalizing the type system, so they don't need to trigger it nearly as often.

  • Density is a double-edged sword. On the one hand you want to minimise context usage, but on the other hand more text on the page means more that the LLM can work with.

  • My (uninformed) speculation is that you want resilience and error correction, which implies some level of redundancy rather than pure density.
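
Here's the toy sketch mentioned above: three renderings of "sum of the squares of the even numbers" in Haskell, as one way to picture the density spectrum (where "balanced" sits is of course a judgment call, not a claim from the thread):

    -- Dense: fully point-free, nothing named.
    sumSqEvensDense :: [Int] -> Int
    sumSqEvensDense = sum . map (^ 2) . filter even

    -- Sparse: every step named and sequenced; many tokens, little logic.
    sumSqEvensSparse :: [Int] -> Int
    sumSqEvensSparse numbers =
      let evens   = filter even numbers
          squares = map (\n -> n * n) evens
          total   = sum squares
      in  total

    -- A middle ground: a comprehension that names only the element.
    sumSqEvensBalanced :: [Int] -> Int
    sumSqEvensBalanced numbers = sum [n * n | n <- numbers, even n]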

I've found Claude Code to be amazing at Elm, so your comment about Haskell seems strange to me.