Comment by danpalmer

16 hours ago

> The empirical literature shows that models are particularly vulnerable to naming-related errors like choosing misleading names, reusing names incorrectly, and losing track of which name refers to which value.

I think Vera might be missing something here. In my experience, LLMs code better the less of a mental model you need, i.e. the more that is explicit in text on the page.

Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell they're pretty bad, and Erlang, not wonderful. You need much more of a mental model for those languages.

For Vera, not having names removes key information that the model would otherwise have, and replaces it with mental modelling of the stack of arguments.
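
To make the trade-off concrete, here's a minimal sketch in Haskell (the thread doesn't show real Vera syntax, so point-free style stands in for "nameless" code; both function names are mine):

    -- Named style: each intermediate value is labelled, so the reader
    -- (human or model) can see what flows where.
    meanAbsNamed :: [Double] -> Double
    meanAbsNamed xs = total / count
      where
        absolutes = map abs xs
        total     = sum absolutes
        count     = fromIntegral (length xs)

    -- The same logic with the intermediate names erased. The reader now
    -- has to track the data flow mentally. Vera's stack discipline
    -- differs in detail, but the burden is analogous.
    meanAbsNameless :: [Double] -> Double
    meanAbsNameless = (/) <$> sum . map abs <*> fromIntegral . length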

My Spidey sense was tingling when I saw that, too. An additional issue is how humans are supposed to read the code at all so that they can provide help to the LLM if it’s off track. If the code is only usable by models, the models need to be good enough to deal with binary feedback (“Code doesn’t work.”). The human won’t be able to read the code and steer the model. Given the levels of steering required today, that makes me quite nervous.

  • I guess the point is that there is no need for humans to read the code.

    How often do you read assembly to check what your compiler is doing?

    There is a niche of people doing it when they have special constraints, but that's a tiny niche.

    • > How often do you read assembly to check what your compiler is doing?

      The difference is my compiler is more-or-less deterministic, and tends to do exactly what the specification provided to it (the source code) says. LLMs do not currently fulfil either of those criteria.

The FAQ says shuffled names (renaming a variable 'count' to 'result') make LLMs perform poorly. But I've never seen any codebase contain this kind of lie (except in comments). And LLMs writing code almost never do that.

Seems like a weird decision, taken from a weird paper, that makes everything harder for humans AND LLMs. Variable names give useful context when correctly chosen.
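
For reference, the failure mode being described looks something like this (my own hypothetical Haskell rendering, not the paper's actual stimuli):

    -- Honestly named: the name matches the value it holds.
    wordCount :: String -> Int
    wordCount text = count
      where count = length (words text)

    -- The "shuffled names" condition: identical behaviour, but the name
    -- now asserts something false about the value. The claim above is
    -- that real codebases, and LLM-written code, almost never do this.
    wordCountShuffled :: String -> Int
    wordCountShuffled text = firstWord  -- actually a count, not a word
      where firstWord = length (words text)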

This will serve as an interesting empirical test, then: will LLMs do better with Vera than with Go or other languages? The testing so far seems inconclusive (https://github.com/aallan/vera-bench), but the authors make this interesting observation:

"No LLM has ever been trained on Vera. There are no Vera examples on GitHub, no Stack Overflow answers, no tutorials — the language was created after these models' training cutoffs. Every token of Vera code in these results was written by a model that learned the language entirely from a single document (SKILL.md [https://veralang.dev/SKILL.md]) provided in the prompt at evaluation time."

If LLMs do much better with Vera (or something like it) than with traditional languages, we may be entering a time when most machine-written code will be difficult for humans to review - but maybe that ship has already sailed.

"Names cause errors" doesn't automatically imply "removing names makes the program easier to generate or reason about"

> Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell, it's pretty bad, Erlang, not wonderful. You need much more of a mental model for those languages.

I don't think that follows. It could just be that there is way more Go and Java code to train on than Haskell and Erlang. Haskell's terseness and symbol-named operators probably don't help either.
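
To make "terseness and symbol-named operators" concrete, here's a sketch of my own (readMaybe is from base's Text.Read; both function names are made up):

    import Text.Read (readMaybe)

    -- Operator-heavy, idiomatic Haskell: compact, but every symbol is
    -- one more thing the model has to have modelled correctly.
    parsePair :: String -> Maybe (Int, Int)
    parsePair s = (,) <$> readMaybe a <*> readMaybe b
      where (a, b) = fmap (drop 1) (break (== ',') s)

    -- The same logic spelled out with names and explicit case analysis:
    -- more tokens on the page, but each step is visible as plain text.
    parsePairExplicit :: String -> Maybe (Int, Int)
    parsePairExplicit s =
      case break (== ',') s of
        (lhs, ',' : rhs) ->
          case (readMaybe lhs, readMaybe rhs) of
            (Just x, Just y) -> Just (x, y)
            _                -> Nothing
        _ -> Nothing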

I too have found the models do well with Go. I will say that despite the backwards-compatibility guarantee, library API changes, shifts in what counts as "good" patterns, and new language additions do add some friction to the experience. It almost always works, but it can be a bit inconsistent in how the code comes out.

Hmm, interesting. Are you speaking from experience with Haskell? I've been a Haskell developer since 2017, and have been using LLMs to write code (including Haskell) since 2024. In my experience, LLMs perform much better generating Haskell/Rust code than Python/JavaScript.

I'm curious what issues you had with Haskell? I've had the opposite experience and find them dreadful at Java et al.

Surely, denser languages should be better for LLMs?

  • The context window also limits how deeply the model can "think", and it does this in natural language. So a language suited to LLMs would have balanced density: if it's too dense, the model spends many tokens working through the logic; if it's too sparse, it spends many tokens reading and writing the code. (A toy sketch after these replies illustrates the spectrum.)

    I think that, for already-trained LLMs, the languages most suited to LLMs are also the ones most suited to humans. Besides having the most code to train on, humans face similar limitations: if the language is too dense they have to be very careful in considering how to do something; if it's too sparse, the code becomes a pain to maintain.

    • I generally agree that humans and LLMs benefit similarly from programming language features. I would tweak that a bit and suggest that their ability floor is higher than the human lowest common denominator, so I would skew towards the more advanced programming languages. There are many typing/analyzer features that are frustrating for humans to use because they make type checking slower. This is much less of a problem for LLMs: they're very patient, and they're much better at internalizing the type system, so they don't need to trigger it nearly as often.

  • Density is a double-edged sword. On the one hand you want to minimise context usage, but on the other hand more text on the page means more that the LLM can work with.

  • My (uninformed) speculation is that you want resilience and error correction, which implies some level of redundancy rather than pure density.
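
Here's the toy sketch mentioned above: three renderings of "sum of the squares of the even numbers" in Haskell, as one way to picture the density spectrum (where "balanced" sits is of course a judgment call, not a claim from the thread):

    -- Dense: fully point-free, nothing named.
    sumSqEvensDense :: [Int] -> Int
    sumSqEvensDense = sum . map (^ 2) . filter even

    -- Sparse: every step named and sequenced; many tokens, little logic.
    sumSqEvensSparse :: [Int] -> Int
    sumSqEvensSparse numbers =
      let evens   = filter even numbers
          squares = map (\n -> n * n) evens
          total   = sum squares
      in  total

    -- A middle ground: a comprehension that names only the element.
    sumSqEvensBalanced :: [Int] -> Int
    sumSqEvensBalanced numbers = sum [n * n | n <- numbers, even n]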

I've found Claude Code to be amazing at Elm, so your comment about Haskell seems strange to me.