Comment by idle_zealot
11 hours ago
I suspect that the compact nature of the syntax would require more tokens spent "thinking" to get decent results. It might be more efficient for simple code though. Either way worth testing. Surely someone must've set up a "how well LLMs handle Xlang" benchmark suite.
I haven't seen such a benchmark, although maybe it exists.
As far as benchmarks go, I'd also like to see ones that try to find what LLMs are good at. Most benchmarks seem designed to give LLMs hard problems and see if they can succeed. In that sense a "good" benchmark is one with a low pass rate.
But if we're going to do agentic coding we also need to know the opposite. We need to know which types of tasks, given in which format, LLMs will succeed at with something like 95%+ accuracy. Then we can more easily build multi-prompt pipelines with high confidence in each step.
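To put rough numbers on why the per-step rate matters: if you assume each step succeeds independently (a simplification, real failures are probably correlated), the chance the whole pipeline succeeds is just the product of the per-step rates. A quick sketch, with function names made up for illustration:

```python
def pipeline_success_rate(step_rates):
    """Chance that every step succeeds, ASSUMING steps fail independently."""
    total = 1.0
    for rate in step_rates:
        total *= rate
    return total


def max_steps(step_rate, target):
    """Most equally reliable steps you can chain while staying at or above target."""
    if not (0.0 < step_rate < 1.0) or not (0.0 < target <= 1.0):
        raise ValueError("expected 0 < step_rate < 1 and 0 < target <= 1")
    steps = 0
    total = 1.0
    # Keep adding steps while the chain would still meet the target.
    while total * step_rate >= target:
        total *= step_rate
        steps += 1
    return steps


if __name__ == "__main__":
    print(round(pipeline_success_rate([0.95] * 5), 3))   # 0.774
    print(round(pipeline_success_rate([0.95] * 10), 3))  # 0.599
    print(round(pipeline_success_rate([0.99] * 10), 3))  # 0.904
    print(max_steps(0.95, 0.5))                          # 13
```

So under that assumption a ten-step pipeline at 95% per step finishes cleanly only about 60% of the time, which is why knowing where LLMs reliably hit 95% or better would be so useful.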
I think the main reason more time might be spent thinking is that there's relatively little Haskell training data out in the wild, meaning an agent may have to go back and forth with static analysis to figure out what's valid.
Compact syntax is generally nothing but a good thing for LLMs because it saves tokens and context-window space.