Comment by ripvanwinkle

2 years ago

It would be interesting to feed it a formal language specification of some language it hasn't seen, then ask it to write code and see how it does.

That could be a test of reasoning and reading comprehension.

I've been thinking about a benchmark designed this way for a while. It doesn't even need to be code; it could just be basic reasoning problems. The key is that you define a new, random language that has never been seen before (maybe with statistical similarity to existing languages, maybe not), create a translation key, and then ask a question in that language.
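A minimal sketch of the "new random language + translation key" setup, assuming the simplest possible scheme: each English word is replaced by a random consonant/vowel pseudo-word, substituted word by word. That is a simplification; a real benchmark would probably also scramble word order or grammar so English syntax doesn't leak through. The function names (`make_word`, `make_language`, `translate`) are hypothetical, not from any existing benchmark.

```python
import random


def make_word(rng, min_len=4, max_len=7):
    """Build a random pseudo-word by alternating consonants and vowels."""
    consonants = "bcdfghjklmnpqrstvwxz"
    vowels = "aeiou"
    length = rng.randint(min_len, max_len)
    return "".join(
        rng.choice(consonants if i % 2 == 0 else vowels) for i in range(length)
    )


def make_language(vocab, seed=0):
    """Map each English word in vocab to a unique pseudo-word (the translation key).

    Assumption: a fixed seed makes the invented language reproducible.
    Pseudo-words may occasionally coincide with real English words.
    """
    rng = random.Random(seed)
    key = {}
    used = set()
    for word in vocab:
        new = make_word(rng)
        while new in used:  # retry until the pseudo-word is unique
            new = make_word(rng)
        used.add(new)
        key[word] = new
    return key


def translate(sentence, key):
    """Word-by-word substitution; assumes every word appears in the key."""
    return " ".join(key[w] for w in sentence.lower().split())


if __name__ == "__main__":
    question = "if alice has three apples and gives bob one how many does alice have"
    key = make_language(sorted(set(question.split())), seed=42)
    # The prompt gives the model the key plus the question in the invented language.
    prompt = "Translation key:\n" + "\n".join(f"{v} = {k}" for k, v in key.items())
    prompt += "\n\nQuestion: " + translate(question, key)
    print(prompt)
```

Because the vocabulary is generated fresh per seed, the model can't have memorized it; it has to read the key and map the question back before reasoning about it.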

Reasoning vs. being a completion engine (I could make a guess at how well that would work).

  • Reasoning is a form of completion (logical completion). The problem is that LLMs aren't language-agnostic in their learned semantic reasoning.