Comment by CuriouslyC

2 years ago

I've been thinking about a benchmark designed this way for a while. It doesn't even need to be code in particular; it could be basic reasoning problems. The key is that you define a new, random language that has never been seen before (maybe it is statistically similar to existing languages, maybe not), create a translation key, then ask a question in that language.
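
To make that concrete, here is a rough sketch of how one such item could be generated. It assumes the simplest possible "language", a word-for-word substitution with random pseudo-words, which is only one reading of the idea; a real benchmark would probably also want novel grammar, and this does not attempt that. The names `make_language`, `encode`, `decode`, and the sample question are all made up for illustration.

```python
import random
import string


def make_language(words, seed=0, min_len=3, max_len=7):
    """Map each distinct source word to a unique random pseudo-word.

    Assumption: the invented language is a plain word-level substitution.
    """
    rng = random.Random(seed)  # seeded so a given item can be regenerated
    vocab = sorted(set(words))
    key, used = {}, set()
    for word in vocab:
        while True:
            length = rng.randint(min_len, max_len)
            token = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            # Reject repeats and accidental collisions with real source words.
            if token not in used and token not in vocab:
                break
        used.add(token)
        key[word] = token
    return key


def encode(text, key):
    """Translate a whitespace-separated sentence into the invented language."""
    return " ".join(key[w] for w in text.lower().split())


def decode(text, key):
    """Invert the translation key (used for checking, not shown to the model)."""
    reverse = {v: k for k, v in key.items()}
    return " ".join(reverse[w] for w in text.split())


# Hypothetical reasoning question, kept free of punctuation for simplicity.
question = "if all cats are animals and tom is a cat is tom an animal"
key = make_language(question.split(), seed=42)
encoded = encode(question, key)

# The benchmark prompt: the translation key followed by the encoded question.
key_lines = "\n".join(f"{token} = {word}" for word, token in sorted(key.items()))
prompt = f"Translation key:\n{key_lines}\n\nQuestion: {encoded}"
print(prompt)
```

Changing the seed gives a fresh vocabulary each time, so the same underlying question can be reissued in a language no model has seen.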