Comment by pama

2 years ago

Since the LLM sometimes generates invalid COBOL, a simple, practical solution would be to call it through an API, test its code with GnuCOBOL, feed the output back, and have it try again a couple of times. I wonder what the updated benchmarks would be in that setting.
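A rough Python sketch of that retry loop, in case it makes the idea more concrete. Everything named here is an assumption for illustration: `generate` stands in for whatever LLM API call you use (it receives the previous compiler output, or `None` on the first try), and `check_with_cobc` assumes GnuCOBOL's `cobc` is on the PATH and that the model emits free-format source (hence `-free`).

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def check_with_cobc(source: str) -> Tuple[bool, str]:
    """Compile COBOL source with GnuCOBOL; return (ok, compiler output)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prog.cob"
        src.write_text(source)
        # -x builds an executable; -free is an assumption about the source format.
        result = subprocess.run(
            ["cobc", "-x", "-free", "-o", str(Path(tmp) / "prog"), str(src)],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0, result.stdout + result.stderr


def repair_loop(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, check it, and feed the diagnostics back until it passes.

    Returns (source, attempts_used), or (None, max_attempts) if it never passed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)  # hypothetical LLM call
        ok, diagnostics = check(source)
        if ok:
            return source, attempt
        feedback = diagnostics  # goes into the next prompt
    return None, max_attempts
```

You would call it as `repair_loop(my_llm_call, check_with_cobc)`, where `my_llm_call` is your own wrapper. Note that this only checks that the program compiles, not that its output is correct; for a benchmark you would presumably also run the binary and compare results.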

The general approach seems to work anyway. I tried it out with ChatGPT 3.5 and an online COBOL compiler[0], manually feeding back the output, and on the 10th attempt it managed to produce a working program that displays the first 10 Fibonacci numbers.

Edit: Well, maybe. With the example from the article it wasn't as successful.

[0] https://onecompiler.com/cobol/