Comment by pama
2 years ago
Since the LLM sometimes generates invalid COBOL, a simple, practical solution would be to use an API that lets it test its code with GnuCOBOL, feed the compiler output back to it, and have it try again a couple of times. I wonder what the updated benchmarks would look like in that setting.
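A minimal sketch of that loop in Python. It assumes GnuCOBOL's `cobc` is on the PATH on a Unix-like system. `generate` is a hypothetical stand-in for whatever LLM API call you'd actually use; it is not a real client library. The checker compiles with `cobc -x`, runs the executable, and hands any errors back as feedback for the next attempt.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical LLM wrapper: takes the task and the previous attempt's
# compiler/runtime output (None on the first try) and returns COBOL source.
Generator = Callable[[str, Optional[str]], str]
# A checker returns (succeeded, output to feed back to the model).
Checker = Callable[[str], Tuple[bool, str]]


def compile_with_gnucobol(source: str) -> Tuple[bool, str]:
    """Build COBOL source with `cobc -x`, then run the resulting executable."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prog.cob"
        exe = Path(tmp) / "prog"
        src.write_text(source)
        build = subprocess.run(
            ["cobc", "-x", "-o", str(exe), str(src)],
            capture_output=True,
            text=True,
        )
        if build.returncode != 0:
            # Compilation failed: return the diagnostics as feedback.
            return False, build.stdout + build.stderr
        try:
            run = subprocess.run(
                [str(exe)], capture_output=True, text=True, timeout=10
            )
        except subprocess.TimeoutExpired:
            return False, "program did not finish within 10 seconds"
        return run.returncode == 0, run.stdout + run.stderr


def generate_with_feedback(
    task: str,
    generate: Generator,
    check: Checker,
    max_attempts: int = 3,
) -> Optional[Tuple[int, str]]:
    """Retry generation, feeding the checker's output back each time.

    Returns (attempt_number, source) for the first passing attempt,
    or None if every attempt fails.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(task, feedback)
        ok, output = check(source)
        if ok:
            return attempt, source
        feedback = output
    return None
```

Passing the checker in as a parameter keeps the loop independent of the compiler, so you can swap in a different toolchain or a stub when benchmarking.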
The general approach seems to work anyway. I tried it with ChatGPT 3.5 and an online COBOL compiler[0], manually feeding back the output, and on the 10th attempt it produced a working program (one that displays the first 10 Fibonacci numbers).
Edit: Well, maybe. With the example from the article it wasn't as successful.
[0] https://onecompiler.com/cobol/