Comment by skissane

2 years ago

> Not that Cobol is obscure exactly--I suppose there's probably quite a bit of it in GPT-4's training considering how pervasive it is in some domains

There is a huge amount of COBOL code in existence – but almost all of it is non-public code used to run businesses and governments. Very little of it is publicly source-available (whether open source or something more restrictive than that).

Unless GPT-4's training data includes non-public code bases (I doubt it), it likely has rather little COBOL code in it

I've been using GPT-4 to help me navigate a mainframe and a COBOL codebase and it knows far more than what my googling abilities manage to fish up in forums. It's actually surprisingly good at surprisingly deep mainframe topics.

  • No doubt its training data contains a lot of IBM manuals, probably even some commercial books on relevant topics, maybe even the contents of some of the forums you mention – and all that could be enough to correctly answer your questions.

    However, for languages like Python, Java, C, C++, JavaScript, Go, etc, it also contains untold millions of lines of code slurped from places like GitHub. I really doubt it contains anywhere remotely near as much COBOL code, because if you look for COBOL code in public GitHub repos, you will find very little – the vast majority of COBOL code is in-house or vendor business software, and few seem to want to make that stuff public – and what COBOL code GitHub has is mostly toy exercises or ancient stuff, not examples of significant contemporary production code. The only way OpenAI is going to get a substantial quantity of that is if multiple private parties (such as banks) give them access to their COBOL code bases – not impossible, but absent some public info saying it has happened, it seems more likely it hasn't.

    I expect GPT-4 (or any LLM) is not going to perform as well on complicated programming tasks in COBOL as it does in other languages. For more mainstream languages, it has millions of examples to help it do a better job; for COBOL it likely doesn't.

    • Look, nobody is going to perform as well on complicated programming tasks using cobol as with python. But knowing everything you said, I was amazed at how good it was. Try it.

      Also, nobody needs to do complicated coding tasks with cobol, it wasn't meant for it. What we do need a lot of is translating cobol to python or Java.


But it probably read all the books ever published on COBOL.

  • Almost certainly not. There are heaps of books in libraries which nobody has scanned yet, including many on COBOL. No LLM has read those.

    Whether that makes a difference depends on the dialect. If you are dealing with mainstream modern COBOL (like recent versions of Micro Focus COBOL or IBM Enterprise COBOL), it probably won't. If you are dealing with some obscure legacy COBOL dialect, odds are high there is some very helpful printed book which nobody has scanned.