Comment by apaprocki

4 hours ago

To be fair, I would not expect a model to output perfectly formatted C++. I’d let it output whatever it wants and then run it through clang-format, similar to a human. Even the best humans that have the formatting rules in their head will miss a few things here or there.

If there are 40 years of undocumented business quirks, document them and then re-evaluate. A human new to the codebase would fail under the same conditions.

Formatting isn't just visual, in pre-79 COBOL or Fortran. It's syntax. Its a compile failure, or worse, it cuts the line and can sometimes successfully compile into something else.

Thats not just an undocumented quirk, but a fundamental part of being a punch-card ready language.

With C++ formatting is optional. A better test case for LLMs is Python where indention specifies code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct - now the actual code back then was often hilariously wrong.

  • A quick search finds many COBOL checkers. I’d be very surprised if a modern model was not able to fix its own mistakes if connected to a checker tool. Yes, it may not be able to one shot it perfectly, but if it can quickly call a tool once and it “works”, does it really matter much in the end? (Maybe it matters from a cost perspective, but I’m just referring to it solving the problem you asked it to solve.)

    Clearly it isn’t just “broken” for everyone, “Claude Code modernizes a legacy COBOL codebase”, from Anthropic:

    https://youtu.be/OwMu0pyYZBc

  • I can't even get Github Copilot's plugin to avoid randomly trashing files with a Zero No width break space at the beginning, let alone follow formatting rules consistently...

    • I am the last person to say anything good about CoPilot. I used CoPilot for a minute, mostly used raw ChatGPT until last month and now use Codex with my personal subscription to ChatGPT and my personal but company reimbursed subscription to Claude.