Comment by apaprocki

21 days ago

To be fair, I would not expect a model to output perfectly formatted C++. I’d let it output whatever it wants and then run it through clang-format, the same way a human would. Even the best humans who have the formatting rules in their heads will miss a few things here and there.

If there are 40 years of undocumented business quirks, document them and then re-evaluate. A human new to the codebase would fail under the same conditions.

7 comments

shakna 21 days ago

Formatting isn't just visual in pre-79 COBOL or Fortran. It's syntax. Get it wrong and it's a compile failure, or worse, the compiler cuts the line off and the rest can sometimes compile successfully into something else.

That's not just an undocumented quirk, but a fundamental part of being a punch-card-ready language.
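
To illustrate the cut-off: a rough Python sketch, assuming FORTRAN 77 style fixed form, where only columns 1 to 72 are read as code and anything further right is dropped (COBOL's reference format has a similar boundary). The function names are made up for this example. It ignores tabs and comment lines, and it would also flag card sequence numbers in columns 73 to 80, which are legitimate.

```python
# Sketch of the fixed-form column boundary (assumption: FORTRAN 77 style).
# Columns 1-5 hold a label, column 6 a continuation mark, columns 7-72 the
# statement. Text from column 73 onward is not read as code.
STATEMENT_END = 72  # last column treated as code


def as_compiler_sees(line: str) -> str:
    """Return the part of a fixed-form line that is treated as code."""
    return line.rstrip("\n")[:STATEMENT_END]


def overflowing_lines(source: str) -> list[int]:
    """Return 1-based numbers of lines with non-blank text past column 72."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line[STATEMENT_END:].strip():
            hits.append(number)
    return hits
```

A statement such as `X = A + B + C` where the `+ C` lands in column 73 or later is still a valid statement after truncation, so under these assumptions it would compile as `X = A + B` with no error.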

raw_anon_1111 21 days ago

With C++, formatting is optional. A better test case for LLMs is Python, where indentation specifies code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct, though the actual code back then was often hilariously wrong.

to11mtm 21 days ago
I can't even get Github Copilot's plugin to avoid randomly trashing files with a zero-width no-break space at the beginning, let alone follow formatting rules consistently...
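
For anyone hitting the same thing: a minimal Python sketch for detecting and removing that character, assuming the files are UTF-8, where a leading zero-width no-break space (U+FEFF) shows up as the byte order mark `EF BB BF`. The helper names are hypothetical and this is a workaround sketch, not anything the plugin provides.

```python
import codecs
from pathlib import Path


def strip_leading_bom(data: bytes) -> bytes:
    """Drop a single UTF-8 byte order mark from the start of the data."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def clean_file(path: Path) -> bool:
    """Rewrite the file without a leading BOM. Return True if it changed."""
    original = path.read_bytes()
    cleaned = strip_leading_bom(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False
```

Working on bytes rather than decoded text leaves the rest of the file untouched, so line endings and any other content are preserved exactly.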
- raw_anon_1111 21 days ago
  
  I am the last person to say anything good about Copilot. I used Copilot for a minute, mostly used raw ChatGPT until last month, and now use Codex with my personal subscription to ChatGPT and my personal but company-reimbursed subscription to Claude.
- sothatsit 21 days ago
  
  > Github Copilot
  Well, there’s your issue!
apaprocki 21 days ago
A quick search finds many COBOL checkers. I’d be very surprised if a modern model were not able to fix its own mistakes when connected to a checker tool. Yes, it may not be able to one-shot it perfectly, but if it can quickly call a tool once and it “works”, does it really matter much in the end? (Maybe it matters from a cost perspective, but I’m just referring to it solving the problem you asked it to solve.)
Clearly it isn’t just “broken” for everyone. See “Claude Code modernizes a legacy COBOL codebase”, from Anthropic:
https://youtu.be/OwMu0pyYZBc
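
The check-then-fix idea can be sketched roughly like this in Python. Both `check` and `revise` are hypothetical stand-ins: `check` would wrap whatever COBOL checker you have and return a list of error messages, and `revise` would hand the source plus those errors back to the model. Neither is a real API.

```python
from typing import Callable


def fix_until_clean(
    source: str,
    check: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Re-run the checker after each revision until it reports no errors.

    Returns the clean source and the number of revisions that were needed.
    Raises RuntimeError if errors remain after max_rounds revisions.
    """
    for rounds in range(max_rounds + 1):
        errors = check(source)
        if not errors:
            return source, rounds
        if rounds == max_rounds:
            break
        # Feed the checker's complaints back in and try again.
        source = revise(source, errors)
    raise RuntimeError(f"still failing after {max_rounds} revisions: {errors}")
```

The `max_rounds` cap is where the cost question shows up: each extra round is another checker run and another model call.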
- shakna 21 days ago
  
  Taking Anthropic's reporting on Anthropic at face value is not something you should really do.
  In this case, a five-stage pipeline, built on demo environments and code that were already in the training data, was successful. I see more red flags there than green ones.