Comment by alexpham14
3 hours ago
Compliance is usually the hard stop before we even get to capability. We can’t send code out, and local models are too heavy to run on the restricted VDI instances we’re usually stuck with. Even when I’ve tried it on isolated sandbox code, it struggles with the strict formatting. It tends to drift past column 72 or mess up period termination in nested IFs. You end up spending more time linting the output than it takes to just type it. It’s decent for generating test data, but it doesn’t know the forty years of undocumented business logic quirks that actually make the job difficult.
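To make the linting burden concrete, here is a minimal Python sketch of the kind of column-72 check I mean; the function name and CLI usage are illustrative, not from any real linter. In fixed-format COBOL the code area ends at column 72, and anything a model writes in columns 73-80 (the identification area) is silently ignored by the compiler.

# Minimal sketch: flag generated fixed-format COBOL lines that spill past
# column 72, where the compiler stops reading. Names are illustrative.

FIXED_FORMAT_LIMIT = 72  # code area ends at column 72 in fixed-format COBOL

def find_overflow_lines(source: str) -> list[tuple[int, str]]:
    """Return (line_number, overflow_text) for lines spilling past column 72."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        overflow = line[FIXED_FORMAT_LIMIT:]
        if overflow.strip():  # non-blank text past the code area
            violations.append((lineno, overflow))
    return violations

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, overflow in find_overflow_lines(f.read()):
            print(f"line {lineno}: text past column 72 is ignored: {overflow!r}")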
To be fair, I would not expect a model to output perfectly formatted C++. I’d let it output whatever it wants and then run it through clang-format, the same as I would for a human. Even the best humans who have the formatting rules in their head will miss a few things here and there.
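For illustration, a minimal sketch of that pipeline, assuming clang-format is on PATH and a .clang-format file exists in the repository; the helper name and the placeholder model output are made up:

import subprocess

def format_cpp(code: str) -> str:
    """Run raw generated C++ through clang-format and return the result."""
    result = subprocess.run(
        ["clang-format", "--style=file", "--assume-filename=generated.cpp"],
        input=code,            # feed the generated code on stdin
        capture_output=True,
        text=True,
        check=True,            # raise if clang-format rejects the input
    )
    return result.stdout

model_output = "int   main(){return 0 ;}"  # stand-in for whatever the model emits
print(format_cpp(model_output))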
If there are 40 years of undocumented business quirks, document them and then re-evaluate. A human new to the codebase would fail under the same conditions.
With C++, formatting is optional. A better test case is Python, where indentation defines code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct, though the actual code back then was often hilariously wrong.
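A quick illustration of why Python is the stricter test: the same statements mean different things depending on indentation alone, so a model that drifts one level produces different behavior, not just ugly code. The variable names here are arbitrary.

items = [1, 2, 3]

# Version A: the print is inside the loop and runs once per item.
for item in items:
    total = item * 2
    print(total)

# Version B: dedent the print one level and it runs once, after the loop.
for item in items:
    total = item * 2
print(total)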
The nuances of a codebase are the key. But I guess we are accelerating towards solving that. Let’s see how much time this will take.
The critical “why” knowledge often cannot be derived from the codebase.
The prohibitions on other companies (LLM providers) being able to see your code also won’t be going away soon.
Other companies being able to see the code isn’t the problem in itself. The problem with LLMs is the fear that the code leaks out to companies other than the LLM provider.
That’s something that can either be solved for real or be promised not to happen.