Comment by yallpendantools
16 days ago
Butting in here as I have the same sentiment as monkaiju: I'm working on a legacy (I can't emphasize this enough) Java 8 app that's doing all sorts of weird things with class loaders and dynamic entities which, among other things, is keeping it on Java 8. It has over ten years of development cruft all over it and code coverage of maybe 30-40%, depending on when you measure it over the 6+ years I've been working with it.
This shit was legacy when I was a wee new hire.
GitHub Copilot has been great at getting that code coverage up marginally but ass otherwise. I could write you a litany of my grievances with it but the main one is how it keeps inventing methods when writing feature code. For example, in a given context, it might suggest `customer.getDeliveryAddress()` when it should be `customer.getOrderInfo().getDeliveryInfo().getDeliveryAddress()`. It's basically a dice roll whether it will remember this the next time I need a delivery address (but perhaps no surprises there). I noticed that if I needed a different address in the interim (like a billing address), it's more likely to get confused between the delivery address and the billing address. Sometimes it would even think the address is in the request arguments (so it would suggest something like `req.getParam("deliveryAddress")`), and this happens even when the request is properly typed!
I can't believe I'm saying this but IntelliSense is loads better at completing my code for me, since I don't have to backtrack through what it generated to correct it. I could type `CustomerAddress deliveryAddress = customer`, let it hang there for a moment, and in a couple of seconds it would suggest `.getOrderInfo()`, then `.getDeliveryInfo()`, and so on until we get to `.getDeliveryAddress()`. And it gets the right suggestions if I name the variable `billingAddress` too.
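For anyone who hasn't suffered this pattern: here's a minimal sketch of the kind of entity chain I mean. Only the getter names come from my example above; the class shapes and fields are made up for illustration. IntelliSense walks this one getter at a time from the declared types, whereas Copilot happily invents a `customer.getDeliveryAddress()` that doesn't exist.

```java
// Sketch only: class internals are hypothetical, getter names are from my example.
class CustomerAddress {
    final String street;
    CustomerAddress(String street) { this.street = street; }
}

class DeliveryInfo {
    private final CustomerAddress deliveryAddress;
    DeliveryInfo(CustomerAddress a) { this.deliveryAddress = a; }
    CustomerAddress getDeliveryAddress() { return deliveryAddress; }
}

class OrderInfo {
    private final DeliveryInfo deliveryInfo;
    OrderInfo(DeliveryInfo d) { this.deliveryInfo = d; }
    DeliveryInfo getDeliveryInfo() { return deliveryInfo; }
}

class Customer {
    private final OrderInfo orderInfo;
    Customer(OrderInfo o) { this.orderInfo = o; }
    OrderInfo getOrderInfo() { return orderInfo; }
}

public class Demo {
    public static void main(String[] args) {
        Customer customer = new Customer(
            new OrderInfo(new DeliveryInfo(new CustomerAddress("221B Baker St"))));

        // The correct chain, which completion can derive from the types:
        CustomerAddress deliveryAddress =
            customer.getOrderInfo().getDeliveryInfo().getDeliveryAddress();
        System.out.println(deliveryAddress.street);

        // What Copilot keeps suggesting -- does not compile:
        // customer.getDeliveryAddress();
    }
}
```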
"Of course you have to provide it with the correct context/just use a larger context window" If I knew the exact context Copilot would need to generate working code, that eliminates more than half of what I need an AI copilot in this project for. Also if I have to add more than three or four class files as context for a given prompt, that's not really more convenient than figuring it out by myself.
Our AI guy recently suggested a tool that would take in the whole repository as context. Kind of like sourcebot---maybe it was sourcebot(?)---but the exact name escapes me atm, because it failed. Either there were still too many tokens to process or, more likely, the project was simply too complex for it. The thing with this project is that although it's a monorepo, it still relies on a whole fleet of external services and libraries to do some things. Some of these services we have the source code for, but most we don't, so even in the best case "hunting for files to add to the context window" just becomes "hunting for repos to add to the context window". Scaling!
As an aside, I tried to greenfield some apps with LLMs. I asked Codex to develop a minimal single-page app for a simple internal lookup tool. I emphasized minimalism and code clarity in my prompt. I told it not to use external libraries and rely on standard web APIs.
What it spewed forth is the most polished single-page internal tool I have ever seen. It is, frankly, impressive. But it only managed that because it basically spat out the most common Bootstrap classes, recreated the W3Schools AJAX tutorial, and put it all in one HTML file. I have no words and I don't know whether I should scream. It would be interesting to see how token costs evolve over time for a 100% vibe-coded project.
Copilot is notoriously bad. Have you tried (paid plans of) Codex, Claude, or even Gemini on your legacy project? That's the bare minimum before debating the usefulness of AI tools.
> Copilot is notoriously bad.
"notoriously bad" is news to me. I find no indication from online sources that would warrant the label "notoriously bad".
https://arxiv.org/html/2409.19922v1#S6 from 2024 concludes it has the highest success rate in easy and medium coding problems (with no clear winner for hard) and that it produces "slightly better runtime performance overall".
https://research.aimultiple.com/ai-coding-benchmark/ from 2025 has Copilot in a three-way tie for third above Gemini.
> Have you tried (paid plans) codex, Claude or even Gemini on your legacy project?
This is usually the part of the pitch where you tell me why I should even bother, especially as one of those would require me to fork over cash upfront. Why will they succeed where Copilot has failed? I'm not asking anyone to do my homework for me on a legacy codebase that, in this conversation, only I can access---that's outright unfair. I'm just asking for a heuristic, a sign, that the grass might indeed be greener on that side. How could they (probably) improve my life? And no, "so that you pass the bare minimum to debate the usefulness of AI tools" is not a reason because, frankly, the fewer of these discussions I have, the better.
I'm saying this to help you. Whether you give it a shot makes no difference to me. This topic is being discussed endlessly every day on all major platforms, and for the past year or so the consensus has been strongly against using Copilot.
If you want to see whether your project and your work can benefit from AI, you must use Codex, Claude Code, or Gemini (which wasn't a contender until recently).