Comment by Quothling
5 hours ago
AI is pretty bad at Python and Go as well. It depends a lot on who uses it, though. We have a lot of non-developers who make things work with Python. A lot of it will never need a developer, because the code being bad doesn't matter for what it does. Some of it needs to be basically rewritten from scratch.
Overall I think it's fine.
I do love AI for writing YAML and Bicep. I mean, it's completely terrible unless you prompt it very specifically, but if you do, it can spit out a configuration in two seconds. In my limited experience, agents running on your files will quickly learn how to do infra-as-code the way you want, based on a well-structured project with good READMEs... unfortunately I don't think we'll ever be able to use that in my industry.
If it's bad at Python, the most popular language, what language is it good at? If you read the other comments, they're mentioning basically every major programming language.
Maybe I should have made it clearer, but it's pretty good if you know how to work with it. The issue is that it's usually faster to just read the documentation and write the code yourself, depending on what you're working on of course. Like with the YAML: an LLM can write you an ingress config in a second or two from a very short prompt. It can do similar things with Python if you specify exactly how you want something and which dependencies you want.
That's being bad at programming, in my opinion. You can mitigate it a lot with how you configure your agents. Mine loads our tech stack, the best practices we've decided to use, the fact that I value safety first but am otherwise a fan of the YAGNI philosophy, and so on. I spent a little time and built these things into my personal agent on our enterprise AI plan, and I use it a lot. I still have to watch it like a hawk, but I do think it's a great tool.
I guess you could say that your standard LLM will write better Python than I did 10 years ago, but that's not really good enough when you work on systems that can't fail. It's fine for 90% (I made this number up) of software, though.
Pretty good at Java. The verbose language, strong type system, and strong static analysis tools that you can run on every edit combine to keep it on the tracks you define.
But that was a huge assertion in itself. I’m personally having amazing results with Python using Opus 4.5, so this is very contextual.
I've had good results with TypeScript. I use a tested project template + .md files as well as ESLint + Stylelint and each project generally turns out pretty clean.
It's kinda okay at JS + React + Tailwind. (at least, for reasonably small / not-crazy-complex projects)
Well, OP's bar seems super high. Just because it isn't entirely perfect at letting a non-dev create apps, that doesn't make it "pretty bad" imo.
It's terrible. The biggest issue is dependencies, but we've solved it by whitelisting what they are allowed to use in the pipelines, along with writing the necessary howtos.
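Roughly, a minimal sketch of what such a pipeline gate might look like (the allowlist contents and file name here are hypothetical, not our actual setup):

```python
# Minimal sketch of a CI gate: fail the build if requirements.txt
# pulls in a dependency that isn't on the approved allowlist.
import sys
from pathlib import Path

ALLOWED = {"requests", "pydantic", "sqlalchemy"}  # hypothetical allowlist

def disallowed(requirements_file: str = "requirements.txt") -> list[str]:
    bad = []
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # strip environment markers, extras, and version specifiers,
        # e.g. "foo[bar]>=1.2; python_version > '3.9'" -> "foo"
        name = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "["):
            name = name.split(sep)[0]
        if name.strip().lower() not in ALLOWED:
            bad.append(name.strip())
    return bad

if __name__ == "__main__":
    bad = disallowed()
    if bad:
        print("Disallowed dependencies:", ", ".join(bad))
    sys.exit(1 if bad else 0)
```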
The thing I should have made clearer is probably that I think the horrible code is great. Yes, it's bad, but it's also a ton of services and automation which would not have been built before LLMs, because there wouldn't have been enough developer time for it. The code being terrible doesn't mean the solution itself is terrible for the business. You don't need software engineering until you do, and compute is really cheap at this scale. Why would we care that their code runs up €5 a year if it adds thousands of euros worth of value?
It's only when something stops working, usually because what started out as a small thing has grown into something that can't scale, that we take over.
> AI is pretty bad at Python and Go as well.
It's great at Golang IF it's one-shot tasks. LLMs seem to degrade a lot when they're forced to work on existing code bases (even their own), which seems to be more an issue with context sizes growing out of control way too fast (and this is what degrades LLMs the most).
So far Opus 4.5 has been the one LLM that mostly keeps coding in a, how to say, predictable way, even with an existing code base. It requires scaffolding and being very clear with your coding requests, but it's not like the older models that go off-script way too much or rewrite code in their own style.
For me, Opus 4.5 has reached that sweet spot of actual productivity, not just playing around with LLMs and undoing mistakes.
The problem with LLMs is a lot of the time a mix of factors: model issues, people giving different requests, context overload, different models doing better with different languages, the amount of code that needs to be altered, etc. This makes the results very mixed from one person to another, and harder to quantify.
Even a difference in the task can be the difference between a person glorifying an LLM one day and complaining a few weeks later that it was nerfed, when it was not. It's just people doing different work with different prompts, and so on.
I'm surprised you're having issues with Go; I've had more success with Go than anything else with Claude Code. Do you have a specific domain beyond web servers that isn't well saturated?
With all those languages listed in this thread, it explains why I don't trust or use AI when I code.
That's basically all the languages that I am using...
For the AI fans in here, what languages are you using? TypeScript only would be my guess?
I use it in a Python/TS codebase (series D B2B SaaS with some AI agent features). It can usually “make it work” in one shot, but the code often requires cleanup.
I start every new feature w/Claude Code in plan mode. I give it the first step, point it to relevant source files, and tell it to generate a plan. I go catch up on my Slack messages.
I check back in and iterate on the plan until I’m happy, then tell it to implement.
I go to a team meeting.
I come back and review all the code. Anything I don’t 100% understand I ask Gemini to explain. I cross-check with primary sources if it’s important.
I tweak the generated code by hand (faster than talking with the agent), then switch back to plan mode and ask for specific tests. I almost always need to clean up the tests for doing way too much manual setup, despite a lot of CLAUDE.md instructions to the contrary (see the sketch at the end of this comment for the kind of cleanup I mean).
In the end, I probably get the work done in 30% less wall-clock time by having Claude implement (counting plan time), but I'm also doing other things while the agent crunches. Maybe a 50% boost in total productivity? I also learn something new on about a third of features, which is way more than I did before.
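A hypothetical illustration of that test cleanup (the class and fixture names are made up): the agent tends to repeat setup inline in every test, where a shared pytest fixture keeps each test focused on the behavior it actually checks.

```python
# Hypothetical illustration: hoisting the repeated manual setup an
# agent tends to generate into a shared pytest fixture.
import pytest

class Cart:
    """Toy class standing in for real application code."""
    def __init__(self) -> None:
        self.items: list[tuple[str, float]] = []

    def add(self, name: str, price: float) -> None:
        self.items.append((name, price))

    def total(self) -> float:
        return sum(price for _, price in self.items)

@pytest.fixture
def stocked_cart() -> Cart:
    # Setup lives here once, instead of being repeated in every test.
    cart = Cart()
    cart.add("apple", 1.50)
    cart.add("bread", 3.00)
    return cart

def test_total(stocked_cart: Cart) -> None:
    assert stocked_cart.total() == 4.50

def test_add_updates_total(stocked_cart: Cart) -> None:
    stocked_cart.add("milk", 1.00)
    assert stocked_cart.total() == 5.50
```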
I find both ChatGPT and Gemini to be very good at writing C++ for Arduino/ESP32. Certainly better than me unassisted. Compile errors are very rare, and usually they are just missing declarations. Right now I would say ChatGPT is ahead for daily-driver use, but sometimes Gemini can instantly unlock things that ChatGPT is stuck on.
> why I don't trust or use AI when I code
These are two different concepts. I use AI when coding, but I don't trust it. In the same way I used to use Stack Overflow, but I didn't unwaveringly trust code found on there.
I still need to test and make sure the code does the thing I wanted it to do.
I’ve found it to be quite good at Python, JS (Next + Tailwind + TS type of things), and PHP. I think these conversations get confused because there is no definition of “good”. So I’m defining “good” as: it can do 50-80% of the work for me, even in a giant code base where call sites are scattered and ever-changing. I still have to do some cleanup or ask it to do something different, but many times I don’t need to do anything.
As someone else mentions, the best working mode is to think through your problem, write some instructions, and let it do its thing while you do other work. Then come back and treat that as a starting point.
Yeah, that list has left me wondering: what is it good at, then? HTML, CSS, and JavaScript?
It’s been amazing for me for Go and TypeScript; and pretty decent at Swift.
There is a steep learning curve. It requires good software engineering practices: have a clear plan, and be sure to have good docs and examples. Don’t give it an empty directory; have scaffolding it can latch onto.
SQL. I learned a lot using it. It's really good and uses the full potential of Postgres (example below). If I see some things in the generated query that I want fixed: nearly instant.
Also: it gives great feedback on my schema designs.
So far, SQL is where it's best (compared to JS / HTML+Tailwind / Kotlin).
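For a sense of what "full potential of Postgres" means here, a minimal sketch (the schema and connection string are made up for illustration) of the kind of Postgres-specific query it reaches for, like DISTINCT ON to grab each customer's latest order in one pass:

```python
# Minimal sketch (hypothetical schema and connection): the kind of
# Postgres-specific SQL an LLM tends to suggest, using DISTINCT ON
# to fetch each customer's most recent order in a single query.
import psycopg2

QUERY = """
    SELECT DISTINCT ON (customer_id)
           customer_id, id AS order_id, created_at
    FROM orders
    ORDER BY customer_id, created_at DESC;
"""

with psycopg2.connect("dbname=shop") as conn:  # connection string is an assumption
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for customer_id, order_id, created_at in cur.fetchall():
            print(customer_id, order_id, created_at)
```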
I’ve found Claude Code to be amazing at Go. This is all nuts, because the experience is so different from one person to another.
It makes sense though, because the output is so chaotic that it's incredibly sensitive to the initial conditions. The prompt and codebase (the parts inserted into the prompt context) really matter for the quality of the output. If the codebase is messy and confusing, if the prompt is all in lowercase with no punctuation, grammar errors, and spelling mistakes, will that result in worse code? It seems extremely likely to me that the answer is yes. That's just how these things work. If there's bad code already, it biases it to complete more bad code.
I'm not a Python programmer but I could've sworn I've repeatedly heard it said that LLMs are particularly good at writing Python.
Python is very versatile, so it's probably a case of the dev not preferring the Python the model produced vs. their own. I bet a lot of GenAI-created C falls into the same bucket: "...well, that's not how I would have done it..."
ChatGPT is built on Python (a training and fine-tuning priority), and uses it as a tool call.
Python is about as good an output language as you are going to get.