Comment by jofer
1 day ago
I appreciate this writeup. I live in the terminal and work primarily in vim, so I always appreciate folks talking about tooling from that perspective. Little of the article is that, but it's still interesting to see the workflow outlined here, and it gives me a few ideas to try more of.
However, I disagree that LLMs are anywhere near as good as what's described here for most things I've worked with.
So far, I'm pretty impressed with Cursor as a toy. It's not a usable tool for me, though. I haven't used Claude a ton, though I've seen co-workers use it quite a bit. Maybe I'm just not embracing the full "vibe coding" thing enough and not allowing AI agents to fully run wild.
I will concede that Claude and Cursor have gotten quite good at frontend web development generation. I don't doubt that there are a lot of tasks where they make sense.
However, I still have yet to see a _single_ example of any of these tools working for my domain. Every single case, even when the folks who are trumpeting the tools internally run the prompting/etc, results in catastrophic failure.
The ones people trumpet internally are cases where folks can't be bothered to learn the libraries they're working with.
The real issue is that people who aren't deeply familiar with the domain don't notice the problems with the changes LLMs make. They _seem_ reasonable. Essentially by definition.
Despite this, we are being all but forced to use AI tooling on critical production scientific computing code. Various higher-level execs and managers have told me I should never be editing code directly and must use AI tooling instead. Doing so is 10x to 100x slower than making changes directly. I don't have boilerplate. I do care about knowing what things do, because I need to communicate that to customers and predict how changes to parameters will affect output.
I keep hearing things described as an "overactive intern", but I've never seen an intern this bad, and I've seen a _lot_ of interns. Interns don't make 1000 line changes that wreck core parts of the codebase despite being told to leave that part alone. Interns are willing to validate the underlying mathematical approximations to the physics and are capable of accurately reasoning about how different approximations will affect the output. Interns understand what the result of the pipeline will be used for and can communicate that in simple terms or more complex terms to customers. (You'd think this is what LLMs would be good at, but holy crap do they hallucinate when working with scientific terminology and jargon.)
Interns have PhDs (or in some cases, are still in grad school, but close to completion). They just don't have much software engineering experience yet. Maybe that's the ideal customer base for some of these LLM/AI code generation strategies, but those tools seem especially bad in the scientific computing domain.
My bottleneck isn't how fast I can type. My bottleneck is explaining to a customer how our data processing will affect their analysis.
(To our CEO) - Stop forcing us to use the wrong tools for our jobs.
(To the rest of the world) - Maybe I'm wrong and just being a luddite, but I haven't seen results that live up to the hype yet, especially within the scientific computing world.
This is roughly my experience with LLMs. I've had a lot of friends who have had good experiences vibe coding very small new apps. And occasionally I've had AI speed things up for me when adding a specific feature to our main app. But at roughly 2 million lines of code, and with 10 years of accumulated tribal knowledge, LLMs really seem to struggle with our current codebase.
The last task I tried to get an LLM to do was a fairly straightforward refactor of some of our C# web controllers - just adding a CancellationToken to the controller method signature whenever the underlying services could accept one. It struggled so badly with that task that I eventually gave up and just did it by hand.
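For context, the change itself is tiny per controller. Here's a minimal sketch of what I mean, with hypothetical names (OrdersController, IOrderService, GetOrdersAsync); ASP.NET Core binds a CancellationToken action parameter to the request's abort token automatically, so the work is mostly mechanical plumbing:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class OrdersController : ControllerBase
{
    private readonly IOrderService _orders;

    public OrdersController(IOrderService orders) => _orders = orders;

    // Before: Get() took no token, so request aborts never reached the service.
    // After: ASP.NET Core binds this parameter to HttpContext.RequestAborted,
    // so the token only needs to be threaded through to the service call.
    [HttpGet]
    public async Task<IActionResult> Get(CancellationToken cancellationToken)
        => Ok(await _orders.GetOrdersAsync(cancellationToken));
}

public interface IOrderService
{
    // Hypothetical service method standing in for the "underlying services
    // that could accept one" — i.e. it already exposes an optional token.
    Task<IReadOnlyList<string>> GetOrdersAsync(CancellationToken cancellationToken = default);
}
```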
The widely cited study that shows LLMs slow things down by 20% or so very much coheres with my experience, which is generally: fight with the LLM, give up, do it by hand.
My experience is that sometimes they give you a 10x speedup, but then you hit a wall and a simple thing takes 30 times longer, and a lot of people just keep hammering away because of that first feeling. Outside of boilerplate, I haven't seen it be the magical tool people keep claiming it is.
That's the definition of an advanced scaffolding tool, and yes, I subscribe to that. From time to time I use Gemini CLI for little tools when I don't have time to read all the docs for things I'm not used to, but in the end I need to make flow changes and am forced to understand the generated code. 10x faster bootstrap, 30x slower manual changes, and the codebase is 100% my problem.
> I have been told I should never be editing code directly and been told I must use AI tooling by various higher level execs and managers
Wow, this is really extreme. We certainly got to this point faster than I expected.
To be fair, it's the higher-level folks who are too far removed from things to have any actual authority. I've never heard a direct, single-team engineering manager say something like that. But yeah, CEOs say crazy crap, and we're definitely there, though his exact quote was "I insist everyone try to have AI generate your code first before you try making any direct changes." It's not _quite_ as bad as what I described. But then the middle management buys in and says similar things. And we now have a company-level OKR around having 80% of software engineers relying on AI tooling. It's a silly thing to dictate.
In my view it's a tool, at least for the moment. Learn it, work out what it does for you and what it doesn't. But assuming you are the professional, they should trust your judgement, and you should also earn that trust. That's what you pay skilled people for. If that tool isn't the best way to get the job done, use something else. Of course, that professional should be evaluating tools and assuring management (whether by evidence or other means) that the most cost-efficient, highest-quality product is being built, like in any other profession.
I use AI, and for some things it's great. But I'm feeling like they want us to use the "blunt instrument" that is AI when sometimes a smaller, more fine-grained tool, or just handcrafting code for accuracy, is quicker and more appropriate, at least for me. The "autonomy window," as I recently heard it expressed.