
Comment by CopyOnWrite

2 months ago

Most comments here surprise me: I am using GitHub's Copilot / ChatGPT 4.0 at work with a code base which mostly implements a basic CRUD service... and outside of small/trivial examples (where the generated code is mostly okay), prompting is more often than not a total waste of time. Now, I wonder if I am just totally unable to write/refine good prompts for the LLM (as it works for smaller samples, I hope I am not too far off) or what could explain the huge discrepancy of experience. (Just for the record: I would totally not mind if the LLM wrote the code for the stuff I have to do at work.)

To clarify my questions:

- Who here uses LLMs to generate code for bigger projects at work? (>= 20k lines of code)

- If you use LLMs for bigger projects: Do you need to change your prompting strategy to get good results?

- What programming languages are you using in your code bases?

- Are there other people here who experience that LLMs are no help for non-trivial problems?

I'm in the same boat. I've largely stopped using these tools other than asking questions about a language that I'm less familiar with or a complex type in typescript for which it can be helpful (sometimes). Otherwise, I felt like I was just wasting my time and becoming lazier/worse as a developer. I do wonder whether LLMs have hit a wall and we're in a hype cycle.

  • Yes, I have the same feeling about the wall/hype cycle. Most of my time is understanding code and formulating a plan to change code w/o breaking anything... even if LLMs would generate 100% perfect code on the first try, it would not help in a big way.

    One thing I forgot to mention is asking LLMs questions from within the IDE instead of doing a web search... this works quite nicely, but again, it is not a crazy productivity boost.

My employer gives me access to Jetbrains AI, I work on a Vue Frontend with a Kotlin Spring Boot backend.

The codebase is not too old and has grown without too much technical debt, but I have never had decent success with complex prompts. It's useful for quick "what does this do" checks, but any real functionality seems to be lacking.

Maybe I'm not refining my prompts well enough, but doing so would take longer than implementing it myself.

Recently I tried Jetbrains Junie, which acts like Claude if I understand it correctly.

I had a really refined prompt, ran it three times with adjustments and fine tuning but the result was still lacking. So I tossed it and wrote it myself. But watching the machine nearly getting it right was still impressive.

  • JetBrains AI runs on a "discount LLM" and their ratings were below 2 stars. I tried two others, which played games with me to reduce context and use cheaper models. I then switched to Aider, which leads me to believe a moderate Claude user may need to spend $30 a month, but I use Gemini models and I didn't exceed $5.

You are just bad with prompting or working with a very obscure language/framework or a bad coding pattern or all of it. I had a talk with a seasoned engineer, who has been coding for 50 years and has created many amazing things over his lifetime, about him having really bad results with AI tools I suggested for him. When I use AI for the same purposes in the same repo he's working on, it works nicely. When he does it, the results are never what he wants. It comes down to a combination of him not understanding how to guide the LLMs in the right direction and using a language/framework he's not familiar with, so he can't judge the LLM's output. It is really important to know what you want and be able to describe it in short (but important) points. Points that you know the AI will mess up if you don't specify them. And also to be able to figure out which direction the AI is heading with the solution and correct it EARLY rather than later. Not overloading context/memory with unnecessary things. Focusing on key areas to improve and much more. I'm using AI to get solutions done that I can definitely do myself but where it would take a certain amount of time to hunt down all the documentation, API/lib calls etc. With AI, a tenth of the time is enough.

I've had massive success with java, js/TS, html css, go, rust, python, bitbucket pipelines/GitHub actions, cdk, docker compose, SQL, flutter/dart, swift etc.

  • I've had the same experience as the person to whom you're responding. After reading your post, I have to ask: if you're putting so much effort into prompting it with specific points, correcting it often, etc., why not just write the code yourself? It sounds like you're putting a good deal of effort into prompting it.

    Aren't you worried that over time you'll rely on it too much and your offhand knowledge will get worse?

    • I'm still spending less effort/time. A very significant amount.

      I do write plenty of things myself. Sometimes, I ignore AI completely and write 100s of lines. Sometimes, I take copilot suggestions every other line, as I'm writing something "common" and copilot can "read" my mind. And sometimes, I write 100s of lines purely by prompting. It is a fine line when to do which; also depends on mood.

      I am not worried about that as I spend hours every day reading. I'm also the type of person who, when something is needed in a document, does not search for it using CTRL+F, but manually looks through it. It always takes more time but I also learn adjacent things to the topic I need.

      And I never commit a single line from AI without reading and understanding it myself. So it might come up with a 100-line solution for me, but I probably already know what I wanted, and on the off chance it came up with something correct but in a way I did not know, I do read and learn it.

      Ultimately, to me, the knowledge that I can !reset null in docker compose override is important. Remembering if it is !null reset or reset !null or !reset null (i.e., syntax) is not important. My offhand knowledge is not getting worse as I am constantly learning things; I just focus less on specific syntaxes or API signatures now.

      You can apply the same argument with IDE. Almost all developers will fail to write proper JS/TS/Java etc without IDE help.

    • I have read somewhere that LLMs are mostly helpful to junior developers.

      Is it possible the person claiming success with all these languages/tools/technologies is just on a junior level and is subjectively correct but has no point of reference for how fast coding is for seniors and what quality code looks like?


    • Not OP, but it becomes natural and doesn't take a lot of time.

      Anyway, if you want to, LLMs can today help with a ton of programming languages and frameworks. If you use any of the top 5 languages and it still doesn't work for you, either you're doing some esoteric work or you're doing it wrong.


  • I do not rule out that I am just very bad with prompting.

    It just surprises me that you write you had massive successes with "java, js/TS, html css, go, rust, python, bitbucket pipelines/GitHub actions, cdk, docker compose, SQL, flutter/dart, swift etc." If you include the usual libraries/frameworks and the diverse application areas for these technologies, it seems crazy to me, even with LLM support, to be able to make meaningful contributions in non-trivial code bases.

    Concerning SQL, I can report another failure with LLMs: in a trivial code base with a handful of entities, the LLM cannot come up with basic window functions.
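
For reference, here is a minimal sketch of what "basic window functions" can look like, run through Python's built-in sqlite3 module. The `orders` table and its rows are invented for illustration and are not from the code base discussed in this thread. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount INTEGER NOT NULL
    );
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 10), ('alice', 30), ('bob', 20), ('bob', 5), ('bob', 15);
""")

# Two window functions over the same partition:
#  - RANK() orders each customer's rows by amount, largest first.
#  - SUM() with ORDER BY id yields a running total per customer.
QUERY = """
    SELECT customer,
           amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank,
           SUM(amount) OVER (PARTITION BY customer ORDER BY id) AS running_total
    FROM orders
    ORDER BY customer, id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each row keeps its own identity (unlike GROUP BY), while the ranking and the running total are computed across the other rows of the same customer.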

    I would be very interested if you could write up a blog post or could make a youtube video demonstrating your prompting skills... Perhaps demonstrating with a bigger open source project in any of the mentioned languages how to add a non trivial feature with your prompting skills?

    • Unfortunately, I'm at a stage of personal life where I do not have time to blog. I'd love to but :(

      The stuff I work on for company is confidential and even getting authorization to use AI was such a hassle.

      Based on some of your replies, I think you have an impression of current generation AIs that is 100% wrong. I cannot blame you, as the impression you have is what the AI companies want you to have; that's what they are hyping.

      In another comment, you mentioned someone should demo how AI can add a non-leaf feature to a non-trivial LOC codebase. This is what AI companies say AI can do. But the truth is, (current gen) AIs can not do this except a few rare cases. I can not demo this to you as I can't do this and do not attempt to do it either on day to day tasks.

      The issue is context. What you are asking requires AI to have a huge amount of context that it simply is not equipped to handle (at least not right now).

      What AIs are really good at is doing a small fragment of a task, given clear enough requirements.

      When I want AI to write a Handler in my Controller, I don't just ask it to "write a function to handle POST call for entity E."

      I write the Javadoc /** */ comment that defines the signature and explains a little about how the core process of this handling works. I can even copy/paste a similar handler from another controller if I think that will help.

      Ultimately, my brain already knows the input, output and key transformations that need to happen in this function. I just write a minimal amount (esp. comments) and get AI to complete the rest.

      So if I need to write a non-leaf feature, I will break it down to several leaf features and then pass it on to AI and if needed, manually assemble them.
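
A minimal sketch of that comment-first approach for one leaf feature, written in Python with a docstring instead of the Java/Javadoc setup described above. Everything here is hypothetical: `handle_create_entity`, `InMemoryStore`, and the field names are made up for illustration. The docstring stands for the part written by hand; the body stands for the kind of completion one would then read and verify before committing.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a real repository/DAO layer."""

    rows: dict[int, dict[str, Any]] = field(default_factory=dict)
    next_id: int = 1

    def insert(self, data: dict[str, Any]) -> int:
        new_id = self.next_id
        self.rows[new_id] = dict(data)
        self.next_id += 1
        return new_id


def handle_create_entity(
    payload: dict[str, Any], store: InMemoryStore
) -> tuple[int, dict[str, Any]]:
    """Handle a POST that creates an entity (hypothetical leaf feature).

    Input:  payload with a required non-empty string "name" and an
            optional "tags" list.
    Output: (HTTP status code, response body).
    Steps:
      1. Reject a missing or blank "name" with 400 and an "error" message.
      2. Strip whitespace from "name"; default "tags" to an empty list.
      3. Insert into the store and return 201 with the new "id" and the
         stored fields.
    """
    # Step 1: validate the only required field.
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        return 400, {"error": "name is required"}

    # Step 2: normalise the input.
    record = {"name": name.strip(), "tags": list(payload.get("tags", []))}

    # Step 3: persist and build the response.
    new_id = store.insert(record)
    return 201, {"id": new_id, **record}
```

The spec in the docstring states input, output and the key transformations in a few short points, so the remaining work is small and easy to review line by line.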

      I had to write a 500 LOC bash script to handle a software deployment. This is not the way to do it but I was forced by circumstances created by someone else. Anyway, if I had to write the whole thing by hand, it'd take multiple days as bash syntax is not forgiving and the stuff I needed to do in the script was quite complex/crazy (and stupid).

      I think I wrote about 50+ lines of text describing the whole process, which you can think of as a requirement document.

      With a few tries, I was able to get the whole script with near accuracy. My reading revealed some issues. Pointed them to AI. It fixed them. Tests revealed some other issues. Again, AI fixed them after pointing out. I was able to get the whole thing done in just an hour or so.

  • > You are just bad with prompting or working with very obscure language/framework or bad coding pattern or all of it

    You just described every existing legacy project^^

Play with Cursor or Claude Code a bit and then make a decision. I am not in the "this is going to replace devs" boat, but this has changed the way I code and approach things.

  • Could you perhaps point me to a youtube video which demonstrates an experienced prompter sculpting code with Cursor/Claude Code?

    In my search I just found trivial examples.

    My criticism so far:

    - Examples seem always to be creating a simple application from scratch

    - Examples always use super common things (like create a blog / simple website for CRUD)

    What I would love to see (see elsewhere): Adding a non trivial feature to a bigger code base. Just a youtube video/demonstration. I don't care about language/framework etc. ...

    • This morning I made this while sipping coffee, and it solves a real problem for my gf: https://github.com/kirubakaran/xmldiffer Sure it's not enormous, and it was built from scratch, but imho it's not a trivial thing either. It would've taken me at least a day or two of full time work, and I certainly don't have a couple of days to spare on a task like this. Instead, pair programming with AI made it into a fun relaxing activity.


Copilot is just plain bad. The results are night and day compared with Cursor + Gemini 2.5 (of course with good prompting).

> Now, I wonder if I am just totally unable to write/refine good prompts for the LLM (as it works for smaller samples, I hope I am not too far off) or what could explain the huge discrepancy of experience.

Programming language / stack plays a big role, I presume.

  • Fair enough. Still, I was out of luck for some fairly simple SQL statements, where the model knows 100% of the DDL statements.

Same here. We have a massive codebase with large classes and the LLMs are not very helpful. Frontend stuff is okay sometimes but the backend models are too complex at this point, I guess.

Tooling and available context size matter a lot. I'm having decent luck with Gemini 2.5 and Roo Code.