Comment by superze
2 days ago
As an Opus user, I genuinely don’t understand how someone can work for weeks or months without regularly opening an IDE. The output almost always fails.
I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model do the first design pass in my app and taking it from there, but the hundreds if not thousands of lines of code it generates are so messy that it's insanely painful to refactor afterwards.
I have a hell of a time just getting any LLM to write SQL queries that have things like window functions, aggregates and lateral left joins - even when shoving the entire database schema DDL into the context.
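For concreteness, this is the shape of query I mean; the schema here is made up, but mine look similar (Postgres syntax):

```sql
-- Hypothetical customers/orders tables, standing in for my real schema.
-- Per-customer daily totals (aggregate), a running total (window
-- function), and each customer's most recent order (LATERAL LEFT JOIN).
SELECT
  c.customer_id,
  day.order_date,
  day.daily_total,
  SUM(day.daily_total) OVER (
    PARTITION BY c.customer_id
    ORDER BY day.order_date
  ) AS running_total,
  latest.order_id AS latest_order_id
FROM customers AS c
JOIN LATERAL (
  SELECT ordered_at::date AS order_date, SUM(amount) AS daily_total
  FROM orders
  WHERE orders.customer_id = c.customer_id
  GROUP BY ordered_at::date
) AS day ON true
LEFT JOIN LATERAL (
  SELECT order_id
  FROM orders
  WHERE orders.customer_id = c.customer_id
  ORDER BY ordered_at DESC
  LIMIT 1
) AS latest ON true;
```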
It's so frustrating, it regularly makes me want to just quit the profession. Which is why I still just write most code by hand.
I write a lot of SQL and I haven't had these issues for months, even with smaller models. Opus can one shot most of my queries faster than I could type them.
Instead of stuffing the context with DDL I suggest:
1. Reorganize your data warehouse. It needs to be easy to find the correct data. Make sure you use ELT with clear layers, meaningful schemas, and per-model documentation (see the sketch after this list). This is a ton of work, but if done right the payoff is massive.
2. I built a tool for myself to pull our warehouse into a graph for fuzzy search+dependency chain analysis. In the spring I made an MCP server for it and Claude uses that tool incredibly well for almost all queries. I haven't actually used the GUI or scripts since I built the MCP.
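To illustrate point 1, here is roughly what I mean by clear layers; the names are invented for the example, not our actual warehouse (Postgres syntax):

```sql
-- Illustrative three-layer ELT setup; adapt the names to your stack.
CREATE SCHEMA IF NOT EXISTS raw;      -- source extracts, loaded as-is
CREATE SCHEMA IF NOT EXISTS staging;  -- cleaned, renamed, consistently typed
CREATE SCHEMA IF NOT EXISTS marts;    -- documented, business-facing models

-- A staging model that the marts layer builds on
CREATE VIEW staging.stg_orders AS
SELECT
  id                      AS order_id,
  customer_id,
  created_at::timestamptz AS ordered_at,
  amount_cents / 100.0    AS amount
FROM raw.orders;

-- Per-model documentation lives next to the model itself
COMMENT ON VIEW staging.stg_orders IS
  'One row per order; amounts converted from cents.';
```

The point is that a model (or an LLM) can find the right table by schema name alone, instead of fishing through hundreds of raw tables.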
Claude and Devstral are the best models I've used for SQL. I cannot get Gemini to write decent modern SQL -- not even the Gemini data science/engineering agents in Google Cloud. I occasionally try the paid models through the API and still haven't been impressed.
>> I write a lot of SQL and I haven't had these issues for months, even with smaller models. Opus can one shot most of my queries faster than I could type them.
Same. SOTA models crush every SQL question I give them.
If you really know SQL, writing an SQL query basically just feels like writing a prompt for a database client anyway, except it does exactly what you ask for.
I have a running joke at work.
* LLMs are just matrix multiplication.
* SQL is just algebra, which has matrix multiplication as part of it.
* Therefore SQL is AI.
* Now who is ready to invest a billion dollars in our AI SaaS company?
Or it’s just that astronaut with a gun meme: “Wait, AI is just SQL? … Always has been.”
Most people have not fully grasped how LLMs work or how to properly utilize agentic coding tools. That, not a limitation of the technology, is why vibe coders produce low-quality code (at this stage).

Think of it this way: everyone is the grandma who has been handed a Palm Pilot to get things done. Grandma needs an iPhone, not a Palm Pilot, but we are not in that territory yet. Now consider the people who were able to use the Palm Pilot successfully and well: they were few, they were the exception, but they existed. Same here.

I have been using coding agents for over 7 months now and have written zero lines of code; in fact, I don't know how to code at all. But I have been able to architect very complex software projects from scratch: text to speech, an automated LLM benchmarking system for testing all possible llama.cpp sampling parameters, and more, and now I'm building my own agentic framework from scratch. All of this and more is possible without writing one line of code yourself, but it does require understanding how to use the technology well.
All of the applications you mention could be scoped as beginner projects. I don't think they represent good proofs of capability.
Well, why don't you look at it for yourself and tell me if this looks like a beginner project: https://youtu.be/EyE5BrUut2o
If you don't know how to code then you are not able to accurately judge what you're producing.
Here you go, I open-sourced one of the projects: https://youtu.be/EyE5BrUut2o
I hardly ever open an IDE anymore.
I use Claude Code and Cursor. What I do:
- Use statically typed languages: TypeScript, Go, Rust, Python w/ types
- Set up linters. For TS I have a bunch of custom lint rules (authored by AI) for common feedback that I've given. (https://github.com/shepherdjerred/monorepo/tree/main/package...)
- For Cursor, lots of feedback on my desired style. https://github.com/shepherdjerred/scout-for-lol/tree/main/.c...
- Heavy usage of plan mode. Tell the AI things like "make at least 20 searches to online documentation", "support every claim with a reference", etc. Tell the AI "make a task for every little thing you'll implement".
- Have the AI write tests, particularly the more expensive ones like integration and end-to-end, so you have an easy way to verify functionality.
- Set up Claude Code GHA to automatically review PRs. Give the review feedback to the agent that implemented it, either by copy-pasting or by telling the agent "fetch review comments and fix them".
Some examples of what I've made:
- Many features for https://scout-for-lol.com/, a League of Legends bot for Discord
- A program to generate TypeScript types for Helm charts (https://github.com/shepherdjerred/homelab/tree/main/src/helm...)
- A program to summarize all of the dependency updates for my Homelab (https://github.com/shepherdjerred/homelab/tree/main/src/deps...)
- A program to manage multiple instances of CLI agents like Claude Code (https://github.com/shepherdjerred/monorepo/tree/main/package...)
- A Discord AI bot in the style of my friends (https://github.com/shepherdjerred/monorepo/tree/main/package...)
Thanks for sharing. So the dumb question - do you feel like Claude Code & Cursor have made you significantly more productive? You have an impressive list of personal projects, and I can see how a power user of AI tools can be very effective with green field projects. Does the productivity boost translate as well to your day job?
For personal projects, I have found it to be transformative. I've always struggled with perfection and doing the "boring parts". AI has allowed me to add lots of little nice-to-have features and focus less on the code.
I'm lucky enough that my workplace also uses Cursor + Claude Code, so my experience directly transfers. I most often use Cursor for day-to-day work. Claude has been great as a research assistant when analyzing how data flows between multiple repos. As an example I'm writing a design doc for a new feature and Claude has been helping me with the investigation. My workflow is more or less to say: "here are my repos, here is the DB schema, here are previous design docs, now how does system X work, what would happen if I did Y, etc."
AI is still fallible so you _do_ of course have to do lots of checking and validation which can be boring, but much easier if you add a prompt like "support every claim you make with a concrete reference".
When it comes to implementation, I generally give it smaller, more concrete pieces to work with, e.g. for a personal project I would say something like "here is everything I want to do, make a plan, do part 1, then do part 2" (example: https://github.com/shepherdjerred/scout-for-lol/tree/227e784...).
At work, I tend to give it PR-sized units of work. e.g. something very well-scoped and defined. My workflow is: prompt, make a PR on GitHub, add comments on GitHub, tell Cursor "I left comments on your PR, address them", repeat. Essentially I treat AI as a coworker submitting code to me.
I don't really know that I can quantify the productivity gain, but I can say that I am _much_ more motivated in the last few months because AI removes so much friction. I think it's backed up by my commit history since June/July, which is when I started using Cursor heavily: https://github.com/shepherdjerred
Cursor is an IDE.
Oh to clarify I used to use Cursor but the last month or two I've used Claude Code almost exclusively. Mostly because it seems to be more generous with credits.
> make at least 20 searches to online documentation
Lol sometimes I have to spend two turns convincing Claude to use its goddamn search and look up the damn doc instead of trying to shoot from the hip for the fifth time. ChatGPT at least has forced search mode.
I've found that telling it to specifically do N searches works consistently. I do really wish Claude Code had a "deep research" mode similar to 'normal' Claude.
My trick is to explicitly role-play that we're doing a spike. This gets all of the models to ignore the details they normally get hung up on. Once I have the basics in place, I can tell it to fix the details.
It’s _always_ easier to add more code than it is to fix broken code.
This is what an AGENTS.md - https://agents.md/ (or CLAUDE.md) file is for. Put common constraints to correct model mistakes/issues with respect to the codebase, e.g. in a “code style” section.
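A minimal sketch of what such a section might contain (the rules here are invented; use the feedback you actually find yourself repeating):

```markdown
## Code style

- Use the repo's existing error-handling helpers; do not add new ones.
- Prefer small, pure functions; keep files under ~300 lines.
- Every new module gets a test file alongside it.
- Never disable lint rules inline; fix the underlying issue.
```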
What does your software creation workflow look like? Do you have a design phase?
Why would you spend $200 a day on Opus if you can pay that for a month via the highest tier Claude Max subscription? Are you using the API in some special way?
At a guess, an Enterprise API account: pay per token, but no limits.
It’s very easy to spend $100s per dev per day.
The $200/month plan doesn't have hard limits either: Claude Code now has an overage fee, so once you've expended your rate-limited token allowance you can keep working and pay for the extra tokens out of an additional cash reserve you've set up.
Oh, I wasn't arguing that it isn't "easy to spend $100s per dev per day". I was just asking what the use-case for that is.
I’ve had decent results from it. What programming language are you using?
Sometimes I have a similar file or related files; I copy their names and say "use them as reference". Code quality improves tenfold if you do so. Even providing an example from the framework's getting-started guide works great for new projects.
Yeah, the pain of cleaning up even a small mess is real too. I had some failing tests and type errors that I figured I'd fix later using only AI prompts. As the codebase grew, the failing TypeScript issues grew too: at some point it was 5000+ type errors and countless failing unit tests, then more and more. I tried to fix it with AI, since fixing it the old way was no longer possible. In the end I discarded the whole project at around 500k lines of code.
Question: how many LoC do you let the AI write in each iteration? And do you review it? It sounds like you are letting it run off the leash.
I had no idea how it would end up; it was my first time using an AI IDE. I had only used chatgpt.com and claude.ai for small changes before. I continued it as an experiment. The AI wrote so many tests that I figured I could judge progress by whether they passed. I agree: it was a bad expectation + no experience with an AI IDE + bad software engineering.