Comment by stavros
6 days ago
I love LLMs, and I really like programming with Cursor, but I never managed to get the "agents with tons of stuff in their context" mode to work for me. I use Cursor like a glorified code completer, 4-5 lines at a time, because otherwise the LLM just makes too many mistakes that compound.
If you let it run in the "write my code for me" mode and ask it to fix some mistake it made, it will always add more code, never remove any. In my experience, the code eventually ends up so brittle that the LLM gets stuck on some mistake it can never overcome, no matter how many times it tries.
Has anyone managed to solve this?
Not a full solution, but one thing I've learned not to do is tell Cursor "you got that wrong, fix it like this". Instead, I go back to the previous prompt and click "Restore Checkpoint", edit the prompt and possibly the Cursor rules to steer it in the right direction.
When the model has the wrong solution in its context, it will use it when generating new code, and my feeling is that it doesn't handle the idea of "negative example" very well. Instead, delete the bad code and give it positive examples of the right approach.
Make sure you include lots of files of context (including any type definitions!). After a big change, before approving any code, ask it: "Is this the simplest and cleanest approach?" Usually it will then make more revisions and actually produce clean code. You can also say that in the original prompt, or specify the structure of the change you want it to make.
Oh nice, I'll try the simple thing, thanks!
No. And I don't think they are doing anything magical. Performance drops sharply after 50k tokens. Your LLM does best when you give it a short 2-5K-token context.
imo cursor's small context window on the $20 plan is what kills their agent mode. Try claude code or anything that lets you use larger context windows (I think cursor has a tier you can pay for now?). claude code regularly deletes and corrects code, and LLMs are very much capable of that today.
I would love to know as well. I also have problems with the LLM coding agents when it comes to lesser-known languages like Julia. Has anybody solved this yet?
In the last two weeks I've started using Cursor in the "you do everything, I'm just your boss" mode, to see how far I can push it. Right at this minute I'm working on something that's become pretty big, but I'm constantly on the verge of just going back to writing code like normal LOL.
That said, I'm super impressed by how quickly I've been able to get this far with something pretty niche and complex in places.
Here's what I've learned. There are a million AI bros on YouTube who have the ultimate solution, but they all boil down to a few basic things.
Make rules: make them before you get started and keep updating them as you go.
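To make that concrete, here's the kind of thing I mean (a made-up sketch; none of these specific rules are from my real setup, they're just the flavor of rules I keep accumulating):

```text
# .cursor/rules (hypothetical example)
- Never read files under data/ directly; use the query tools in tools/ instead.
- Keep modules small; prefer many small files over one big one.
- Put throwaway scripts in debug/tests/, never in src/.
- When fixing a bug, first check whether code can be removed rather than added.
```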
Lots of tiny modules: push things into small, bite-sized bits of code with lots of interface documentation. This feels a bit unnatural when the code is prototype quality.
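For the shape I mean, something like this (a hypothetical module with a stubbed body; the point is the heavily documented contract, which the agent can work against without reading the implementation):

```python
"""Thumbnail generation.

Interface:
    make_thumbnail(src, max_px) -> bytes

Raises ValueError on empty input. No other public names.
"""


def make_thumbnail(src: bytes, max_px: int = 256) -> bytes:
    """Return thumbnail data no larger than max_px on a side.

    (Stubbed here: a real version would use an imaging library.
    The documented contract above is what the agent reads.)
    """
    if not src:
        raise ValueError("empty image data")
    return src[: max_px * max_px]  # placeholder transform
```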
Documentation is key: the YouTubers will often create detailed planning and specification documents in advance. I've done this and it's hit and miss. What I've found works is explaining what you want to build to an LLM and having it create extremely concise documentation, then a rough checklist for implementation, and then evolving these documents in Cursor as I go.
The smoothest workflow this leads to is a plan --> document, implement --> document, run tests --> document loop on each reasonable chunk of the design.
Don't ever let Cursor see inside big datasets or huge folder trees. In fact, keep Cursor in a few folders writing source code and nothing else. To do this, early on build tools that can fetch information or make safe edits in those datasets on Cursor's behalf, without it attempting direct access.
The current project has tools for working with the primary datasets, plus a job manager, a process manager, a log manager, and a workflow manager, all with functions for querying. Cursor is instructed to use these. It naturally doesn't want to, but if you tell it, 7 times out of 10 it will :)
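A minimal sketch of one of these tools (all names made up, mine are project-specific): a read-only query helper the agent calls instead of opening dataset files itself, so a huge file never lands in its context window.

```python
import csv
import json
import sys
from pathlib import Path

DATA_DIR = Path("data")  # the agent is told never to open this directly


def query_dataset(name: str, column: str, value: str, limit: int = 5):
    """Return up to `limit` matching rows from data/<name>.csv.

    The agent gets a small, bounded result instead of the raw file.
    """
    rows = []
    with open(DATA_DIR / f"{name}.csv", newline="") as f:
        for row in csv.DictReader(f):
            if row.get(column) == value:
                rows.append(row)
                if len(rows) >= limit:
                    break
    return rows


if __name__ == "__main__" and len(sys.argv) >= 4:
    # e.g. python tools/query_dataset.py users country NZ
    name, column, value = sys.argv[1:4]
    print(json.dumps(query_dataset(name, column, value), indent=2))
```

The same pattern works for "safe edit" tools: the script validates the change, the agent never touches the data directly.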
No mess: Cursor likes to make lots of random tests and processes along the way. Instruct it to use a /debug/tests/ folder and wipe it clean often. Force it to make "production" code by having it be registered with the workflow manager and made allowable by the process manager. This lets it play around for a while and get something up and running with its weird little scripts, then implement it for real elsewhere using the proper framework. The workflow manager needs documentation on how the script is used, and the process manager needs the source to be in a particular place with a particular set of standard interfaces.
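To give a feel for the "made allowable" part (all names and paths hypothetical, my real setup is more involved): the process manager just refuses to run anything the workflow manager doesn't have an entry for, with docs, living in the production folder.

```python
import json
import subprocess
import sys
from pathlib import Path

REGISTRY = Path("workflows/registry.json")  # the workflow manager's record
PROD_DIR = Path("src/jobs")                 # the "particular place"


def run_job(name: str) -> int:
    """Run a registered job, refusing anything Cursor hasn't promoted.

    A job is runnable only if (a) the workflow manager has an entry for
    it with usage docs, and (b) its source lives under src/jobs/.
    """
    registry = json.loads(REGISTRY.read_text())
    entry = registry.get(name)
    if entry is None or not entry.get("docs"):
        raise PermissionError(f"{name} is not registered with the workflow manager")
    script = PROD_DIR / entry["script"]
    if not script.is_file():
        raise PermissionError(f"{name} has no source under {PROD_DIR}")
    return subprocess.run([sys.executable, str(script)]).returncode
```

Anything still sitting in /debug/tests/ simply can't be run this way, which is what nudges Cursor to do the "real" implementation.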
you might say this is a lot of plumbing, but what isn't these days and it's not like I'm maintaining it - right ;)
Cursor is always about to go insane at any minute, or can't remember what's going on. So most of this is about keeping things simple enough for it to focus on the bit it's working on and nothing else.
At work we regularly have massive code bases written in our company style and built using our business processes. That kind of thing is absolutely not the place for Cursor. I can imagine a company set up from scratch to use Cursor, but our current products and practices would just make it impossibly time-consuming.