Comment by javcasas
7 days ago
Good luck with that.
I have been exploring local AI tools for coding (Ollama + aider) with a small stock market simulator (~200 lines of Python).
First I tried making the AI extract the dataclasses representing events into a separate file. It decided to extract some extra classes, leave others behind, and delete parts of the code.
Then I tried to make it explain one of the actors, called LongVol_player_v1, which is around 15 lines of code. It correctly concluded that the actor does options delta hedging, but it jumped to the conclusion that it calculates the implied volatility. It doesn't: I set the volatility as a constant, because I'm simulating specific interactions between volatility players and option dealers. It also hasn't yet caught the bug where the vol player buys 3000 options but accounts for only 2000.
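For readers unfamiliar with the setup, here is a minimal sketch of what a constant-volatility delta-hedging actor could look like. This is not the original LongVol_player_v1 (that code isn't public); the class `LongVolPlayer`, its fields, and the choice of Black-Scholes call delta are all hypothetical assumptions for illustration. The point is that volatility is a fixed parameter rather than implied from prices, and that the option position is updated with the same quantity that was traded, which is exactly the kind of bookkeeping where a 3000-vs-2000 mismatch would hide.

```python
import math
from dataclasses import dataclass


def norm_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot: float, strike: float, t: float, rate: float, sigma: float) -> float:
    """Black-Scholes delta of a European call (assumed pricing model)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return norm_cdf(d1)


@dataclass
class LongVolPlayer:
    """Hypothetical long-vol actor: buys calls and delta-hedges with the underlying."""
    sigma: float = 0.2   # constant volatility, NOT implied from market prices
    options: int = 0     # number of call options held
    shares: float = 0.0  # hedge position in the underlying (negative = short)

    def buy_options(self, qty: int) -> None:
        # Record exactly the quantity traded so the position matches the trade.
        self.options += qty

    def rehedge(self, spot: float, strike: float, t: float, rate: float) -> float:
        """Move the stock position to -delta * options; return the shares traded."""
        target = -bs_call_delta(spot, strike, t, rate, self.sigma) * self.options
        trade = target - self.shares
        self.shares = target
        return trade


if __name__ == "__main__":
    player = LongVolPlayer()
    player.buy_options(3000)
    traded = player.rehedge(spot=100.0, strike=100.0, t=1.0, rate=0.0)
    print(player.options, round(player.shares, 1), round(traded, 1))
```

With an at-the-money call, one year to expiry, zero rates and 20% vol, the delta is a little above 0.5, so the sketch shorts roughly 1600 shares against 3000 options.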
When asking for improvements, it is obsessed with splitting the initialization and the execution.
So far I have wasted half a Saturday trying to make the machine do simple refactors. Refactors I could do myself in half an hour.
I'm yet to see the wonders of AI.
If you are using Ollama that suggests you are using local models - which ones?
My experience is that the hosted frontier models (o3, Gemini 2.5, Claude 4) would handle those problems with ease.
Local models that fit on a laptop are a lot less capable, sadly.
I have tried with qwen2.5-coder:3b, deepseek-coder:6.7b, deepseek-r1:8b, and llama3:latest.
All of them local, yes.
That explains your results. 3B and 8B models are tiny - it's remarkable when they produce code that's even vaguely usable, but it's a stretch to expect them to usefully perform an operation as complex as "extract the dataclasses representing events".
You might start to get useful results if you bump up to the 20B range: Mistral Small 3/3.1/3.2, or one of the ~20B-range Gemma 3 models. Even those are a long way off the capabilities of the hosted frontier models, though.
Could you link the repo and prompts? What you described seems like the type of thing I've done before with no issue, so you may have an interesting code base that is presenting some problems for the LM.
I cannot post the link to the repo, as it contains sensitive stuff. The code is mostly a bunch of classes with interleaved dataclasses, and a bunch of main() and run() functions at the end.
Some of the logs:
-----------------------------------
Thanks for posting this. I learned that it's hard to really grok much w/o the full project. :-\
More logs:
For what it's worth, commercial models are in a completely different league from locally runnable models. If you are really interested in seeing the state of the art right now, at least give it a whack with Opus/Gemini/o3 or something of that calibre.
You might still be disappointed but at least you won't have shot your leg off out of the gates!
I'm trying to use local models for privacy reasons. Also, at some point one of the employers out there will start suing people because they shared code with a commercial model, and the commercial model trained on that code and output it to someone else. I want no part in that kind of situation.