Comment by tuzemec
12 hours ago
I'm currently experimenting with running google/gemma-4-26b-a4b with LM Studio (https://lmstudio.ai/) and Opencode on an M3 Ultra with 48GB RAM, and it seems to be working. I had to increase the context size to 65536 so that Opencode's prompts would fit, but no other problems so far.
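For context on why the window has to grow: coding agents ship large system prompts plus repository context, and the default window is often too small. A minimal sketch of the sizing check, using the common ~4-characters-per-token heuristic for English text (the 65536 figure is from the comment above; the reply budget and character counts are illustrative assumptions, and real counts depend on the model's tokenizer):

```python
# Rough check: will an agent's prompt fit in a given context window?
# ~4 characters per token is a coarse heuristic, not a tokenizer.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, context_size: int, reply_budget: int = 4096) -> bool:
    """Leave room for the model's reply when checking the prompt."""
    return estimate_tokens(prompt) + reply_budget <= context_size

# A big agent system prompt plus repo context: ~200k characters ≈ ~50k tokens.
big_prompt = "x" * 200_000
print(fits_in_context(big_prompt, 65536))  # True: fits in a 64k window
print(fits_in_context(big_prompt, 8192))   # False: overflows a default-sized window
```

The same arithmetic explains the sibling comment below about the M3 Max: with less unified memory, the KV cache for a 64k window may simply not fit alongside the model weights.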
I tried running the same setup on an M3 Max with less memory, but couldn't increase the context size enough to be useful with Opencode.
It's also easy to integrate it with Zed via ACP. For now it's mostly simple code review tasks and generating small front-end related code snippets.
I have a similar setup. It might be worth checking out pi-coding-agent [0].
The system prompt and tools have very little overhead (<2k tokens), making the prefill latency feel noticeably snappier compared to Opencode.
[0] https://www.npmjs.com/package/@mariozechner/pi-coding-agent#...
Thanks! I just ran a quick test with pi, and it's working a bit faster.
I run this model on my AMD RX 7900 XTX with 24GB VRAM, with up to 4 concurrent chats and a 512K context window in total. It is very fast (~100 t/s), feels instant, and is very capable; I've been using Claude Code less and less these days.
I did the same using the MLX version on an M1 MacBook, with LM Studio integrated into Xcode. I had to up the context size. I ran it against a very modest iOS codebase and it didn't do well; it just petered out at one point. Odd. It's a pretty good chatbot, and maybe it'll work against other code, but it wasn't useful with Xcode for me.
Not sure if you've already tried them, but both the GLM Flash and Qwen models are much better than Gemma for that in my experience.
I'm using a 24GB GPU, so it might be different in your case, but I doubt it.
I do the same thing on a MacBook Pro with an M4 Max and 64GB. I had problems until the most recent LM Studio update (0.4.11+1): tool calling didn't work correctly.
Now both codex and opencode seem to work.
Which do you prefer? And which LM Studio API works best for these tools?
I use the OpenAI API for everything. I think codex is more polished, but I don't really have a preference; I haven't used either enough. I mostly use Claude Code.
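For anyone wiring this up: LM Studio serves an OpenAI-compatible endpoint (http://localhost:1234/v1 by default), so any OpenAI-style client or agent can target it by overriding the base URL. A minimal sketch of the request shape; the model name, message, and temperature are illustrative assumptions, and the payload is only constructed here, not sent:

```python
# LM Studio's local server speaks the OpenAI chat-completions format,
# so tools like codex or opencode just need the base URL pointed at it.
# Nothing is dispatched here; this only builds the payload a client would send.

import json

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server
API_KEY = "lm-studio"                  # any non-empty string; not checked locally

def build_chat_request(model: str, user_message: str) -> dict:
    """Construct an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.2,
    }

req = build_chat_request("google/gemma-4-26b-a4b", "Review this diff for bugs.")
print(json.dumps(req, indent=2))
```

With the real `openai` Python package, the same thing is `OpenAI(base_url=BASE_URL, api_key=API_KEY).chat.completions.create(**req)`; most coding agents expose equivalent base-URL and model settings in their config.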
GGUF or MLX? Edit: just tried a community MLX build, and LM Studio said it didn't support loading it yet.