Comment by ljosifov
1 day ago
Nah - given the ergonomics + economics, local coding models are not that viable atm. I like all things local, even if just for the safety of keeping a healthy competitive ecosystem, and I can imagine really specialised use cases where I run an 8B not-so-smart model to process oodles of data on my local 7900 XTX or similar. Got an older M2 MBP with 96GB of (v)ram and try all things local that fit: usually LMStudio for the speed of MLX-format models over its API (as an endpoint; plus the chat for a vibes test; LMStudio's omission from the OP blog post makes me question the post), or llama.cpp for GGUF (llama.cpp is the OG; an excellent and universal engine and format; recently got even better).

Looking at how agents work - the agent smarts of Claude Code or Codex in using tools feel like half their success (the other half being the underlying LLM smarts). That ranges from the trained-in 'Tool Use & Interleaved Thinking' on the right tools in the right way, down to the trivial 'DO NOT fill your 100K of useful context with the random contents of a multi-MB file as prompt'.

The $20/mo plans are insanely competitive. OpenAI is generous with Codex, and in addition to the terminal that I mostly use, there is a VSCode addon as well as use in Cline or Roo. Cursor offers an in-house model that is fast and good, with insane economy reading large codebases, as well as BYOK to the latest-greatest LLMs afaik. Claude Code at $20/mo is stingy with quotas, but can be supplemented with Z.ai standing in - glm-4.7 as of yesterday (saw no difference between glm-4.6 and sonnet-4.5, already v.good). It's a 3-line change to ~/.claude/settings.json to flip between Z.ai and Anthropic at will (e.g. when paused on one, switch to the other). Have not tried the Cerebras high tok/s but would love to - not waiting makes a ton of difference to productivity.
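For reference, the 3-line `~/.claude/settings.json` flip mentioned above can look roughly like this - a hedged sketch assuming Z.ai's Anthropic-compatible endpoint; the URL and the placeholder token are illustrative, check Z.ai's own docs for the current values:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-zai-api-key-here"
  }
}
```

Removing the `env` block (or the two keys) should flip Claude Code back to Anthropic's own endpoint and your regular plan.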