
Comment by ddxv

1 day ago

I really hope that at some point in the near future AI models shrink enough, or laptops get powerful enough, to run AI models locally. I haven't tried in the past year, but when I did, token output was very slow and the laptop was on fire to make it happen.

I've wanted to try some of the more recent 8B models for local tab completion or agentic use. Any experience with those kinds of smaller models?

I've been running local language models on an existing laptop with an 8GB GPU, currently ministral-3:8b. It's faster than other models of similar size I've used previously, fast enough that I never wait for it; rather, I have to scroll back to read the full output.

So far I'm using it conversationally and for scripting with tools. I wrote a simple chat interface / REPL for the terminal, but it's not integrated with a code editor, nor does it do agentic/claw-like loops. The last time I tried an open-source Codex-like tool (a popular one whose name I forget), it was slow and not that useful for my coding style.
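A terminal REPL like the one described can be sketched in a few lines. This is a minimal sketch, assuming an Ollama server on http://localhost:11434 and a locally pulled model (the model tag here is an assumption; substitute whatever you have):

```python
# Minimal terminal chat REPL against a local Ollama server (a sketch).
# Assumes Ollama is listening on localhost:11434 and the model below
# has already been pulled; both are assumptions, not part of the thread.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "ministral-3:8b"  # assumption: any locally available model tag

def build_payload(history, model=MODEL):
    """Package the running message history into an /api/chat request body."""
    return {"model": model, "messages": history, "stream": False}

def ask(history):
    """Send the full history to the local server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(history)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

if __name__ == "__main__":
    history = []
    while True:
        user = input("> ")
        if user in ("exit", "quit"):
            break
        history.append({"role": "user", "content": user})
        reply = ask(history)
        history.append({"role": "assistant", "content": reply})
        print(reply)
```

Keeping the whole history in the payload is what makes it conversational; drop the history and each turn becomes a one-shot query.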

It took some practice, but I've gotten good use out of it: learning languages (human and programming), translation, producing code examples and snippets, and sometimes bouncing ideas off it, rubber-duck style.

qwen3-8b is good, and if you are doing tab completion it's more than adequate. You can get basic agentic behavior out of it, but if you really want to run a serious agent and do some serious work, then you want at least qwen3.5-27B if you have a 5090 with 32GB of VRAM, or qwen3.5-35-a3b if you have less than 24GB. If you want to use a laptop, get one with a built-in GPU or iGPU.
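The VRAM tiers above follow from a back-of-envelope estimate: quantized weights take roughly (parameter count × bits per weight / 8) bytes, plus overhead for the KV cache and runtime buffers. A rough sketch (the 20% overhead factor is an assumption, and real usage varies with context length):

```python
# Back-of-envelope VRAM estimate for a quantized local model (a rough
# rule of thumb, not an exact figure). Weights take params * bits / 8
# bytes; the overhead factor is an assumed ~20% for KV cache and buffers.

def vram_gb(params_billion, quant_bits, overhead=1.2):
    """Approximate GB of VRAM needed: weight bytes times an overhead factor."""
    weight_gb = params_billion * quant_bits / 8  # 1B params at 8-bit ~= 1 GB
    return weight_gb * overhead

# An 8B model at 4-bit quantization fits comfortably in an 8GB GPU:
print(round(vram_gb(8, 4), 1))   # ~4.8 GB
# A 27B model at 4-bit pushes you toward a 24-32GB card:
print(round(vram_gb(27, 4), 1))  # ~16.2 GB
```

This is why the 8B class is the practical ceiling for an 8GB laptop GPU, while the 27B class wants a 5090-tier card.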

I had some luck with Ollama + Mistral Nemo models on consumer hardware; it seemed to punch above its weight class. But it's still far enough behind ChatGPT et al. that I couldn't give those up for real work.