Comment by kamranjon

2 hours ago

I'm really happy this is one of the top comments here. I am fully local as well.

Just wanted to leave a note for folks who might not have the memory to run a big 32 GB model: I just found out there are some pruned models that have really good performance. If I had a smaller machine, I might try this pruned unsloth Q4 quant of GLM 4.7 Flash, which sits at 14 GB: https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GG...

I usually use LM Studio for this type of thing, but unsloth has their own studio-type app that might be even better suited to these quants.

I used GLM 4.7 Flash as my main model for months, and it was incredibly tenacious and very, very fast. I think it could be a great choice on restricted hardware.