Comment by senko
6 hours ago
Usually just once (and I ran only one test for this particular one), but I've found the overall quality to be fairly consistent.
There are too many confounding variables here, randomness being just one of them. So I don't think of it as a definitive test (or a reliable ranking), just another data point (along with actual benchmarks, pelicans, etc.) to get a sense of the capabilities.
For example, I managed to get something out of DeepSeek 4 Flash quantized to 2-bit with Antirez's DwarfStar, used via Pi. It almost kinda worked! :) That makes me optimistic about using local models for serious development soon; I'd say within a year.