
Comment by dnhkng

1 month ago

I stick with models I can run in VRAM, but DeepSeek Speciale has the best reasoning capabilities of the models I can actually run (https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale). What hardware can you access?

I have DeepSeek etc., but inference on DDR5 would take about 2-3 weeks for a simple scan. I think this works best with dense models, but it also seems OK with MoE.

@everyone: Can someone hook me up with Nvidia sponsorship?

Oh neat, I'll check that one out. I don't get much speedup from SSD/128 GB unified memory vs. VRAM when I'm running a predefined set of prompts, since I load the model from disk anyway, one part at a time, and only do one forward pass per prompt. It's a bit slower when I'm doing CPU inference, but I've only had to do that with one model so far.
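The load-part-of-it-at-a-time approach can be sketched roughly like this; a toy numpy MLP stands in for a real model, and the file layout and function names are made up for illustration. The point is that only one layer's weights are ever resident in memory during the forward pass:

```python
# Hedged sketch of layer-streaming inference: weights live on disk
# and only one layer's parameters are loaded at a time.
# Toy 2D weight matrices stand in for real transformer layers.
import os
import tempfile
import numpy as np

def save_layers(layers, directory):
    """Persist each layer's weight matrix as its own .npy file."""
    paths = []
    for i, w in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.npy")
        np.save(path, w)
        paths.append(path)
    return paths

def streamed_forward(x, layer_paths):
    """One forward pass, loading layers from disk one at a time."""
    for path in layer_paths:
        w = np.load(path)           # only this layer is in memory now
        x = np.maximum(x @ w, 0.0)  # toy ReLU MLP layer
        del w                       # drop it before loading the next
    return x

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(3)]
x = rng.standard_normal((1, 8))
with tempfile.TemporaryDirectory() as d:
    paths = save_layers(layers, d)
    y_streamed = streamed_forward(x, paths)

# Reference: same forward pass with everything in memory.
y_ref = x
for w in layers:
    y_ref = np.maximum(y_ref @ w, 0.0)
print(np.allclose(y_streamed, y_ref))  # True
```

With a fixed batch of prompts you pay the disk read once per layer per pass either way, which is why VRAM doesn't buy much here; on-demand serving would re-read the weights constantly.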

But yeah, on-demand use would mean a lot of SSD churn, so I'd only do it for testing or for grabbing some hidden-state vectors.
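Grabbing hidden-state vectors fits the same single-pass pattern: you just keep each layer's output as you go. A minimal sketch, again with a toy numpy MLP in place of a real model (all names illustrative):

```python
# Hedged sketch: collecting per-layer hidden-state vectors in one pass.
# The toy ReLU MLP stands in for a real model.
import numpy as np

def forward_with_hidden_states(x, layers):
    """Run one forward pass and keep each layer's output."""
    hidden_states = [x]  # include the input embedding itself
    for w in layers:
        x = np.maximum(x @ w, 0.0)  # toy ReLU layer
        hidden_states.append(x)
    return x, hidden_states

rng = np.random.default_rng(1)
layers = [rng.standard_normal((8, 8)) for _ in range(4)]
x = rng.standard_normal((1, 8))
out, states = forward_with_hidden_states(x, layers)
print(len(states))  # 5: the input plus one vector per layer
```

Combined with layer streaming, you can append each state and discard the weights, so the stored activations (which are small) are all that accumulate.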