And that's fine, but the average person asking is already willing to give up some raw intelligence by going local, and would not expect the kind of abysmal performance you're likely getting after you described it as "fast".
I set up Deepseek at bs=1 on a $41,000 GH200 and got double-digit prompt processing speeds (~50 tk/s): you're definitely getting worse performance than the GH200 did, and that's already unacceptable for most users.
They'd be much better served by spending less money than you did and getting an actually interactive experience, instead of sending off prompts and waiting several minutes for a reply the moment the query involves any real context.
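To make the "several minutes" claim concrete, here's a back-of-the-envelope sketch. The ~50 tk/s prefill figure is the GH200 number from above; the context sizes are illustrative assumptions, and time-to-first-token is approximated as prompt length divided by prefill speed (ignoring decode and overhead).

```python
# Rough time-to-first-token from prefill speed alone.
# 50 tk/s is the GH200 anecdote above; context sizes are assumptions.
def prefill_wait_seconds(context_tokens: int, prefill_tps: float = 50.0) -> float:
    """Seconds spent processing the prompt before any output token appears."""
    return context_tokens / prefill_tps

for ctx in (2_000, 8_000, 32_000):
    secs = prefill_wait_seconds(ctx)
    print(f"{ctx:>6} tokens of context -> {secs / 60:.1f} min before any reply")
```

Even a modest 8k-token prompt is a multi-minute wait at that speed, which is why it stops feeling interactive the moment real context is involved.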