Comment by vlovich123

2 days ago

It’s 700 MiB because they’re likely redistributing the CUDA libraries so that users don’t need to run that installer separately. llama.cpp is a bit more “you are expected to know what you’re doing” on that front. But yeah, you could plausibly ship multiple versions of the inference engine, although from a maintenance perspective that sounds like hell for any number of reasons.