Comment by MarkSweep

6 months ago

Sorry, I forgot to include this part in my comment:

If you include multiple versions of llama.cpp, and each of those versions depends on different GPU libraries, that could balloon the download size.

If these GPU libraries change rarely, then yes, you're right that it might not be a problem.

Well, llama.cpp requires a minimum of CUDA 11 (released in 2020), or CUDA 12 (released in 2022) if you need CUDA C++17 support.

It'll compile fine against the latest CUDA 12.8 or 12.9, but there's zero need to pack whatever the latest CUDA version is.
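For illustration, here's a minimal sketch of pinning the build to an older toolkit instead of the newest one, using llama.cpp's standard CMake flow. The toolkit path is an assumption; adjust it to wherever your CUDA install lives:

```sh
# Build llama.cpp with CUDA support, pointing CMake at a specific
# (older) toolkit rather than whatever nvcc happens to be on PATH.
# /usr/local/cuda-12.0 is a hypothetical install location.
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.0/bin/nvcc
cmake --build build --config Release
```

The resulting binary then only needs runtime libraries compatible with that older toolkit, which is the point: you don't have to ship the latest CUDA just because it exists.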